What
Pool-based sampling to train in increments, selectively feeding the network only the worst-predicted examples from the training set.
Workflow
1. Train on N0 training examples.
2. While there are unseen examples left and validation loss is above some threshold:
   1. Predict on the unseen training examples.
   2. Train on the N unseen examples with the largest loss.
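A minimal PyTorch sketch of that loop, purely illustrative: the toy data, dimensions, threshold, and the `per_example_loss`/`fit` helpers are placeholders, not existing MALA code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

def per_example_loss(model, dataset, indices, batch_size=1024):
    """Per-example prediction loss over `indices`, computed without gradients."""
    loader = DataLoader(Subset(dataset, indices), batch_size=batch_size)
    loss_fn = nn.MSELoss(reduction="none")
    model.eval()
    losses = []
    with torch.no_grad():
        for x, y in loader:
            losses.append(loss_fn(model(x), y).mean(dim=1))
    return torch.cat(losses)

def fit(model, dataset, indices, epochs=1, batch_size=256, lr=1e-3):
    """One training pass over the selected examples."""
    loader = DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Toy stand-ins for (descriptor, LDOS) training and validation data.
pool = TensorDataset(torch.randn(10_000, 16), torch.randn(10_000, 8))
val_x, val_y = torch.randn(1_000, 16), torch.randn(1_000, 8)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))

N0, N, threshold = 2_000, 500, 0.05
order = torch.randperm(len(pool)).tolist()
seen, unseen = order[:N0], order[N0:]

fit(model, pool, seen)                       # 1. train on N0 examples

while unseen:                                # 2. while unseen examples remain...
    model.eval()
    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(val_x), val_y).item()
    if val_loss <= threshold:                # ...and validation loss is above threshold
        break
    losses = per_example_loss(model, pool, unseen)   # 2a. predict on unseen examples
    k = min(N, len(unseen))
    worst_pos = set(torch.topk(losses, k).indices.tolist())
    picked = [idx for pos, idx in enumerate(unseen) if pos in worst_pos]
    unseen = [idx for pos, idx in enumerate(unseen) if pos not in worst_pos]
    seen += picked
    fit(model, pool, picked)                 # 2b. train on the N worst-predicted examples
```

The final `fit` call retrains only on the newly selected examples; whether it should instead use the full `seen` list is exactly the trade-off raised under the questions below.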
Why
- LDOS data is plentiful.
- LDOS data contains lots of redundancy.
- Prediction is relatively cheap.
Pool-based sampling would help the network focus training on atomic interactions that are rare and/or difficult to capture, while spending less time on redundant data. Since prediction is relatively cheap for neural networks, the extra cost of selecting the worst-predicted examples should be more than recouped by the reduction in redundant training.
Anticipated Issues or Questions
- How can we efficiently reload each new training/testing data batch?
- Do we need to train on the cumulative seen dataset each time (to avoid forgetting old knowledge) or just the newest selection of badly-predicted training examples (to maximize speed)? Or maybe something in between?
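If the data live in a single pool that is re-indexed each round (as in the sketch above), reloading reduces to rebuilding the `DataLoader` from a list of indices, which is cheap; the open question is then just which indices to pass each round. A hypothetical helper, purely to frame the options:

```python
import random
from torch.utils.data import DataLoader, Subset

def round_loader(dataset, seen, newest, mode="newest",
                 replay_fraction=0.25, batch_size=256):
    """Build the training DataLoader for one pool-based-sampling round.

    mode="newest"     -> only the freshly selected, badly-predicted examples (fastest)
    mode="cumulative" -> everything seen so far (guards against forgetting)
    mode="mixed"      -> newest examples plus a random replay of older ones (in between)
    """
    if mode == "cumulative":
        indices = seen
    elif mode == "newest":
        indices = newest
    else:
        newest_set = set(newest)
        older = [i for i in seen if i not in newest_set]
        replay = random.sample(older, k=int(len(older) * replay_fraction))
        indices = newest + replay
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)
```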
Addressed by PR #431