What
Pool-based sampling to train in increments, selectively feeding the network only the worst-predicted examples from the training set.
Workflow
1. Train on N0 training examples.
2. While there are unseen examples left and validation loss is above some threshold:
   1. Predict on the unseen training examples.
   2. Train on the N unseen examples with the largest loss.
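A minimal PyTorch sketch of that loop, purely illustrative: the toy data, dimensions, threshold, and the `per_example_loss`/`fit` helpers are placeholders, not existing MALA code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

def per_example_loss(model, dataset, indices, batch_size=1024):
    """Per-example prediction loss over `indices`, computed without gradients."""
    loader = DataLoader(Subset(dataset, indices), batch_size=batch_size)
    loss_fn = nn.MSELoss(reduction="none")
    model.eval()
    losses = []
    with torch.no_grad():
        for x, y in loader:
            losses.append(loss_fn(model(x), y).mean(dim=1))
    return torch.cat(losses)

def fit(model, dataset, indices, epochs=1, batch_size=256, lr=1e-3):
    """One training pass over the selected examples."""
    loader = DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Toy stand-ins for (descriptor, LDOS) training and validation data.
pool = TensorDataset(torch.randn(10_000, 16), torch.randn(10_000, 8))
val_x, val_y = torch.randn(1_000, 16), torch.randn(1_000, 8)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))

N0, N, threshold = 2_000, 500, 0.05
order = torch.randperm(len(pool)).tolist()
seen, unseen = order[:N0], order[N0:]

fit(model, pool, seen)                       # 1. train on N0 examples

while unseen:                                # 2. while unseen examples remain...
    model.eval()
    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(val_x), val_y).item()
    if val_loss <= threshold:                # ...and validation loss is above threshold
        break
    losses = per_example_loss(model, pool, unseen)   # 2a. predict on unseen examples
    k = min(N, len(unseen))
    worst_pos = set(torch.topk(losses, k).indices.tolist())
    picked = [idx for pos, idx in enumerate(unseen) if pos in worst_pos]
    unseen = [idx for pos, idx in enumerate(unseen) if pos not in worst_pos]
    seen += picked
    fit(model, pool, picked)                 # 2b. train on the N worst-predicted examples
```

The final `fit` call retrains only on the newly selected examples; whether it should instead use the full `seen` list is exactly the trade-off raised under the questions below.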
Why
- LDOS data is plentiful.
- LDOS data contains lots of redundancy.
- Prediction is relatively cheap.
Pool-based sampling would help the network focus training on atomic interactions that are rare and/or difficult to capture, while spending less time on redundant data. Since prediction is relatively cheap for neural networks, the extra cost of selecting the worst-predicted examples should be more than recouped by the reduction in redundant training.
Anticipated Issues or Questions
- How can we efficiently reload each new training/testing data batch?
- Do we need to train on the cumulative seen dataset each time (to avoid forgetting old knowledge) or just the newest selection of badly-predicted training examples (to maximize speed)? Or maybe something in between?
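If the data live in a single pool that is re-indexed each round (as in the sketch above), reloading reduces to rebuilding the `DataLoader` from a list of indices, which is cheap; the open question is then just which indices to pass each round. A hypothetical helper, purely to frame the options:

```python
import random
from torch.utils.data import DataLoader, Subset

def round_loader(dataset, seen, newest, mode="newest",
                 replay_fraction=0.25, batch_size=256):
    """Build the training DataLoader for one pool-based-sampling round.

    mode="newest"     -> only the freshly selected, badly-predicted examples (fastest)
    mode="cumulative" -> everything seen so far (guards against forgetting)
    mode="mixed"      -> newest examples plus a random replay of older ones (in between)
    """
    if mode == "cumulative":
        indices = seen
    elif mode == "newest":
        indices = newest
    else:
        newest_set = set(newest)
        older = [i for i in seen if i not in newest_set]
        replay = random.sample(older, k=int(len(older) * replay_fraction))
        indices = newest + replay
    return DataLoader(Subset(dataset, indices), batch_size=batch_size, shuffle=True)
```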
Addressed by PR #431