Implement Active Learning #423

Open
kyledmiller opened this issue Feb 28, 2023 · 0 comments
Labels: enhancement (New feature or request), to do


kyledmiller commented Feb 28, 2023

Addressed by PR #431

What

Use pool-based sampling to train in increments: at each step, feed the network only the worst-predicted (highest-loss) examples from the pool of training examples it has not yet seen.

Workflow

  1. Train on an initial set of N0 training examples.
  2. While unseen examples remain and the validation loss is above some threshold:
    • Predict on the unseen training examples.
    • Train on the N unseen training examples with the largest loss.
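
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the implementation in the linked PR: a NumPy linear model trained by gradient descent stands in for the network, the initial N0 examples are drawn at random, and each round trains only on the newest selection. All function and parameter names (`per_example_loss`, `train`, `active_learning`, `n0`, `n`, `threshold`) are hypothetical.

```python
import numpy as np


def per_example_loss(w, X, y):
    """Squared error of a linear model, one value per example."""
    return (X @ w - y) ** 2


def train(w, X, y, lr=0.05, steps=200):
    """Gradient descent on mean squared error (stand-in for network training)."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w


def active_learning(X_pool, y_pool, X_val, y_val, n0=20, n=10,
                    threshold=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    unseen = np.ones(len(y_pool), dtype=bool)

    # Step 1: train on N0 examples (random choice is an assumption).
    first = rng.choice(len(y_pool), size=n0, replace=False)
    unseen[first] = False
    w = train(np.zeros(X_pool.shape[1]), X_pool[first], y_pool[first])

    # Step 2: loop while unseen examples remain and validation loss is high.
    while unseen.any() and per_example_loss(w, X_val, y_val).mean() > threshold:
        idx = np.flatnonzero(unseen)
        # Predict on the unseen pool and score each example.
        losses = per_example_loss(w, X_pool[idx], y_pool[idx])
        # Pick the (up to) N unseen examples with the largest loss.
        worst = idx[np.argsort(losses)[-n:]]
        unseen[worst] = False
        # Train on the newest selection only (one of the options under discussion).
        w = train(w, X_pool[worst], y_pool[worst])

    return w, int((~unseen).sum())


# Toy usage on synthetic, noise-free linear data.
data_rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])
X_pool = data_rng.normal(size=(200, 3))
y_pool = X_pool @ w_true
X_val = data_rng.normal(size=(50, 3))
y_val = X_val @ w_true

w, n_seen = active_learning(X_pool, y_pool, X_val, y_val)
print(f"trained on {n_seen} of {len(y_pool)} pool examples")
```

The boolean `unseen` mask keeps the bookkeeping to index operations, so the pool itself is never copied or reordered.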

Why

  1. LDOS data is plentiful.
  2. LDOS data contains lots of redundancy.
  3. Prediction is relatively cheap.

Pool-based sampling would help the network focus its training on atomic interactions that are rare or difficult to capture, while spending less time on redundant data. Since prediction is relatively cheap for neural networks, the extra cost of scoring the unseen examples to select the worst-performing ones should be more than recouped by the reduction in redundant training.
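
A back-of-the-envelope cost model makes this trade-off more concrete. Every symbol here is an assumption introduced for illustration: M is the pool size, E the number of epochs per training call, c_t the cost of one training step per example, c_p the cost of one prediction per example, and K the number of selection rounds; N0 and N are as in the workflow. Assuming each round trains only on its newest selection and, pessimistically, predicts on the whole pool:

$$
C_{\text{full}} = E\,M\,c_t,
\qquad
C_{\text{active}} \le E\,(N_0 + K N)\,c_t + K\,M\,c_p
$$

Under these assumptions, active learning is guaranteed to be cheaper whenever

$$
K\,M\,c_p < E\,c_t\,(M - N_0 - K N).
$$

Retraining on the cumulative seen set each round would increase the training term, so the condition would be harder to meet in that variant.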

Anticipated Issues or Questions

  1. How can we efficiently reload each new training/testing data batch?
  2. Do we need to train on the cumulative seen dataset each time (to avoid forgetting old knowledge) or just the newest selection of badly-predicted training examples (to maximize speed)? Or maybe something in between?
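
One possible "in between" option for question 2 is a replay mix: train on the newest selection plus a random sample of previously seen examples. The sketch below is only an illustration of that idea; `next_training_indices` and `replay_fraction` are hypothetical names, not part of the project. It works on indices rather than data, which may also help with question 1: if the pool can be indexed in place (for example, arrays opened with `np.load(..., mmap_mode="r")`), only the selected rows need to be read each round.

```python
import numpy as np


def next_training_indices(new_idx, seen_idx, replay_fraction=0.5, seed=0):
    """Return pool indices for the next training round.

    replay_fraction=0.0 -> only the newest selection (fastest)
    replay_fraction=1.0 -> the newest selection plus every seen example
    """
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)
    new_idx = np.asarray(new_idx, dtype=np.int64)
    seen_idx = np.asarray(seen_idx, dtype=np.int64)

    # Sample previously seen examples without replacement.
    n_replay = int(round(replay_fraction * len(seen_idx)))
    replay = rng.permutation(seen_idx)[:n_replay]

    return np.concatenate([new_idx, replay])


# Usage: 10 previously seen examples, 3 newly selected ones.
seen = np.arange(10)
new = np.array([10, 11, 12])
batch = next_training_indices(new, seen, replay_fraction=0.5)
```

Whether any replay is needed at all would have to be checked empirically, for instance by tracking the loss on earlier selections across rounds.
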

kyledmiller added the enhancement and to do labels on Feb 28, 2023
kyledmiller self-assigned this on Feb 28, 2023
kyledmiller linked a pull request that will close this issue on Mar 18, 2023
kyledmiller removed a link to a pull request on Mar 18, 2023