Epsilon Decay (Nonstationary Exploration)
Note: this reduction is experimental and is subject to change
The goal of the Epsilon Decay reduction is to gradually taper off exploration in a well-understood setting, while retaining the ability to increase epsilon (the exploration rate) again if the distribution of the input data changes (nonstationarity). In a long-term stationary setting, a fixed epsilon can be wasteful, forcing exploratory predictions that are not needed to improve the model. If the setting becomes nonstationary, however, it is important that epsilon can be increased quickly to learn the new setting.
This reduction defines epsilon as a decreasing function of the number of examples that a given model has processed.
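The exact formula is not reproduced on this page; as a minimal sketch, assume an inverse-power schedule in the number of processed examples. The function name, exponent, and floor below are illustrative choices, not the reduction's actual constants.

```python
def decayed_epsilon(num_examples: int, power: float = 0.5, floor: float = 0.0) -> float:
    """Illustrative decay schedule: epsilon shrinks as a model processes
    more examples. The exponent and floor are assumptions for this sketch,
    not the constants used by the reduction itself."""
    if num_examples <= 0:
        return 1.0  # explore fully before any data has been seen
    return max(floor, min(1.0, num_examples ** -power))
```

A model with a larger scope (more examples processed) therefore explores less than a freshly spawned challenger, which matches the champion/challenger behavior described below.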
Similar to the Automl reduction, Epsilon Decay runs multiple models in parallel: one model (the champion) is used for predictions (the policy), while the other models (challengers) are periodically compared with the champion and swapped into the champion position if they prove superior. Models are compared against each other using confidence sequence estimators, which return a confidence interval (a lower and an upper bound) on each model's performance.
The parallel models vary by the number of examples that each has processed (its scope). The idea is that the champion will have processed the largest number of examples (and thus have the lowest epsilon based on the formula above), while the challengers will have processed smaller numbers of examples (and thus have larger epsilons).
Each of the challengers is compared to the champion as each new example is processed. If a challenger's lower_bound exceeds the champion's upper_bound, that challenger overtakes the champion: all challengers below it are promoted, and all challengers above it (and the champion) are removed. Say, for example, we have 5 models and the third one beats the champion: it becomes the new champion, the fourth and fifth models move up, and the old champion and the second model are discarded.
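The promotion step above can be sketched as follows. The `Model` container and `try_promote` helper are hypothetical names invented for this illustration; the list is assumed to hold the champion first, followed by challengers in decreasing order of scope.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Model:
    name: str
    lower_bound: float  # lower end of the confidence interval on performance
    upper_bound: float  # upper end of the confidence interval

def try_promote(models: List[Model]) -> List[Model]:
    """models[0] is the champion; models[1:] are challengers in decreasing
    scope order. If a challenger's lower bound beats the champion's upper
    bound, it becomes the new champion, the challengers below it are
    promoted, and everything above it (including the old champion) is
    dropped. A sketch of the comparison described in the text."""
    champion = models[0]
    for i, challenger in enumerate(models[1:], start=1):
        if challenger.lower_bound > champion.upper_bound:
            return models[i:]  # challenger promoted to champion
    return models  # no challenger proved superior; keep everything
```

In the running five-model example, if the third model's lower bound exceeds the champion's upper bound, the returned list holds three models with the former third model now in the champion slot.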
When comparing two models, it is important that the estimators being compared cover the same scope of examples. For this reason, each model actually maintains multiple estimators of different scopes, so that any pair of models can be compared over a common scope.
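One way to picture the per-model estimators is a map from scope to estimate, with comparisons made at the largest scope both models share. The class and function names here are assumptions made for this sketch, not the reduction's internal layout.

```python
from typing import Dict

class ScopedEstimators:
    """Holds one performance estimate per scope (number of examples
    covered). A simplified stand-in for the reduction's per-model
    confidence sequence estimators."""
    def __init__(self) -> None:
        self.by_scope: Dict[int, float] = {}

    def record(self, scope: int, estimate: float) -> None:
        self.by_scope[scope] = estimate

def common_scope(a: ScopedEstimators, b: ScopedEstimators) -> int:
    """Largest scope for which both models have an estimator, so the
    comparison is over the same span of examples."""
    shared = a.by_scope.keys() & b.by_scope.keys()
    if not shared:
        raise ValueError("no common scope to compare")
    return max(shared)
```

A champion with estimators at scopes {100, 200, 400} and a younger challenger with scopes {100, 200} would be compared at scope 200, never at mismatched scopes.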