This is a living document that will be updated as questions are asked and answered.
The --csoaa_ldf and --wap_ldf modes use label dependent features, which allow you to specify a dynamic set of labels on each example. See here and the tutorials page here.
MTR stands for Multi Task Regression; more information can be found in the CB Bakeoff paper.
It is the default because, in an action dependent features setting, its update rule is usually more efficient than IPS/DR:
- In MTR, only the weights of the chosen action are updated, rather than updating all weights while assuming 0 reward for the non-chosen actions, as IPS does.
- In MTR, the propensity (i.e., the probability of the chosen action) is used directly as a weight in the regression cost formula, rather than only in the estimator of the loss (in the CB Bakeoff paper, compare eq. 6 for MTR versus eq. 5, where the estimator of the loss is eq. 3 for IPS and eq. 4 for DR); see the sketch after this list.
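As a sketch of that distinction, in notation paraphrasing the Bakeoff paper (with $x_t$ the context, $a_t$ the logged action, $c_t$ its observed cost, and $p_t$ its logged propensity; the exact symbols here are ours, not copied from the paper):

```latex
% MTR: importance-weighted regression on the chosen action only;
% the inverse propensity enters directly as a regression weight.
\hat{f}_{\mathrm{MTR}} = \arg\min_f \sum_t \frac{1}{p_t} \left( f(x_t, a_t) - c_t \right)^2

% IPS: the propensity only enters through the cost estimate below,
% which imputes cost 0 for every non-chosen action and is then
% regressed on for all actions.
\hat{c}_t(a) = \frac{c_t}{p_t} \, \mathbb{1}\{ a = a_t \}
```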
I have historical data without probabilities. Can I estimate the performance of contextual bandits offline?
When the propensities are not available, there are two options:
- No randomization was performed online, so each action was taken with probability 1. In this case, offline estimation of the performance of CB (or any other offline algorithm) cannot be done reliably, and it may also yield very biased, wrong estimates (e.g., the No Unknown Confounders assumption may not hold). One option is to start by implementing an A/B test in which you randomize your campaigns. There is a relevant discussion here.
- Randomization was performed but the propensity is not known. In this case, one could use offline experimentation estimators that do not use the propensity, such as the Direct Method (DM) estimator (sketched below). In our empirical experience this option is usually not very data efficient, since the DM estimator still needs a large offline dataset to provide small confidence intervals, and it is also prone to estimation errors, caused by bugs in the data collection pipeline, that are very difficult to spot. When possible, we suggest collecting data with pipelines that include logging the propensity, as done in Azure Personalizer.
See documentation here
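For concreteness, here is a sketch of the Direct Method estimator mentioned above: fit a reward regressor $\hat{r}(x, a)$ on the logged data, then score a target policy $\pi$ by the average predicted reward of the actions it would have chosen (the notation here is ours):

```latex
% Direct Method: no propensities required; the estimate is only as
% good as \hat{r}'s extrapolation to actions \pi selects but the
% logging policy rarely took.
\hat{V}_{\mathrm{DM}}(\pi) = \frac{1}{n} \sum_{t=1}^{n} \hat{r}\left( x_t, \pi(x_t) \right)
```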