Warm Starting Contextual Bandits
VW has a --warm_cb reduction that simulates warm-starting contextual bandit learning. In this setting, the learner is given a set of supervised warm-start examples to help with contextual bandit learning. With the help of these additional warm-start examples, the learner can incur a smaller cost in the interaction stage.
The learning process consists of two stages:
- Warm-start: The learner receives warm-start examples and updates its model accordingly.
- Interaction: The learner starts with the warm-started model and performs online contextual bandit learning, alternating between prediction and update.
The performance of the learner is measured by its cost incurred in the interaction stage.
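The two stages above can be sketched as a small simulation loop. This is only an illustration of the protocol, not VW's implementation: the `TableLearner` class and the `run_warm_cb` function are hypothetical names, the learner is a toy lookup table without exploration, and zero-one costs are assumed. The keyword arguments loosely mirror the --warm_start, --interaction, --warm_start_update and --interaction_update options used later on this page.

```python
class TableLearner:
    """Toy learner (hypothetical): remembers, per context, an action seen to have zero cost."""

    def __init__(self, default_action=1):
        self.best = {}
        self.default_action = default_action

    def predict(self, context):
        return self.best.get(context, self.default_action)

    def learn_supervised(self, context, label):
        # Full supervision: the correct action is revealed.
        self.best[context] = label

    def learn_bandit(self, context, action, cost):
        # Bandit feedback: only the cost of the chosen action is revealed.
        if cost == 0.0:
            self.best[context] = action


def run_warm_cb(examples, warm_start, interaction, learner,
                warm_start_update=True, interaction_update=True):
    """Return the average cost incurred in the interaction stage only."""
    total_cost = 0.0
    for i, (context, label) in enumerate(examples[:warm_start + interaction]):
        if i < warm_start:
            # Warm-start stage: supervised updates, no cost is counted.
            if warm_start_update:
                learner.learn_supervised(context, label)
        else:
            # Interaction stage: predict, observe the cost, then update.
            action = learner.predict(context)
            cost = 0.0 if action == label else 1.0
            total_cost += cost
            if interaction_update:
                learner.learn_bandit(context, action, cost)
    return total_cost / interaction


examples = [("a", 2), ("b", 3)] * 5
with_warm_start = run_warm_cb(examples, 2, 8, TableLearner())
without_warm_start = run_warm_cb(examples, 2, 8, TableLearner(), warm_start_update=False)
```

In this toy setup the warm-started learner already knows the right action for both contexts, while the learner without warm-start updates never explores and so never recovers; a real contextual bandit learner would explore, which is what --epsilon controls in the commands on this page.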
Warm CB has several options. Of particular interest are:
- --warm_start x, where x specifies the number of warm-start examples
- --interaction y, where y is the length of the interaction stage
Warm CB by default uses examples with multiclass labels. It can also be used with cost-sensitive examples through the --warm_cb_cs option, described later on this page.
Suppose we have text_highnoise_m.vw, a dataset of 10-class multiclass examples in VW format.
We can run:
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.05 --warm_start 10 --interaction 1000 --warm_start_update --interaction_update -d text_highnoise_m.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise_m.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 11 1.0 4 4 101
0.000000 0.000000 12 2.0 6 6 101
0.000000 0.000000 14 4.0 6 6 101
0.125000 0.250000 18 8.0 8 5 101
0.125000 0.125000 26 16.0 8 8 101
0.125000 0.125000 42 32.0 10 10 101
0.093750 0.062500 74 64.0 9 6 101
0.117188 0.140625 138 128.0 5 5 101
0.132812 0.148438 266 256.0 9 3 101
0.134766 0.136719 522 512.0 6 6 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.132000
total feature number = 1010000
average variance estimate = 10.795802
theoretical average variance = 200.000000
last lambda chosen = 0.500000 among lambdas ranging from 0.500000 to 0.500000
Note that the VW output follows the same doubling schedule as usual; however, the example weight, the average loss, and the loss since the last checkpoint are counted only in the interaction stage. (The "example counter" starts from 11, though, because the first 10 warm-start examples were processed before the interaction stage.)
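As a small worked example, the counter and weight columns in the output above can be reproduced by offsetting a doubling schedule by the number of warm-start examples. The function name `progress_checkpoints` is made up for this sketch; it only mirrors the numbers printed above and is not taken from VW's code.

```python
def progress_checkpoints(warm_start, interaction):
    """Return (example_counter, example_weight) pairs for each progress line.

    The weight doubles (1, 2, 4, ...) and counts interaction examples only;
    the counter also includes the warm-start examples processed beforehand.
    """
    checkpoints = []
    weight = 1
    while weight <= interaction:
        checkpoints.append((warm_start + weight, float(weight)))
        weight *= 2
    return checkpoints


for counter, weight in progress_checkpoints(warm_start=10, interaction=1000):
    print(counter, weight)
```

With --warm_start 10 and --interaction 1000 this gives the counters 11, 12, 14, 18, 26, 42, 74, 138, 266 and 522 seen in the output.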
We can also run the above command without the --warm_start_update option, which essentially skips the warm-start examples and performs contextual bandit learning directly:
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.05 --warm_start 10 --interaction 1000 --interaction_update -d text_highnoise_m.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise_m.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 11 1.0 4 5 101
1.000000 1.000000 12 2.0 6 9 101
1.000000 1.000000 14 4.0 6 7 101
1.000000 1.000000 18 8.0 8 6 101
0.875000 0.750000 26 16.0 8 1 101
0.937500 1.000000 42 32.0 10 9 101
0.859375 0.781250 74 64.0 9 6 101
0.687500 0.515625 138 128.0 5 5 101
0.453125 0.218750 266 256.0 9 9 101
0.283203 0.113281 522 512.0 6 6 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.166000
total feature number = 1010000
average variance estimate = 22.381275
theoretical average variance = 200.000000
last lambda chosen = 1.000000 among lambdas ranging from 1.000000 to 1.000000
Another extreme is to train a model purely based on the warm start examples, and use the model in the interaction stage, with the data collected in the interaction stage ignored. In this case, we can turn off exploration to minimize the exploration cost overhead.
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.0 --warm_start 10 --interaction 1000 --warm_start_update -d text_highnoise_m.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise_m.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 11 1.0 4 4 101
0.000000 0.000000 12 2.0 6 6 101
0.000000 0.000000 14 4.0 6 6 101
0.000000 0.000000 18 8.0 8 8 101
0.062500 0.125000 26 16.0 8 8 101
0.093750 0.125000 42 32.0 10 10 101
0.109375 0.125000 74 64.0 9 3 101
0.117188 0.125000 138 128.0 5 5 101
0.121094 0.125000 266 256.0 9 3 101
0.111328 0.101562 522 512.0 6 6 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.108000
total feature number = 1010000
average variance estimate = 1.000000
theoretical average variance = inf
last lambda chosen = 0.000000 among lambdas ranging from 0.000000 to 0.000000
Sometimes, we would like to simulate the setting where the label distribution of the warm-start examples does not perfectly match that of the examples in the interaction stage. VW supports this by letting you specify one of three modes of corruption and the corruption probability (with --corrupt_type_warm_start and --corrupt_prob_warm_start, respectively). The three modes of corruption are:
- Type 1: Replace a label with one chosen uniformly at random.
- Type 2: Replace label i with its next label, (i+1) mod K, where K is the total number of classes.
- Type 3: Replace a label with an "overwriting" label l.
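The three modes can be sketched in a few lines of Python. This is an illustration rather than VW's implementation: the function `corrupt_label` and its parameters are hypothetical, the modes are assumed to be numbered 1 to 3 in the order listed above, and labels are assumed to run from 1 to K, so the "next label" of K wraps around to 1.

```python
import random


def corrupt_label(label, num_classes, mode, prob, rng, overwrite_label=1):
    """Return a possibly corrupted copy of `label` (labels are 1..num_classes)."""
    if rng.random() >= prob:
        return label  # with probability 1 - prob the label is left untouched
    if mode == 1:
        # Uniformly random replacement.
        return rng.randint(1, num_classes)
    if mode == 2:
        # Cyclic shift to the next label; class K wraps to class 1 (assumed).
        return label % num_classes + 1
    if mode == 3:
        # Fixed "overwriting" label.
        return overwrite_label
    raise ValueError(f"unknown corruption mode: {mode}")


rng = random.Random(0)
warm_start_labels = [4, 6, 6, 8, 8, 10, 9, 5, 9, 6]
corrupted = [corrupt_label(y, 10, mode=2, prob=0.5, rng=rng) for y in warm_start_labels]
```

Only the warm-start labels are corrupted in this sketch; the interaction-stage labels used to compute costs stay as they are.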
For example, we can add type 2 corruption with probability 0.5 on the first 10 supervised learning examples:
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.05 --warm_start 10 --interaction 1000 --warm_start_update --interaction_update --corrupt_type_warm_start 2 --corrupt_prob_warm_start 0.5 -d text_highnoise_m.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise_m.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 11 1.0 4 4 101
0.500000 1.000000 12 2.0 6 1 101
0.500000 0.500000 14 4.0 6 7 101
0.500000 0.500000 18 8.0 8 5 101
0.562500 0.625000 26 16.0 8 1 101
0.593750 0.625000 42 32.0 10 1 101
0.593750 0.593750 74 64.0 9 2 101
0.632812 0.671875 138 128.0 5 5 101
0.593750 0.554688 266 256.0 9 8 101
0.529297 0.464844 522 512.0 6 10 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.436000
total feature number = 1010000
average variance estimate = 15.968479
theoretical average variance = 200.000000
last lambda chosen = 0.500000 among lambdas ranging from 0.500000 to 0.500000
By default, the warm-start contextual bandit learner places equal weight on every warm-start example and every interaction example. It is often a good idea to try a larger set of weighted combinations and perform selection on top of them, to find the right balance between the warm-start examples and the bandit examples in the interaction stage. This can be specified with the --lambda_scheme and --choices_lambda parameters, as shown in the example below.
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.05 --warm_start 10 --interaction 1000 --warm_start_update --interaction_update --corrupt_type_warm_start 2 --corrupt_prob_warm_start 0.5 --lambda_scheme 2 --choices_lambda 2 -d text_highnoise_m.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise_m.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 11 1.0 4 4 101
0.500000 1.000000 12 2.0 6 1 101
0.750000 1.000000 14 4.0 6 7 101
0.500000 0.250000 18 8.0 8 8 101
0.687500 0.875000 26 16.0 8 4 101
0.687500 0.687500 42 32.0 10 9 101
0.656250 0.625000 74 64.0 9 3 101
0.570312 0.484375 138 128.0 5 5 101
0.433594 0.296875 266 256.0 9 1 101
0.304688 0.175781 522 512.0 6 6 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.177000
total feature number = 1010000
average variance estimate = 27.510756
theoretical average variance = 200.000000
last lambda chosen = 1.000000 among lambdas ranging from 0.000000 to 1.000000
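To make the role of lambda concrete, here is a minimal sketch of the weighting idea. It rests on an assumption read off the outputs on this page (lambda is reported as 1.0 when warm-start updates are off and 0.0 when interaction updates are off): lambda is treated as the relative weight on the bandit examples, with 1 - lambda on the warm-start examples. The function names and the selection rule (pick the candidate with the lowest validation loss) are hypothetical and are not taken from VW's code.

```python
def example_weights(lam):
    """Return (warm_start_weight, bandit_weight) for a mixing value lam in [0, 1]."""
    return 1.0 - lam, lam


def combined_loss(lam, warm_start_loss, bandit_loss):
    """Weighted combination of the two loss terms for a given lambda."""
    ws_weight, bandit_weight = example_weights(lam)
    return ws_weight * warm_start_loss + bandit_weight * bandit_loss


def select_lambda(validation_loss_by_lambda):
    """Pick the candidate lambda with the smallest validation loss."""
    return min(validation_loss_by_lambda, key=validation_loss_by_lambda.get)


# Hypothetical validation losses for two candidates, 0.0 and 1.0.
chosen = select_lambda({0.0: 0.44, 1.0: 0.18})
```

When the warm-start labels are heavily corrupted, as in the run above, a selection rule of this kind would be expected to favor the bandit examples.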
Warm CB also supports input examples in cost-sensitive form, i.e. each example's label part is a cost vector. To accept this input format, VW needs the additional option --warm_cb_cs. Here we have a dataset text_highnoise.vw, which is identical to text_highnoise_m.vw, except that each example's label is in the cost vector format. We get exactly the same result as running with text_highnoise_m.vw as input without the --warm_cb_cs option:
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.05 --warm_start 10 --interaction 1000 --warm_start_update --interaction_update --warm_cb_cs -d text_highnoise.vw
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
0.000000 0.000000 11 1.0 known 4 101
0.000000 0.000000 12 2.0 known 6 101
0.000000 0.000000 14 4.0 known 6 101
0.125000 0.250000 18 8.0 known 5 101
0.125000 0.125000 26 16.0 known 8 101
0.125000 0.125000 42 32.0 known 10 101
0.093750 0.062500 74 64.0 known 6 101
0.117188 0.140625 138 128.0 known 5 101
0.132812 0.148438 266 256.0 known 3 101
0.134766 0.136719 522 512.0 known 6 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.132000
total feature number = 1010000
average variance estimate = 10.795802
theoretical average variance = 200.000000
last lambda chosen = 0.500000 among lambdas ranging from 0.500000 to 0.500000
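The relationship between the two label formats can be sketched as follows, assuming zero-one costs (cost 0 for the labeled class, 1 for every other class). That assumption fits the identical losses reported above, but the exact contents of text_highnoise.vw are not shown here, and both function names are made up for this illustration.

```python
def multiclass_to_cost_vector(label, num_classes):
    """Zero-one costs: 0 for the labeled class, 1 for every other class."""
    return [0.0 if k == label else 1.0 for k in range(1, num_classes + 1)]


def format_cs_label(costs):
    """Render costs as space-separated class:cost pairs, as in VW cost-sensitive labels."""
    return " ".join(f"{k}:{c:g}" for k, c in enumerate(costs, start=1))


label_part = format_cs_label(multiclass_to_cost_vector(4, 10))
print(label_part)
```

The resulting string would take the place of the single class label in front of the `|` that starts the features.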
We also include a baseline approach, named Sim-Bandit in the warm contextual bandits paper. In the warm-start stage, it simulates contextual bandit learning and produces a model; then, in the interaction stage, it continues contextual bandit learning from the model obtained at the end of the warm-start stage. This under-utilizes the warm-start examples, as the algorithm uses only part of the label of every warm-start example.
./vw --warm_cb 10 --cb_explore_adf --cb_type mtr --epsilon 0.05 --warm_start 10 --interaction 1000 --warm_start_update --interaction_update -d text_highnoise_m.vw --sim_bandit
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = text_highnoise_m.vw
num sources = 1
average since example example current current current
loss last counter weight label predict features
1.000000 1.000000 11 1.0 4 9 101
1.000000 1.000000 12 2.0 6 3 101
0.750000 0.500000 14 4.0 6 6 101
0.750000 0.750000 18 8.0 8 10 101
0.750000 0.750000 26 16.0 8 1 101
0.781250 0.812500 42 32.0 10 4 101
0.765625 0.750000 74 64.0 9 10 101
0.664062 0.562500 138 128.0 5 9 101
0.457031 0.250000 266 256.0 9 9 101
0.267578 0.078125 522 512.0 6 6 101
finished run
number of examples = 10000
weighted example sum = 1000.000000
weighted label sum = 0.000000
average loss = 0.158000
total feature number = 1010000
average variance estimate = 21.936954
theoretical average variance = 200.000000
last lambda chosen = 0.500000 among lambdas ranging from 0.500000 to 0.500000
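The reason Sim-Bandit uses only part of each warm-start label can be seen in a short sketch of simulated bandit feedback. This is an illustration under stated assumptions, not VW's code: the function `simulate_bandit_feedback` is hypothetical, exploration is taken to be epsilon-greedy, and costs are zero-one.

```python
import random


def simulate_bandit_feedback(true_label, num_classes, greedy_action, epsilon, rng):
    """Turn a fully labeled example into bandit feedback.

    Returns (action, cost, probability): only the cost of the sampled action
    is revealed, so the rest of the label information is discarded.
    """
    # Epsilon-greedy distribution over actions 1..num_classes.
    probs = [epsilon / num_classes] * num_classes
    probs[greedy_action - 1] += 1.0 - epsilon
    action = rng.choices(range(1, num_classes + 1), weights=probs)[0]
    cost = 0.0 if action == true_label else 1.0
    return action, cost, probs[action - 1]


rng = random.Random(0)
action, cost, prob = simulate_bandit_feedback(
    true_label=4, num_classes=10, greedy_action=9, epsilon=0.05, rng=rng)
```

A supervised warm-start update would learn that class 4 is correct; the simulated bandit update learns only whether the one sampled action was right.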