-
Notifications
You must be signed in to change notification settings - Fork 1.9k
CB with Graph Feedback
What: Contextual Bandits (CB) with graph feedback can be used for scenarios where some actions, when taken, reveal information other actions (not taken), or maybe don't reveal any information at all. If there exists prior knowledge of this relationship between actions then that knowledge can be used to make exploration and learning more efficient.
CB with graph feedback is a constraint optimization problem where regret is minimized while maximizing the information gathering (less exploration waste) by using the graph feedback leads.
Publication: https://arxiv.org/abs/2302.08631
Some scenarios include spam filtering (aka apple tasting), first-price auction bidding, and inventory.
Input is the same as cb_explore_adf
input for Vowpal Wabbit but it now includes the graph. The other change is that if an action is taken and it reveals information about other actions, all actions influenced can have a label, thus accelerating learning.
This tutorial here is a good example of how to build VW examples, especially the section on understanding the format.
The shared
example would have to change to accommodate the graph. The graph is supplied as an NxN
matrix, where N
is the number of actions in the input. Each cell represents the probability of action in the row affecting the action in the column. The graph is supplied in coordinate format (row, col, val) and only the non-zero values need to be specified.
for example, the graph below
0 1
0 1
indicates that taking action 0 will reveal no information either for itself nor for action 1, where as action 1 will reveal information for both actions (e.g. categorizing an email as spam vs non spam).
This is represented in the Vowpal Wabbit input as:
- for text input:
shared graph 0,1,1 1,1,1 | <shared features>
- for json input:
"_graph" : [{"row": 0, "col": 1, "val": 1}, {"row": 1, "col": 1, "val": 1}]
In the above example, if action 1 is taken then we can have a cost function that determines, given the features, what is the cost of taking action 1 on action 1, and what is the cost of taking action 1 on action 0. We can set a label with action:cost:probability
tuple for both actions when learning. The important thing to get right here is the cost
of each action and what information is revealed about it when an action is taken. action
can be set to 0
as it will be ignored and probability
can be set to 1
.
-
--cb_explore_adf --graph_feedback
activates the algorithm -
gamma_scale
defaults to0.5
andgamma_exponent
defaults to1.0
, values are used to setgamma=[gamma_scale]*[num examples]^[gamma_exponent]
-
gamma
will increase as the number of examples consumed grows: a small gamma means the resulting probability distribution over the available actions will be influenced more heavily influenced by the graph, and as gamma increases then the distribution will be less influenced by the graph and more by the learned policy
-
- Home
- First Steps
- Input
- Command line arguments
- Model saving and loading
- Controlling VW's output
- Audit
- Algorithm details
- Awesome Vowpal Wabbit
- Learning algorithm
- Learning to Search subsystem
- Loss functions
- What is a learner?
- Docker image
- Model merging
- Evaluation of exploration algorithms
- Reductions
- Contextual Bandit algorithms
- Contextual Bandit Exploration with SquareCB
- Contextual Bandit Zeroth Order Optimization
- Conditional Contextual Bandit
- Slates
- CATS, CATS-pdf for Continuous Actions
- Automl
- Epsilon Decay
- Warm starting contextual bandits
- Efficient Second Order Online Learning
- Latent Dirichlet Allocation
- VW Reductions Workflows
- Interaction Grounded Learning
- CB with Large Action Spaces
- CB with Graph Feedback
- FreeGrad
- Marginal
- Active Learning
- Eigen Memory Trees (EMT)
- Element-wise interaction
- Bindings
-
Examples
- Logged Contextual Bandit example
- One Against All (oaa) multi class example
- Weighted All Pairs (wap) multi class example
- Cost Sensitive One Against All (csoaa) multi class example
- Multiclass classification
- Error Correcting Tournament (ect) multi class example
- Malicious URL example
- Daemon example
- Matrix factorization example
- Rcv1 example
- Truncated gradient descent example
- Scripts
- Implement your own joint prediction model
- Predicting probabilities
- murmur2 vs murmur3
- Weight vector
- Matching Label and Prediction Types Between Reductions
- Zhen's Presentation Slides on enhancements to vw
- EZExample Archive
- Design Documents
- Contribute: