Interaction Grounded Learning
Reward is essential in Reinforcement Learning, but crafting the reward function can be complex and laborious. A fixed reward function may work well for one user group yet fail to generalize to others, and it requires increasing effort to maintain as the system evolves.
IGL is an algorithm that automatically discovers a personalized reward function from the user's context and feedback. It is particularly useful for interactive learning applications such as recommender systems.
- blog post
- IGL trains two models inside the reduction: a reward decoder model and a CB model. The reward decoder is based on an inverse kinematics strategy: it is trained on context features and feedback features and predicts the distribution over actions. Whenever the posterior probability of an action is predicted to be more than twice the prior probability, we deduce $r \ne 0$ and calculate the cost.
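One way to write this decoding rule (notation added here for clarity, not from the original page: $x$ is the context, $y$ the feedback features, $a$ an action, and $P(a \mid x)$ the prior action distribution, e.g. that of the logging policy):

$$\hat{P}(a \mid x, y) > 2\,P(a \mid x) \;\Longrightarrow\; \hat{r} \ne 0$$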
IGL uses the same dsjson example format as CB. The only difference is that instead of a numeric reward, the feedback features are provided as JSON.
If there is an extreme negative feedback signal, you can label it with `"_definitely_bad"`. This step is optional.
e.g.:
```json
{
  "c": ...,
  "o": [
    {
      "v": {
        "feedback": "dislike",
        ...
      },
      "_definitely_bad": true
    }
  ]
}
```
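For reference, a fuller hypothetical event could look like the following. The feature names and values are invented for illustration, and the surrounding fields assume the standard CB dsjson schema; only the `"o"` block with feedback features (and the absence of a numeric label cost) is specific to IGL:

```json
{
  "_label_probability": 0.5,
  "_labelIndex": 0,
  "o": [
    {
      "v": { "feedback": "dislike" },
      "_definitely_bad": true
    }
  ],
  "a": [1, 2],
  "c": {
    "User": { "id": "u1" },
    "_multi": [
      { "Item": { "id": "article_a" } },
      { "Item": { "id": "article_b" } }
    ]
  },
  "p": [0.5, 0.5]
}
```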
Run `--cb_explore_adf --experimental_igl` to enable this reduction.
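For example, a training run over a dsjson file might look like the following (the data and model file names are hypothetical; `--dsjson` selects the dsjson input format):

```sh
vw --cb_explore_adf --experimental_igl --dsjson -d igl_examples.dsjson -f igl.model
```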