Interaction Grounded Learning
Reward is essential in Reinforcement Learning, but crafting the reward function can be complex and laborious. A fixed reward function may work well for one user group yet fail to generalize to others, and it requires increasing effort to maintain as the system evolves.
IGL is an algorithm that automatically discovers a personalized reward function from the user's context and feedback. It is particularly useful for interactive learning applications such as recommender systems.
- blog post
- IGL trains two models inside the reduction: a reward decoder model and a CB model. The reward decoder is based on an inverse kinematics strategy: it is trained on context features and feedback features and predicts the distribution over actions. Whenever the posterior probability of an action is predicted to be more than twice the prior probability, we deduce $r \ne 0$ and calculate the cost.
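One way to write this decoding rule (notation added here for clarity, not from the original page: $x$ is the context, $y$ the feedback features, $a$ an action, and $P(a \mid x)$ the prior action distribution, e.g. that of the logging policy):

$$\hat{P}(a \mid x, y) > 2\,P(a \mid x) \;\Longrightarrow\; \hat{r} \ne 0$$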
IGL uses the same dsjson example format as CB. The only difference is that instead of a numeric reward, the feedback features are provided as JSON.
If there is an extreme negative feedback signal, you can label it with `"_definitely_bad"`. This step is optional.
e.g.:
```json
{
  "c": ...,
  "o": [
    {
      "v": {
        "feedback": "dislike",
        ...
      },
      "_definitely_bad": true
    }
  ]
}
```
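For reference, a fuller hypothetical event could look like the following. The feature names and values are invented for illustration, and the surrounding fields assume the standard CB dsjson schema; only the `"o"` block with feedback features (and the absence of a numeric label cost) is specific to IGL:

```json
{
  "_label_probability": 0.5,
  "_labelIndex": 0,
  "o": [
    {
      "v": { "feedback": "dislike" },
      "_definitely_bad": true
    }
  ],
  "a": [1, 2],
  "c": {
    "User": { "id": "u1" },
    "_multi": [
      { "Item": { "id": "article_a" } },
      { "Item": { "id": "article_b" } }
    ]
  },
  "p": [0.5, 0.5]
}
```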
Run `--cb_explore_adf --experimental_igl` to enable this reduction.
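For example, a training run over a dsjson file might look like the following (the data and model file names are hypothetical; `--dsjson` selects the dsjson input format):

```sh
vw --cb_explore_adf --experimental_igl --dsjson -d igl_examples.dsjson -f igl.model
```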