Learning to Search Subsystem
The learning to search (L2S) subsystem deals with problems that involve jointly predicting multiple mutually dependent output variables. To use the system, a developer defines the search space as an imperative program (or uses a predefined module in VW). Detailed usage is described below.
In joint prediction problems, the goal is to make good joint predictions. As an example, consider recognizing a handwritten word, where each character is recognized in turn. Here, it is commonly observed that exposing information from related predictions (i.e., adjacent letters) aids individual predictions. Furthermore, optimizing a joint loss function can improve the gracefulness of error recovery. Despite these advantages, it is common in practice to build independent predictors, because they are simpler to implement and faster to run. The goal of the L2S subsystem is to make structured prediction algorithms as easy and as fast as possible to program and to compute. Specifically, the L2S subsystem works as a compiler: a developer need only code the desired test-time behavior, and the compiler automatically translates this decoding function and the labeled data into updates of the learning model, which learns to avoid compounding errors and to make a sequence of coherent decisions.
We first describe the system architecture in an abstract fashion; more details are given later. The system provides two library functions, PREDICT(...) and LOSS(...). We explain the functionality of these two functions using the following code for a part-of-speech tagger (or generic sequence labeler) under Hamming loss.
def MYRUN(X):   % for sequence tagging; X: input sequence, Y: output sequence
    Y ← []
    for t = 1 to LEN(X) do
        ref ← X[t].true_label
        Y[t] ← PREDICT(x=X[t], y=ref, tag=t, condition=[1:t-1])
    LOSS(number of positions t where Y[t] != X[t].true_label)
    return Y
The algorithm takes as input a sequence of examples (e.g., words) and predicts the label of each element in turn, where the t-th prediction may depend on the previous predictions. The function PREDICT(...) returns an individual prediction based on x, while LOSS(...) allows the declaration of an arbitrary loss over the joint set of predictions. The LOSS(...) function and the reference label y passed to PREDICT(...) are used only during training and have no effect at test time. This single library interface is sufficient for both testing and training, once it is augmented to include label "advice" from a training set as a reference decision (via the parameter y). This means that a developer only has to specify the desired test-time behavior and gets training with minor additional decoration. As mentioned above, the underlying system works as a credit assignment compiler that translates the user-specified decoding function and labeled data into updates of the learning model. How can a good PREDICT function be learned from just an imperative program like MYRUN(...) above? The key is to run MYRUN(...) many times, "trying out" different versions of PREDICT(...) in order to learn one that yields low LOSS(...). Details can be found in the paper.
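The same tagger can be written against VW's Python learning-to-search hook. The sketch below follows the pyvw SearchTask pattern from the Learning to Search Python tutorial; class and option names (pyvw.vw, SearchTask, AUTO_HAMMING_LOSS, AUTO_CONDITION_FEATURES) are taken from that tutorial and may differ slightly between VW releases, so treat it as an illustration rather than a drop-in implementation.

```python
from vowpalwabbit import pyvw

class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)
        # Let the credit assignment compiler accumulate Hamming loss and
        # condition each decision on earlier predictions automatically.
        sch.set_options(sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES)

    def _run(self, sentence):
        # Only the test-time behavior is written here; training is derived from it.
        output = []
        for n in range(len(sentence)):
            label, word = sentence[n]
            with self.vw.example({'w': [word]}) as ex:
                # PREDICT(...): oracle= is the reference label, condition= ties this
                # decision to the previous prediction (tagged n, namespace 'p').
                pred = self.sch.predict(examples=ex, my_tag=n + 1,
                                        oracle=label, condition=[(n, 'p')])
                output.append(pred)
        return output

# 4 possible tags; the "hook" task hands control to the Python _run above.
vw = pyvw.vw("--search 4 --quiet --search_task hook --ring_size 1024")
task = vw.init_search_task(SequenceLabeler)

# Toy data: (tag id, word) pairs; tag ids must lie in 1..4 here.
data = [[(1, 'the'), (2, 'monster'), (3, 'ate'), (4, 'a'), (1, 'big'), (2, 'sandwich')]]
for _ in range(10):
    task.learn(data)

# At prediction time the labels are dummies (0) and are ignored.
print(task.predict([(0, w) for w in "the sandwich ate a monster".split()]))
```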
For more information, please refer to the following tutorial and papers:
- Tutorial: Learning to Search
- L2s System - A Credit Assignment Compiler for Joint Prediction
- L2S approach - LOLS
- L2S approach - AggreVate
- L2S approach - Dagger
- L2S approach - Searn
VW provides predefined structured prediction models for various applications, including sequence tagging, dependency parsing, and entity-relation joint prediction.
Please see the dependency parsing demo, the Entity-Relation extraction demo, and the test scripts for examples of using the predefined structured prediction models.
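As a rough sketch of driving the predefined sequence task from the command line (file names here are placeholders, and the exact set of useful flags depends on the VW version), the `--search_task sequence` and `--search <num_labels>` options select the built-in sequence labeler:

```sh
# Train a part-of-speech tagger with the predefined sequence task.
# 45 = number of possible tags; pos.train is a placeholder multi-line data file
# (one "label | features" line per token, blank line between sequences).
vw -d pos.train --search_task sequence --search 45 -f pos.model

# Tag held-out data with the trained model.
vw -d pos.test -t -i pos.model -p pos.predictions
```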