
Frequently asked questions
===========


Q1: Which model type (i.e. "basic", "standard", or "full") should I use?

A: The short answer is "full" for best accuracy, "standard" for a good accuracy-speed trade-off, and "basic" for best speed.

The basic model type uses 1st-order features and runs the Chu-Liu-Edmonds algorithm for decoding. The standard model type uses up to 3rd-order features and approximate decoding. The full model type adds two additional types of 3rd-order features, as well as some global features from the re-ranking literature, making it the most accurate but also the slowest model type.
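
The model type is specified with the "model:" option (see also Q2 below). As a minimal sketch of a training command, assuming the usual train-file/test-file arguments from the Usage page (file names are placeholders):

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  test test-file:example.test \
  model:full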

======

Q2: How fast is a basic/standard/full model? Can I tune the speed?

A: Actual parsing speed varies depending on the sentence length and the size of the model (i.e. number of parameters). Typically, a basic model is about 2x~3x faster than a standard one, and the latter is about 2x faster than a full model.

Here are some options for obtaining better parsing speed (a combined example follows the list):

  1. use option "label:false" if dependency labels are not required
  2. use more threads in parallel for decoding (e.g. "thread:6")
  3. for the standard/full model types, change the decoding converge threshold to trade off between speed and accuracy (e.g. "converge-test:k"; k=1 is the fastest, but values in [20, 300] are more reasonable)
  4. use the basic model type (i.e. "model:basic")
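
As a sketch of how these options might be combined on a single command line (file names are placeholders; the bare "test" keyword and exact option spellings should be checked against the Usage page):

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  test test-file:example.test \
  thread:6 converge-test:30

Note that converge-test only applies to the standard/full models, which use approximate decoding; for maximum speed, use a model trained with "model:basic label:false".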

To give a sense of typical speed, the table below shows parsing speed (tokens/sec) on the CoNLL-2008 English dataset, where the average sentence length is 24 tokens:

| Model setting | label:true | label:false |
| --- | --- | --- |
| basic | 2,431 | 4,811 |
| standard, thread:4, converge:30 | 1,468 | 2,298 |
| full, thread:4, converge:30 | 896 | 1,154 |

=======

Q3: I have a development set. Can I tune the performance?

A: Currently RBGParser doesn't provide an automatic procedure for tuning parsing accuracy (e.g. UAS). If you do want to tune accuracy, train several models with different values of gamma (e.g. {0.1, 0.3, ..., 0.9}) and tensor rank R (e.g. {30, 50, 70, ...}), and keep the one with the best UAS on the dev set.
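
If you want to script this search, a minimal sketch is below. It assumes gamma and the tensor rank are passed as "gamma:<value>" and "R:<value>" and that training uses the usual train-file/test-file arguments; treat these option names and file names as assumptions to be checked against the Usage page, and read the dev UAS from each run's log.

# Hypothetical grid search over gamma and tensor rank R (option and file names are assumptions).
for g in 0.1 0.3 0.5 0.7 0.9; do
  for r in 30 50 70; do
    java -classpath "bin:lib/trove.jar" -Xmx32000m \
      parser.DependencyParser \
      model-file:example.g${g}.r${r}.model \
      train train-file:example.train \
      test test-file:example.dev \
      gamma:${g} R:${r} > log.g${g}.r${r} 2>&1
  done
done
# Keep the model whose log reports the highest dev UAS.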

RBGParser can automatically tune parsing speed for the standard/full model types by searching for an optimal decoding converge threshold. If you are training a model, add the arguments "dev test-file:example.dev" to enable speed tuning; the parser will tune the converge threshold right after training finishes. If you have already trained a model, you can also tune it via:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  dev test-file:example.dev

This will load the model, search for the optimal threshold, and overwrite the model file with the optimized configuration.

The speed tuning procedure prints some information like the following:

 Tuning hill-climbing converge number on eval set...
	converge=300    UAS=0.933531
	converge=155    UAS=0.933133
	converge=80     UAS=0.933551
	converge=45     UAS=0.932753
	converge=65     UAS=0.933113
	converge=55     UAS=0.933093
	converge=50     UAS=0.933113
 final converge=50

The procedure performs a binary search to find the minimal converge value k that causes no more than a 0.05% UAS decrease.
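
After tuning overwrites the model file, subsequent parsing runs pick up the tuned converge threshold automatically. A sketch of such a run, assuming the test/output-file arguments from the Usage page (file names are placeholders):

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  test test-file:example.test \
  output-file:example.out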

=====
