Quick Start

1. Compilation

To compile the project, first run "make" in the directory lib/SVDLIBC to build the SVD library. Next, make sure a Java JDK is installed on your machine and find the directory that holds the Java JNI include files. That directory should contain the header files jni.h and jni_md.h. Then read through, or directly use, the shell script make.sh to compile the rest of the Java code. Before running make.sh, replace the "jni_path" variable in it with the correct JNI include path and create a "bin" directory in the project directory.
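The steps above can be sketched in shell as follows. This is only a sketch under assumptions: the helper name find_jni_dirs is hypothetical and not part of the project, and it assumes JAVA_HOME points at a JDK. On many JDKs jni.h sits in the include directory and jni_md.h in a platform subdirectory such as include/linux, so the helper lists every directory that holds either header.

```shell
# Hypothetical helper (not part of the project): print the directories under a
# JDK root that contain jni.h or jni_md.h, one per line.
find_jni_dirs() {
    find "$1" \( -name jni.h -o -name jni_md.h \) -exec dirname {} \; 2>/dev/null | sort -u
}

# Typical sequence, run from the project root. It is left commented out so that
# sourcing this snippet has no side effects:
#   make -C lib/SVDLIBC          # build the SVD library
#   find_jni_dirs "$JAVA_HOME"   # candidates for jni_path in make.sh (assumes JAVA_HOME is set)
#   mkdir -p bin                 # the bin directory must exist before make.sh runs
#   sh make.sh                   # or bash make.sh, depending on how the script is written
```

If the helper prints nothing, the path given is probably a JRE rather than a full JDK.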

2. Data Format

We support the CoNLL-2006 and CoNLL-2009 data formats, which describe a collection of annotated sentences together with their gold dependency structures. The dependency trees may be non-projective. See the CoNLL-2006 and CoNLL-2009 shared task descriptions for more details on the formats.
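As a quick sanity check before training, the sketch below counts sentences and tokens in a CoNLL-2006 file. It assumes the standard CoNLL-2006 layout: ten tab-separated columns per token (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line after each sentence. The function name check_conll06 is hypothetical, and the check does not cover CoNLL-2009 files, which use a different column layout.

```shell
# Hypothetical helper (not part of the project): sanity-check a CoNLL-2006 file.
# Prints a one-line summary and returns non-zero if any token line does not
# have exactly 10 tab-separated columns.
check_conll06() {
    awk -F'\t' '
        NF == 0 {                       # a blank line ends the current sentence
            if (in_sent) { sents++ }
            in_sent = 0
            next
        }
        NF != 10 {
            printf "line %d: expected 10 columns, got %d\n", NR, NF > "/dev/stderr"
            bad = 1
        }
        { tokens++; in_sent = 1 }
        END {
            if (in_sent) { sents++ }    # last sentence without a trailing blank line
            printf "sentences=%d tokens=%d\n", sents, tokens
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

For example, `check_conll06 example.train` prints the counts for the training file used in the commands below.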

3. Usage

###### 3.1 Train a Parser

See run.sh for an example of running the parser. You can also run the parser manually as follows. First, add the RBGParser directory to the library path so that the parser can find the compiled JNI library used for SVD tensor initialization. Assuming the directory is "/path/to/rbg", this can be done with:

export LD_LIBRARY_PATH="/path/to/rbg:${LD_LIBRARY_PATH}"

After this, we can run the parser:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  output-file:example.dev.out

This will train a parser from the training data example.train, evaluate the parser on a dev set example.dev, save the dependency model to the file example.model, and output dependency predictions to the file example.dev.out.

###### 3.2 Test a Parser

To test a trained model, you could run the following command:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  test test-file:example.test \
  output-file:example.test.out
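To double-check the predictions written to example.test.out against the gold annotations, a rough unlabeled attachment score can be computed outside the parser. The sketch below is not the parser's own evaluation. It assumes that both files are in CoNLL-2006 format with the head index in column 7, that they are aligned line by line, and that punctuation is counted like any other token. The function name uas is hypothetical.

```shell
# Hypothetical helper (not part of the project): unlabeled attachment score
# between a gold file ($1) and a predicted file ($2), both assumed to be
# CoNLL-2006 with the head index in column 7 and identical line alignment.
uas() {
    awk -F'\t' '
        NR == FNR {                     # first file: remember the gold head of each line
            if (NF >= 7) { gold[FNR] = $7 }
            next
        }
        NF >= 7 && (FNR in gold) {      # second file: compare predicted heads
            total++
            if ($7 == gold[FNR]) { correct++ }
        }
        END {
            if (total == 0) { print "no tokens compared"; exit 1 }
            printf "UAS = %.2f%% (%d/%d)\n", 100 * correct / total, correct, total
        }
    ' "$1" "$2"
}
```

For example, `uas example.test example.test.out` compares the test file with the parser output from the command above.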

###### 3.3 More Options

The parser will train a 3rd-order parser by default. To train a 1st-order (arc-based) model, run the parser like this:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  model:basic

The argument "model:basic" specifies the model type (basic: 1st-order features; standard: 3rd-order features; full: more 3rd-order and high-order global features).

There are many other possible running options. Here is a more complicated example:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  output-file:example.dev.out \
  model:standard  C:1.0  iters:5  pruning:false \
  R:20 gamma:0.3 thread:4 converge-test:50

This will run a standard model with regularization C=1.0, number of training iterations iters=5, rank of the tensor R=20, number of parallel threads thread=4, weight of the tensor component gamma=0.3, number of adaptive hill-climbing restarts during testing converge-test=50, and no dependency arc pruning (pruning=false). See RBGParser/src/parser/Options.java for a full list of possible options.

###### 3.4 Using Word Embeddings

To add unsupervised word embeddings (word vectors) as auxiliary features to the parser, use the option "word-vector:example.embeddings":

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  model:basic \
  word-vector:example.embeddings

The input file example.embeddings should be a text file specifying the real-valued vectors of different words. Each line of the file should start with the word, followed by a list of real numbers representing the vector of this word. For example:

this 0.01 0.2 -0.05 0.8 0.12
and 0.13 -0.1 0.12 0.07 0.03
to 0.11 0.01 0.15 0.08 0.23
*UNKNOWN* 0.04 -0.14 0.03 0.04 0
...
...

A special word *UNKNOWN* may be included to represent OOV (out-of-vocabulary) words. Each line should contain the same number of real numbers.
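The format constraints above can be checked with a short script before passing the file to the parser. This is a sketch under assumptions: the function name check_embeddings is hypothetical, fields are taken to be whitespace-separated, blank lines are skipped (the parser itself may or may not tolerate them), and the values are not checked to be valid numbers.

```shell
# Hypothetical helper (not part of the project): check that every line of an
# embeddings file has a word followed by the same number of values as the
# first line, and report whether the special word *UNKNOWN* is present.
check_embeddings() {
    awk '
        NF == 0 { next }                            # skip blank lines (an assumption)
        !seen { dim = NF - 1; seen = 1 }            # dimension comes from the first line
        NF - 1 != dim {
            printf "line %d (%s): %d values, expected %d\n", NR, $1, NF - 1, dim > "/dev/stderr"
            bad = 1
        }
        $1 == "*UNKNOWN*" { has_unk = 1 }
        { words++ }
        END {
            printf "words=%d dim=%d unknown=%s\n", words, dim, (has_unk ? "yes" : "no")
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Running `check_embeddings example.embeddings` prints a one-line summary and returns a non-zero status if any line has a different number of values.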
