This project contains code for the dialog-based learning MemN2N setup in the following paper: "Dialogue Learning with Human-in-the-Loop".
This code requires Torch7 and the following luarocks packages: cutorch, cunn, nngraph, torchx, and tds.
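Assuming a working Torch7 installation, the dependencies can typically be installed through luarocks, e.g.:

    # install the required packages (a sketch; availability depends on your Torch distribution)
    luarocks install cutorch
    luarocks install cunn
    luarocks install nngraph
    luarocks install torchx
    luarocks install tds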
To get the synthetic data, run ./setup_data.sh from this directory; it downloads the data (90M download, unpacks to 435M).
After running ./setup_data.sh, ./data/ contains the synthetic data for simulations. This includes the bAbI tasks ("babi1_*" files) and the WikiMovies data ("movieQA_*" files).
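For example, a first run and a quick sanity check might look like this (the file prefixes are the ones mentioned above; the exact file names may vary):

    # download and unpack the synthetic data (90M download, 435M unpacked)
    ./setup_data.sh
    # sanity check: the bAbI and WikiMovies files should now be under ./data/
    ls ./data/babi1_* ./data/movieQA_*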
A further dataset, containing human-annotated versions of the WikiMovies data, is also available (4M download, unpacks to 14M). It is in a slightly simpler format, so the code here does not yet run on it out of the box.
You can use the *.sh scripts in this directory as examples of how to train the model on each dataset. As shown there, to train, run the following (example invocations are given after the option list below):
th online_simulate.lua [params]
Available options are:
-batch_size (default 32, the batch size for model training)
-token_size (default 0, number of tokens)
-init_weight (default 0.1, scale for weight initialization)
-N_hop (default 3, number of hops)
-lr (default 0.01, learning rate)
-thres (default 40, threshold for gradient clipping)
-gpu_index (default 1, which GPU to use)
-dataset (default 'babi', choose from 'babi' or 'movieQA')
-setting (default 'RBI', choose from 'RBI' or 'FP')
-randomness (default 0.2, random exploration rate for epsilon-greedy exploration)
-simulator_batch_size (default 32, the batch size for data generation; may differ from the model batch size)
-task (default 3, which task to test)
-nepochs (default 20, number of training epochs)
-negative (default 5, number of negative samples for FP)
-REINFORCE (default false, whether to train with the REINFORCE algorithm)
-REINFORCE_reg (default 0.1, entropy regularizer for the REINFORCE algorithm)
-RF_lr (default 0.0005, learning rate used by the REINFORCE baseline)
-log_freq (default 200, how often we log)
-balance (default false, enable the label-balancing experience replay strategy for FP)
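For instance, the following hypothetical invocations combine flags from the option list above; the values are illustrative, not tuned settings:

    # RBI on bAbI task 3 with epsilon-greedy exploration
    th online_simulate.lua -dataset babi -setting RBI -task 3 -randomness 0.2 -nepochs 20

    # FP on WikiMovies with 5 negative samples; -balance takes no value here,
    # assuming torch.CmdLine parsing, where passing a boolean flag toggles it
    th online_simulate.lua -dataset movieQA -setting FP -negative 5 -balance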
- Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, and Jason Weston. "Dialogue Learning with Human-in-the-Loop". arXiv:1611.09823 [cs.AI], 2016.