This directory contains the code and the scripts for running the baseline models for Sub-Task #1: Multimodal Assistant API Prediction.
This subtask involves predicting the assistant actions through API calls along with the necessary arguments using dialog history, multimodal context, and the current user utterance as inputs. For example, enquiring about an attribute value (e.g., price) for a shared furniture item is realized through a call to the SpecifyInfo API with the price argument. A comprehensive set of APIs for our SIMMC dataset is given in the paper.
Please check the task input file for a full description of inputs for each subtask.
Currently, we evaluate action prediction as a round-wise, multiclass classification problem over the set of APIs, and measure the accuracy of the most dominant action. In addition, we use action perplexity (defined as the exponential of the negative mean log-likelihood of the dominant action) to allow for situations where several actions are equally valid in a given context. We also measure the correctness of the predicted action (API) arguments using attribute accuracy (for Furniture) and F1 score (for Fashion). Specifically, the following API classes and attributes are evaluated.
SIMMC-Furniture
API | API Attributes |
---|---|
SearchFurniture | furnitureType, color |
FocusOnFurniture | position |
SpecifyInfo | attributes |
Rotate | direction |
NavigateCarousel | navigateDirection |
AddToCart | - |
None | - |
Each of the above attributes is a categorical variable, modeled as a multiclass classification problem, and evaluated using attribute accuracy.
Note: The `minPrice` and `maxPrice` attributes corresponding to the `SpecifyInfo` action for Furniture are excluded from the current evaluation.
SIMMC-Fashion
API | API Attributes |
---|---|
SearchDatabase | attributes |
SearchMemory | attributes |
SpecifyInfo | attributes |
AddToCart | - |
None | - |
Each of the attributes takes multiple values from a fixed set, is modeled as a multilabel classification problem, and is evaluated using attribute F1 score.
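The metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official evaluation code: the function names are hypothetical, perplexity follows the usual convention of exponentiating the negative mean log-probability of the gold action, and the F1 shown is computed for a single round, whereas the official script may aggregate counts differently.

```python
import math


def action_accuracy(gold_actions, predicted_actions):
    """Fraction of rounds whose predicted action matches the gold action."""
    matches = [g == p for g, p in zip(gold_actions, predicted_actions)]
    return sum(matches) / len(matches)


def action_perplexity(gold_actions, action_log_probs):
    """exp(-mean log-probability that the model assigns to the gold action)."""
    log_probs = [lp[g] for g, lp in zip(gold_actions, action_log_probs)]
    return math.exp(-sum(log_probs) / len(log_probs))


def multilabel_f1(gold_values, predicted_values):
    """F1 between the gold and predicted attribute value sets of one round."""
    gold, pred = set(gold_values), set(predicted_values)
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with two rounds; all values are made up for illustration.
gold = ["SpecifyInfo", "AddToCart"]
pred = ["SpecifyInfo", "None"]
log_probs = [
    {"SpecifyInfo": math.log(0.5), "AddToCart": math.log(0.25), "None": math.log(0.25)},
    {"SpecifyInfo": math.log(0.25), "AddToCart": math.log(0.25), "None": math.log(0.5)},
]

print(action_accuracy(gold, pred))
print(action_perplexity(gold, log_probs))
# "brand" is a hypothetical attribute value used only for this example.
print(multilabel_f1(["price", "brand"], ["price"]))
```

A perplexity of 1 would mean the model always puts all probability mass on the gold action; higher values indicate that the mass is spread over other actions.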
The code to evaluate Sub-Task #1 is given in `tools/action_evaluation.py`.
The model outputs are expected in the following format:
[
  {
    "dialog_id": ...,
    "predictions": [
      {
        "action": <predicted_action>,
        "action_log_prob": {
          <action_token>: <action_log_prob>,
          ...
        },
        "attributes": {
          <attribute_label>: <attribute_val>,
          ...
        },
        "turn_id": ...
      }
    ]
  }
]
where `attribute_label` corresponds to the API attribute(s) predicted for each API (refer to the tables above) and `attribute_val` contains the list of values taken by the key `attribute_label`.
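As a rough illustration of this format, the sketch below assembles the output for one dialog with a single turn and serializes it as JSON. The helper `make_turn_prediction`, the dialog and turn identifiers, and the log-probability values are all hypothetical; only the key names come from the format above.

```python
import json


def make_turn_prediction(turn_id, action_log_prob, attributes):
    """Build one entry of the "predictions" list for a single turn."""
    # The predicted action is the one with the highest log-probability.
    action = max(action_log_prob, key=action_log_prob.get)
    return {
        "action": action,
        "action_log_prob": action_log_prob,
        "attributes": attributes,
        "turn_id": turn_id,
    }


# Made-up scores for three of the APIs; a real model would produce these.
turn = make_turn_prediction(
    turn_id=0,
    action_log_prob={"SpecifyInfo": -0.3, "AddToCart": -2.3, "None": -1.9},
    attributes={"attributes": ["price"]},
)

# One dictionary per dialog, each holding the predictions for its turns.
model_output = [{"dialog_id": 0, "predictions": [turn]}]
serialized = json.dumps(model_output, indent=2)
print(serialized)
```

Writing `serialized` to a file gives a JSON document with the structure shown above.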
NOTE: We plan to extend the Multimodal Assistant API Prediction from the most dominant assistant action to allow the prediction of a series of multiple actions per turn. Please follow the Latest News section in the main README of the repository for updates.
For more details on the task definition and the baseline models we provide, please refer to our SIMMC paper:
@article{moon2020situated,
title={Situated and Interactive Multimodal Conversations},
author={Moon, Seungwhan and Kottur, Satwik and Crook, Paul A and De, Ankita and Poddar, Shivani and Levin, Theodore and Whitney, David and Difranco, Daniel and Beirami, Ahmad and Cho, Eunjoon and Subba, Rajen and Geramifard, Alborz},
journal={arXiv preprint arXiv:2006.01460},
year={2020}
}
NOTE: The paper reports the results from an earlier version of the dataset and with different train-dev-test splits, hence the baseline performances on the challenge resources will be slightly different.
- Git clone the repository:
$ git lfs install
$ git clone https://github.com/facebookresearch/simmc.git
NOTE: We recommend installation in a virtual environment (user guide). Create a new virtual environment and activate it prior to installing the packages.
- Install the required Python packages:
The baselines for API Prediction (Sub-Task #1) and Assistant Response Generation & Retrieval (Sub-Task #2) are jointly learnt. For these baselines, the following are the additional dependencies:
pip install absl-py
pip install numpy
pip install nltk
pip install spacy
The code also uses spaCy's `en_vectors_web_lg` dataset for GloVe embeddings. To install:
python -m spacy download en_vectors_web_lg
The code also uses NLTK's `punkt`. To install:
python
>>> import nltk
>>> nltk.download('punkt')
This directory contains the following baseline models:
- History-agnostic Encoder (HAE)
- Hierarchical Recurrent Encoder (HRE)
- Memory Network Encoder (MN)
- Transformer-based History-agnostic Encoder (T-HAE)
- TF-IDF-based Encoder (TF-IDF)
- LSTM-based Language Model (LSTM)
Please see our paper for more details about the models.
- `options.py`: Command line arguments to control behavior
- `train_simmc_agent.py`: Trains SIMMC baselines
- `eval_simmc_agent.py` or `eval_genie.py`: Evaluates trained checkpoints
- `loaders/`: Dataloaders for SIMMC
- `models/`: Model files
  - `assistant.py`: SIMMC Assistant Wrapper Class
  - `encoders/`: Different types of encoders
    - `history_agnostic.py`
    - `hierarchical_recurrent.py`
    - `memory_network.py`
    - `tf_idf_encoder.py`
  - `decoder.py`: Response decoder, language model with LSTM or Transformers
  - `action_executor.py`: Learns action and action attributes
  - `carousel_embedder.py`: Learns multimodal embedding for furniture
  - `user_memory_embedder.py`: Learns multimodal embedding for fashion
  - `positional_encoding.py`: Positional encoding unit for transformers
  - `self_attention.py`: Self attention model unit
  - `{fashion|furniture}_model_metainfo.json`: Model action/attribute configurations for SIMMC
- `tools/`: Supporting scripts for preprocessing and other utilities
- `scripts/`: Bash scripts to run preprocessing, training, evaluation
Run `scripts/preprocess_simmc.sh` with the appropriate `$DOMAIN` (either "furniture" or "fashion") to run through the following steps:
- Extract supervision for dominant Assistant Action API
- Extract word vocabulary from the train split
- Read and embed the shopping assets into a feature vector
- Convert all the above information into a multimodal numpy input array for dataloader consumption
- Extract action attribute vocabulary from train split
Please see `scripts/preprocess_simmc.sh` to better understand the inputs/outputs for each of the above steps.
To train a model or evaluate a saved checkpoint, please check the examples in `scripts/train_simmc_model.sh`.
You can also train all the above baselines at once using `scripts/train_all_simmc_models.sh`.
For a description and usage of the necessary options/flags, please refer to `options.py` or one of the above two scripts.
The baselines trained through the code obtain the following results for Sub-Task #1.
SIMMC-Furniture
Model | Action Accuracy | Action Perplexity | Attribute Accuracy |
---|---|---|---|
TF-IDF | 77.1 | 2.59 | 57.5 |
HAE | 79.7 | 1.70 | 53.6 |
HRE | 80.0 | 1.66 | 54.7 |
MN | 79.2 | 1.71 | 53.3 |
T-HAE | 78.4 | 1.83 | 53.6 |
SIMMC-Fashion
Model | Action Accuracy | Action Perplexity | Attribute Accuracy |
---|---|---|---|
TF-IDF | 78.1 | 3.51 | 57.9 |
HAE | 81.0 | 1.75 | 60.2 |
HRE | 81.9 | 1.76 | 62.1 |
MN | 81.6 | 1.74 | 61.6 |
T-HAE | 81.4 | 1.78 | 62.1 |
Note: DSTC9 SIMMC Challenge was conducted on SIMMC v1.0. Thus all the results and baseline performances are on SIMMC v1.0.
- Disallowed Input: `belief_state`, `system_transcript`, `system_transcript_annotated`, `state_graph_1`, `state_graph_2`, and anything from future turns.
- If you would like to use any other external resources, please consult with the track organizers ([email protected]). Generally, we allow the use of publicly available pre-trained language models, such as BERT, GPT-2, etc.
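One way to guard against accidentally using the disallowed inputs listed above is to filter each dialog before it reaches the model. The sketch below is a minimal example under several assumptions: each turn is a dictionary, the user utterance is stored under a hypothetical `transcript` key, the `turn_idx` key is hypothetical as well, and the disallowed fields are removed only from the turn being predicted while earlier turns are kept unchanged as dialog history. Please confirm the intended treatment of earlier turns with the track organizers.

```python
DISALLOWED_KEYS = {
    "belief_state",
    "system_transcript",
    "system_transcript_annotated",
    "state_graph_1",
    "state_graph_2",
}


def model_inputs(turns, current_turn):
    """Return (history, current) for predicting the action at current_turn.

    Future turns are dropped entirely. Disallowed keys are stripped from the
    current turn; keeping earlier turns intact is an assumption of this sketch.
    """
    history = turns[:current_turn]
    current = {
        key: value
        for key, value in turns[current_turn].items()
        if key not in DISALLOWED_KEYS
    }
    return history, current


# Made-up dialog with three turns; key names other than the disallowed
# fields are hypothetical.
dialog = [
    {"turn_idx": 0, "transcript": "Show me some sofas.", "system_transcript": "Here you go."},
    {"turn_idx": 1, "transcript": "How much is it?", "system_transcript": "It is $500.", "belief_state": []},
    {"turn_idx": 2, "transcript": "Add it to my cart.", "system_transcript": "Done."},
]

history, current = model_inputs(dialog, current_turn=1)
print(len(history))
print(sorted(current))
```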