This is the code for our ACL 2021 paper Question Answering over Temporal Knowledge Graphs
UPDATE: There has been a small update to the CronQuestions dataset. There was an error in dataset creation that resulted in '{tail2}' being present in the question string instead of getting slot filled by the proper entity (#13). This affected 31951/410000 questions. We have uploaded a fixed version of the dataset. Although the numbers reported in the paper are based on the original version of the dataset, we encourage everyone to used the fixed version. Details of this are in the dataset download section.
Clone and create a conda environment
git clone [email protected]:apoorvumang/CronKGQA.git
cd CronKGQA
conda create --prefix ./cronkgqa_env python=3.8
conda activate ./cronkgqa_env
We use TComplEx KG Embeddings as proposed in Tensor Decompositions for temporal knowledge base completion. We use a slightly modified version of their code from https://github.com/facebookresearch/tkbc
Install tkbc requirements and setup tkbc
conda install --file requirements_tkbc.txt -c pytorch
python setup_tkbc.py install
Install CronKGQA requirements
conda install --file requirements.txt -c conda-forge
Download and unzip data_v2.zip
and models.zip
in the root directory. data.zip
contains the old version without the '{tail2}' fix, please refrain from using it.
Drive: https://drive.google.com/drive/folders/15L4bpGEvCCp7Kuz6xJOFBQFzQGWKJ9rL?usp=sharing, or use gdown
gdown https://drive.google.com/uc\?id\=1fe7-x7ChszqzczKncoZcpwmWc1PBq1_0
gdown https://drive.google.com/uc\?id\=18w_aPl-oLfWnTLnoMnTU9Pm4El1T9wkB
unzip -q data_v2.zip && unzip -q models.zip
rm data_v2.zip && rm models.zip
Run a jupyter notebook in the root folder. Make sure to activate the correct environment before running the notebook
The notebook cronkgqa_testing.ipynb
can be used to test a model's responses to any textual question, provided you give the list of entities and times in the question as well - this is needed since perfect entity linking is assumed. You can explore the dataset for questions which have entity annotation and modify those questions. You can also make a reverse dict of data/wikidata_big/kg/wd_id2entity_text.txt
and search for wikidata ids of an entity that you want.
Finally you can run training of QA model using these trained tkbc embeddings. embedkgqa model = cronkgqa (will fix naming etc. soon)
CUDA_VISIBLE_DEVICES=1 python -W ignore ./train_qa_model.py --frozen 1 --eval_k 1 --max_epochs 200 \
--lr 0.00002 --batch_size 250 --mode train --tkbc_model_file tcomplex_17dec.ckpt \
--dataset wikidata_big --valid_freq 3 --model embedkgqa --valid_batch_size 50 \
--save_to temp --lm_frozen 1 --eval_split valid
Evaluating the pretrained model (CronKGQA):
CUDA_VISIBLE_DEVICES=1 python -W ignore ./train_qa_model.py \
--mode eval --tkbc_model_file tcomplex_17dec.ckpt \
--dataset wikidata_big --model embedkgqa --valid_batch_size 50 \
--load_from cronkgqa_29may --eval_split test
Please explore the qa_models.py file for other models, you can change the model by providing the --model parameter.
If you used our work or found it helpful, please use the following citation:
@inproceedings{saxena2021cronkgqa,
title={Question Answering over Temporal Knowledge Graphs},
author={Saxena, Apoorv and Chakrabarti, Soumen and Talukdar, Partha},
booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
year={2021}
}