Code release for Learning Answer Embeddings for Visual Question Answering (CVPR 2018).
```
usage: train_v7w_embedding.py [-h] [--gpu_id GPU_ID] [--batch_size BATCH_SIZE]
                              [--max_negative_answer MAX_NEGATIVE_ANSWER]
                              [--answer_batch_size ANSWER_BATCH_SIZE]
                              [--loss_temperature LOSS_TEMPERATURE]
                              [--pretrained_model PRETRAINED_MODEL]
                              [--context_embedding {SAN,BoW}]
                              [--answer_embedding {BoW,RNN}] [--name NAME]

optional arguments:
  -h, --help            show this help message and exit
  --gpu_id GPU_ID
  --batch_size BATCH_SIZE
  --max_negative_answer MAX_NEGATIVE_ANSWER
  --answer_batch_size ANSWER_BATCH_SIZE
  --loss_temperature LOSS_TEMPERATURE
  --pretrained_model PRETRAINED_MODEL
  --context_embedding {SAN,BoW}
  --answer_embedding {BoW,RNN}
  --name NAME
```
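For example, a training run on Visual7W might look like the sketch below. The flag names come from the usage message above; the values are illustrative placeholders, not the settings used in the paper.

```bash
# Example invocation (values are illustrative, not the paper's settings)
python train_v7w_embedding.py \
    --gpu_id 0 \
    --batch_size 32 \
    --max_negative_answer 3000 \
    --answer_batch_size 1000 \
    --loss_temperature 0.01 \
    --context_embedding SAN \
    --answer_embedding RNN \
    --name san_rnn_example
```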
Please cite the following BibTeX entry if you use any resource from this repository in your research.
```
@inproceedings{hu2018learning,
  title     = {Learning Answer Embeddings for Visual Question Answering},
  author    = {Hu, Hexiang and Chao, Wei-Lun and Sha, Fei},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages     = {5428--5436},
  year      = {2018}
}
```
Part of this code uses components from pytorch-vqa and torchtext. We thank the authors for releasing their code.
- Being Negative but Constructive: Lessons Learnt from Creating Better Visual Question Answering Datasets (qaVG website)
- Visual7W: Grounded Question Answering in Images (website)
- Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (website)