Resource List

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2016 | Yang et al. | NAACL-HLT'16 | Hierarchical Attention Networks for Document Classification | [pdf] |
2016 | Zoph et al. | arXiv | Multi-Source Neural Translation | [pdf] |
2017 | Vaswani et al. | NIPS'17 | Attention Is All You Need | [pdf] [github] |
2017 | Xia et al. | NIPS'17 | Deliberation Networks: Sequence Generation Beyond One-Pass Decoding | [pdf] [github] |
2018 | Miculicich et al. | EMNLP'18 | Document-Level Neural Machine Translation with Hierarchical Attention Networks | [pdf] |
2018 | Devlin et al. | arXiv | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [pdf] [github] |
2018 | Yang et al. | NAACL-HLT'18 | Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets | [pdf] |
2018 | Wu et al. | NAACL-HLT'18 | Adversarial Neural Machine Translation | [pdf] |
2019 | Dai et al. | ACL'19 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | [pdf] [github] |
2019 | Yang et al. | arXiv | XLNet: Generalized Autoregressive Pretraining for Language Understanding | [pdf] [github] |
2019 | Liu et al. | ACL'19 | Hierarchical Transformers for Multi-Document Summarization | [pdf] [github] |
2019 | Pourdamghani et al. | ACL'19 | Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation | [pdf] |
2019 | Zhou et al. | arXiv | Synchronous Bidirectional Neural Machine Translation | [pdf] [github] |

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2011 | Jia et al. | ICCV'11 | Learning Cross-modality Similarity for Multinomial Data | [pdf] |
2014 | Mao et al. | arXiv | Explain Images with Multimodal Recurrent Neural Networks | [pdf] |
2014 | Kiros et al. | arXiv | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models | [pdf] |
2015 | Ma et al. | ICCV'15 | Multimodal Convolutional Neural Networks for Matching Image and Sentence | [pdf] |
2015 | Mao et al. | ICLR'15 | Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) | [pdf] [github] |
2016 | Yang et al. | NIPS'16 | Review Networks for Caption Generation | [pdf] [github] |
2016 | You et al. | CVPR'16 | Image Captioning with Semantic Attention | [pdf] |
2016 | Lu et al. | NIPS'16 | Hierarchical Question-Image Co-Attention for Visual Question Answering | [pdf] [github] |
2018 | Anderson et al. | CVPR'18 | Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | [pdf] |
2018 | Nguyen et al. | CVPR'18 | Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering | [pdf] |
2018 | Wang et al. | NAACL'18 | Object Counts! Bringing Explicit Detections Back into Image Captioning | [pdf] |
2019 | Qin et al. | CVPR'19 | Look Back and Predict Forward in Image Captioning | [pdf] |
2019 | Li et al. | AAAI'19 | Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering | [pdf] |
2019 | Yu et al. | CVPR'19 | Deep Modular Co-Attention Networks for Visual Question Answering | [pdf] |

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2016 | Caglayan et al. | WMT'16 | Does Multimodality Help Human and Machine for Translation and Image Captioning? | [pdf] |
2016 | Caglayan et al. | arXiv | Multimodal Attention for Neural Machine Translation | [pdf] |
2016 | Huang et al. | WMT'16 | Attention-based Multimodal Neural Machine Translation | [pdf] |
2017 | Nakayama et al. | arXiv | Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot | [pdf] |
2017 | Delbrouck et al. | ICLR'17 | Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation | [pdf] |
2017 | Lala et al. | PBML'17 | Unraveling the Contribution of Image Captioning and Neural Machine Translation for Multimodal Machine Translation | [pdf] |
2017 | Chen et al. | arXiv | A Teacher-Student Framework for Zero-Resource Neural Machine Translation | [pdf] |
2017 | Elliott et al. | arXiv | Imagination improves Multimodal Translation | [pdf] |
2017 | Elliott et al. | WMT'17 | Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description | [pdf] |
2017 | Calixto et al. | arXiv | Doubly-Attentive Decoder for Multi-modal Neural Machine Translation | [pdf] [github] |
2017 | Libovický et al. | ACL'17 | Attention Strategies for Multi-Source Sequence-to-Sequence Learning | [pdf] |
2017 | Calixto et al. | EMNLP'17 | Incorporating Global Visual Features into Attention-Based Neural Machine Translation | [pdf] |
2018 | Barrault et al. | WMT'18 | Findings of the Third Shared Task on Multimodal Machine Translation | [pdf] |
2018 | Caglayan et al. | WMT'18 | LIUM-CVC Submissions for WMT18 Multimodal Translation Task | [pdf] |
2018 | Grönroos et al. | WMT'18 | The MeMAD Submission to the WMT18 Multimodal Translation Task | [pdf] |
2018 | Gwinnup et al. | WMT'18 | The AFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional | [pdf] |
2018 | Helcl et al. | WMT'18 | CUNI System for the WMT18 Multimodal Translation Task | [pdf] |
2018 | Lala et al. | WMT'18 | Sheffield Submissions for WMT18 Multimodal Translation Shared Task | [pdf] |
2018 | Zheng et al. | WMT'18 | Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Translation System Report | [pdf] |
2018 | Delbrouck et al. | WMT'18 | UMONS Submission for WMT18 Multimodal Translation Task | [pdf] [github] |
2018 | Libovický et al. | WMT'18 | Input Combination Strategies for Multi-Source Transformer Decoder | [pdf] |
2018 | Shin et al. | WMT'18 | Multi-encoder Transformer Network for Automatic Post-Editing | [pdf] |
2018 | Zhou et al. | ACL'18 | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | [pdf] |
2018 | Qian et al. | arXiv | Multimodal Machine Translation with Reinforcement Learning | [pdf] |
2019 | Caglayan et al. | NAACL-HLT'19 | Probing the Need for Visual Context in Multimodal Machine Translation | [pdf] |
2019 | Su et al. | CVPR'19 | Unsupervised Multi-modal Neural Machine Translation | [pdf] |
2019 | Ive et al. | ACL'19 | Distilling Translations with Visual Awareness | [pdf] [github] |
2019 | Calixto et al. | ACL'19 | Latent Variable Model for Multi-modal Translation | [pdf] |
2019 | Chen et al. | IJCAI'19 | From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots | [pdf] |
2019 | Hirasawa et al. | ACL'19 | Debiasing Word Embedding Improves Multimodal Machine Translation | [pdf] |
2019 | Mogadala et al. | arXiv | Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods | [pdf] |
2019 | Calixto et al. | Springer | An Error Analysis for Image-based Multi-modal Neural Machine Translation | [pdf] |
2019 | Hirasawa et al. | arXiv | Multimodal Machine Translation with Embedding Prediction | [pdf] [github] |
2020 | Park et al. | WACV'20 | MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | [pdf] [repo] |

Dataset | Authors | Paper | Links |
---|---|---|---|
Flickr30K | Young et al. | From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions | [pdf] [web] |
Flickr30K Entities | Plummer et al. | Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | [pdf] [web] [github] |
Multi30K | Elliott et al. | Multi30K: Multilingual English-German Image Descriptions | [pdf] [github] |
IAPR-TC12 | Grubinger et al. | The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems | [pdf] [web] |
VATEX | Wang et al. | VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | [pdf] [web] |

Metric | Authors | Paper | Links |
---|---|---|---|
BLEU | Papineni et al. | BLEU: a Method for Automatic Evaluation of Machine Translation | [pdf] |
METEOR | Banerjee et al. | METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments | [pdf] [web] |
METEOR 1.5 | Denkowski et al. | METEOR Universal: Language Specific Translation Evaluation for Any Target Language | [pdf] [web] |
TER | Snover et al. | A Study of Translation Edit Rate with Targeted Human Annotation | [pdf] |

Year | Authors | Title | Links |
---|---|---|---|
2016 | Elliott et al. | Multimodal Learning and Reasoning | [pdf] |
2017 | Lucia Specia | Multimodal Machine Translation | [pdf] |
2018 | Loïc Barrault | Introduction to Multimodal Machine Translation | [pdf] |