diff --git a/README.md b/README.md
index ff76f89a..cc2d714d 100755
--- a/README.md
+++ b/README.md
@@ -10,9 +10,12 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 * [Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning](https://shagunsodhani.in/papers-I-read/Improving-Information-Extraction-by-Acquiring-External-Evidence-with-Reinforcement-Learning)
 * [An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks](https://shagunsodhani.in/papers-I-read/An-Empirical-Investigation-of-Catastrophic-Forgetting-in-Gradient-Based-Neural-Networks)
 * [Learning an SAT Solver from Single-Bit Supervision](https://shagunsodhani.in/papers-I-read/Learning-a-SAT-Solver-from-Single-Bit-Supervision)
+* [Neural Relational Inference for Interacting Systems](https://shagunsodhani.in/papers-I-read/Neural-Relational-Inference-for-Interacting-Systems)
+* [Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks](https://shagunsodhani.in/papers-I-read/Stylistic-Transfer-in-Natural-Language-Generation-Systems-Using-Recurrent-Neural-Networks)
 * [Get To The Point: Summarization with Pointer-Generator Networks](https://shagunsodhani.in/papers-I-read/Get-To-The-Point-Summarization-with-Pointer-Generator-Networks)
 * [StarSpace - Embed All The Things!](https://shagunsodhani.in/papers-I-read/StarSpace-Embed-All-The-Things)
 * [Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory](https://shagunsodhani.in/papers-I-read/Emotional-Chatting-Machine-Emotional-Conversation-Generation-with-Internal-and-External-Memory)
+* [Exploring Models and Data for Image Question Answering](https://shagunsodhani.in/papers-I-read/Exploring-Models-and-Data-for-Image-Question-Answering)
 * [How transferable are features in deep neural networks](https://shagunsodhani.in/papers-I-read/How-transferable-are-features-in-deep-neural-networks)
 * [Distilling the Knowledge in a Neural Network](https://shagunsodhani.in/papers-I-read/Distilling-the-Knowledge-in-a-Neural-Network)
 * [Revisiting Semi-Supervised Learning with Graph Embeddings](https://shagunsodhani.in/papers-I-read/Revisiting-Semi-Supervised-Learning-with-Graph-Embeddings)
diff --git a/site/_posts/2017-10-01-Task-Oriented Query Reformulation with Reinforcement Learning.md b/site/_posts/2017-10-01-Task-Oriented Query Reformulation with Reinforcement Learning.md
index afae1129..2f253385 100755
--- a/site/_posts/2017-10-01-Task-Oriented Query Reformulation with Reinforcement Learning.md
+++ b/site/_posts/2017-10-01-Task-Oriented Query Reformulation with Reinforcement Learning.md
@@ -3,7 +3,7 @@ layout: post
 title: Task-Oriented Query Reformulation with Reinforcement Learning
 comments: True
 excerpt: The paper introduces a query reformulation system that rewrites a query to maximise the number of "relevant" documents that are extracted from a given black box search engine.
-tags: ['2017', 'EMNLP 2017', Information Retrieval', AI, EMNLP, NLP, RL]
+tags: ['2017', 'EMNLP 2017', 'Information Retrieval', AI, EMNLP, NLP, RL]
 ---
 
 ## Introduction
diff --git a/site/_posts/2018-01-14-Exploring Models and Data for Image Question Answering.md b/site/_posts/2018-01-14-Exploring Models and Data for Image Question Answering.md
new file mode 100755
index 00000000..3d1cfe67
--- /dev/null
+++ b/site/_posts/2018-01-14-Exploring Models and Data for Image Question Answering.md
@@ -0,0 +1,59 @@
+---
+layout: post
+title: Exploring Models and Data for Image Question Answering
+comments: True
+excerpt: Given an image, answer a given question about the image.
+tags: ['2015', 'NIPS 2015', AI, CV, Dataset, NIPS, NLP, VQA]
+---
+
+## Introduction
+
+* **Problem Statement**: Given an image, answer a given question about the image.
+
+* [Link to the paper](https://arxiv.org/abs/1505.02074)
+
+* **Assumptions**:
+    * The answer is assumed to be a single word, thereby bypassing the evaluation issues of multi-word generation tasks.
+
+## VIS+LSTM Model
+
+* Treat the input image as the first word of the question.
+* Obtain vector representations (skip-gram embeddings) for the words in the question.
+* Obtain the VGG Net embedding of the image and use a linear transformation (a dimensionality-reduction weight matrix) to match the dimensionality of the word embeddings.
+* Keep the image embedding frozen during training and use an LSTM to combine the word vectors.
+* The LSTM output is fed into a softmax layer, which generates the answer.
+
+## Dataset
+
+* DAtaset for QUestion Answering on Real-world images (DAQUAR)
+    * 1300 images and 7000 questions with 37 object classes.
+    * A downside is that even guesswork can yield good results.
+* The paper proposes an algorithm for generating questions from the image descriptions in the MS-COCO dataset.
+    * Preprocessing steps include splitting long sentences and changing indefinite determiners to definite ones.
+    * *object*, *number*, *colour* and *location* questions are generated by searching for nouns, numbers, colours and prepositions respectively.
+    * The resulting dataset (COCO-QA) has ~120K questions across these 4 semantic types.
+
+## Models
+
+* VIS+LSTM - explained above.
+* 2-VIS+BLSTM - Add the image features twice, at the beginning and at the end of the question (using different linear transformations), and use a bidirectional LSTM.
+* IMG+BOW - Multinomial logistic regression on image features (without dimensionality reduction) together with a bag-of-words representation of the question (the average of the word vectors).
+* FULL - Simple average of the three models above.
+
+### Baseline
+
+* Includes models where the answer is guessed, where only image or only question features are used, or where image features are combined with prior knowledge of the object.
+* Also includes a KNN model, which finds the nearest (image, question) pair in the training set and returns its answer.
+
+### Metrics
+
+* Accuracy
+* Wu-Palmer similarity (WUPS) measure
+
+## Observations
+
+* The VIS+LSTM model outperforms the baselines, while the FULL model benefits further from averaging across all the models.
+* Some useful information seems to be lost when reducing the dimensionality of the VGG vectors.
+* Fine-tuning the word vectors improves performance.
+* Normalising the CNN hidden-layer image features to zero mean and unit variance leads to faster training.
+* The model does not perform well on questions involving spatial relations between multiple objects, or on counting when multiple objects are present.
\ No newline at end of file
diff --git a/site/_posts/2018-02-11-Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks.md b/site/_posts/2018-02-11-Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks.md
new file mode 100755
index 00000000..8ab92565
--- /dev/null
+++ b/site/_posts/2018-02-11-Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks.md
@@ -0,0 +1,43 @@
+---
+layout: post
+title: Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
+comments: True
+excerpt: The paper explores the problem of style transfer in natural language generation.
+tags: ['2016', 'ACL 2016', ACL, AI, NLG, NLP, Workshop]
+---
+
+## Introduction
+
+* [This workshop paper](https://aclweb.org/anthology/W/W16/W16-6010.pdf) explores the problem of style transfer in natural language generation (NLG).
+* One example application would be rewriting technical articles in an easy-to-understand manner.
+
+## Challenges
+
+* Identifying relevant stylistic cues and using them to control text generation in NLG systems.
+* Lack of large amounts of training data.
+
+## Pitch
+
+* Using Recurrent Neural Networks (RNNs) to disentangle the style from the semantic content.
+* An autoencoder model with two components: one for learning style and another for learning content.
+* This allows the "style" component to be replaced while keeping the "content" component the same, resulting in style transfer.
+* One way to think about this: the encoder generates a 100-dimensional vector in which the first 50 entries correspond to the "style" component and the remaining 50 to the "content" component.
+* The proposal is to modify the loss function to include a cross-covariance term that ensures disentanglement.
+* I think one way of doing this is to have two loss functions:
+    * The **first loss** ensures that the input sentence is decoded correctly into the target sentence. This loss is computed for each sentence.
+    * The **second loss** ensures that the first 50 entries across all the encoded representations are correlated. This loss operates at the batch level.
+    * The **total loss** is the weighted sum of these two losses.
+
+## Possible Datasets
+
+* [Complete works of Shakespeare](http://norvig.com/ngrams/shakespeare.txt)
+* [Wikipedia Kaggle dataset](https://www.kaggle.com/c/wikichallenge/data)
+* [Oxford Text Archive](https://ota.ox.ac.uk/)
+* Twitter data
+
+## Possible Metrics
+
+* Soundness - is the generated text entailed by the input sentence?
+* Coherence - is the text free of grammatical errors, with proper word usage, etc.?
+* Effectiveness - how effective was the style transfer?
+* Since some of the metrics are subjective, human evaluators also need to be employed.
\ No newline at end of file
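The weighted-sum loss scheme from the Stylistic Transfer post can be sketched in a few lines of NumPy, with the batch-level term taken to be a cross-covariance penalty. This is a minimal illustration under stated assumptions, not the paper's implementation. The per-sentence reconstruction loss is assumed to be token-level cross-entropy over hypothetical decoder logits. The cross-covariance term is assumed to take the form of the XCov penalty from Cheung et al. ("Discovering Hidden Factors of Variation in Deep Networks"), which penalises the batch covariance between the 50 "style" units and the 50 "content" units. The function names and the `weight` parameter are illustrative only.

```python
import numpy as np


def reconstruction_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Per-sentence loss (assumed): token cross-entropy summed per sentence, averaged over the batch.

    logits:  (batch, seq_len, vocab) unnormalised decoder scores (hypothetical decoder output)
    targets: (batch, seq_len) integer ids of the target tokens
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability assigned to each target token: shape (batch, seq_len).
    token_ll = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    per_sentence = -token_ll.sum(axis=1)
    return float(per_sentence.mean())


def xcov_loss(codes: np.ndarray, style_dim: int = 50) -> float:
    """Batch-level cross-covariance penalty (XCov form, assumed) between style and content units.

    codes: (batch, code_dim) encoder outputs; the first `style_dim` entries are "style",
    the rest are "content" (the 50/50 split of a 100-dim vector follows the post).
    """
    style, content = codes[:, :style_dim], codes[:, style_dim:]
    style_c = style - style.mean(axis=0)
    content_c = content - content.mean(axis=0)
    # Cross-covariance matrix of shape (style_dim, content_dim), estimated over the batch.
    cov = style_c.T @ content_c / codes.shape[0]
    return float(0.5 * np.sum(cov ** 2))


def total_loss(logits, targets, codes, weight: float = 1.0) -> float:
    """Weighted sum of the two losses; `weight` is a hypothetical tuning parameter."""
    return reconstruction_loss(logits, targets) + weight * xcov_loss(codes)


# Example usage with random data.
rng = np.random.default_rng(0)
batch, seq_len, vocab = 4, 7, 20
logits = rng.normal(size=(batch, seq_len, vocab))
targets = rng.integers(0, vocab, size=(batch, seq_len))
codes = rng.normal(size=(batch, 100))
print(total_loss(logits, targets, codes, weight=0.1))
```

Because the penalty is built from batch statistics, it only carries signal for batches with more than one example. The `weight` trades reconstruction quality against how strongly the style and content units are pushed to be decorrelated.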