Added some more papers

shagunsodhani · Mar 25, 2018 · 7216d80
1 parent 8191a01

Showing 4 changed files with 106 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -10,9 +10,12 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 * [Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning](https://shagunsodhani.in/papers-I-read/Improving-Information-Extraction-by-Acquiring-External-Evidence-with-Reinforcement-Learning)
 * [An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks](https://shagunsodhani.in/papers-I-read/An-Empirical-Investigation-of-Catastrophic-Forgetting-in-Gradient-Based-Neural-Networks)
 * [Learning an SAT Solver from Single-Bit Supervision](https://shagunsodhani.in/papers-I-read/Learning-a-SAT-Solver-from-Single-Bit-Supervision)
+* [Neural Relational Inference for Interacting Systems](https://shagunsodhani.in/papers-I-read/Neural-Relational-Inference-for-Interacting-Systems)
+* [Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks](https://shagunsodhani.in/papers-I-read/Stylistic-Transfer-in-Natural-Language-Generation-Systems-Using-Recurrent-Neural-Networks)
 * [Get To The Point: Summarization with Pointer-Generator Networks](https://shagunsodhani.in/papers-I-read/Get-To-The-Point-Summarization-with-Pointer-Generator-Networks)
 * [StarSpace - Embed All The Things!](https://shagunsodhani.in/papers-I-read/StarSpace-Embed-All-The-Things)
 * [Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory](https://shagunsodhani.in/papers-I-read/Emotional-Chatting-Machine-Emotional-Conversation-Generation-with-Internal-and-External-Memory)
+* [Exploring Models and Data for Image Question Answering](https://shagunsodhani.in/papers-I-read/Exploring-Models-and-Data-for-Image-Question-Answering)
 * [How transferable are features in deep neural networks](https://shagunsodhani.in/papers-I-read/How-transferable-are-features-in-deep-neural-networks)
 * [Distilling the Knowledge in a Neural Network](https://shagunsodhani.in/papers-I-read/Distilling-the-Knowledge-in-a-Neural-Network)
 * [Revisiting Semi-Supervised Learning with Graph Embeddings](https://shagunsodhani.in/papers-I-read/Revisiting-Semi-Supervised-Learning-with-Graph-Embeddings)

diff --git a/...sts/2017-10-01-Task-Oriented Query Reformulation with Reinforcement Learning.md b/...sts/2017-10-01-Task-Oriented Query Reformulation with Reinforcement Learning.md
@@ -3,7 +3,7 @@ layout: post
 title: Task-Oriented Query Reformulation with Reinforcement Learning
 comments: True
 excerpt: The paper introduces a query reformulation system that rewrites a query to maximise the number of "relevant" documents that are extracted from a given black box search engine.
-tags: ['2017', 'EMNLP 2017', Information Retrieval', AI, EMNLP, NLP, RL]
+tags: ['2017', 'EMNLP 2017', 'Information Retrieval', AI, EMNLP, NLP, RL]
 ---
 
 ## Introduction

diff --git a/site/_posts/2018-01-14-Exploring Models and Data for Image Question Answering.md b/site/_posts/2018-01-14-Exploring Models and Data for Image Question Answering.md
@@ -0,0 +1,59 @@
+---
+layout: post
+title: Exploring Models and Data for Image Question Answering
+comments: True
+excerpt: Given an image, answer a given question about the image.
+tags: ['2015', 'NIPS 2015', AI, CV, Dataset, NIPS, NLP, VQA]
+---
+
+## Introduction
+
+* **Problem Statement**: Given an image, answer a given question about the image.
+
+* [Link to the paper](https://arxiv.org/abs/1505.02074)
+
+* **Assumptions**:
+    * The answer is assumed to be a single word, thereby bypassing the evaluation issues of multi-word generation tasks.
+
+## VIS-LSTM Model
+
+* Treat the input image as the first word in the question.
+* Obtain skip-gram vector representations for the words in the question.
+* Obtain the VGG Net embeddings of the image and use a linear transformation (dimensionality reduction weight matrix) to match the dimensions of the word embeddings.
+* Keep the image embedding frozen during training and use an LSTM to combine the word vectors.
+* The LSTM output is fed into a softmax layer, which generates the answer.
+
+## Dataset
+
+* DAtaset for QUestion Answering on Real-world images (DAQUAR)
+    * 1300 images and 7000 questions with 37 object classes.
+    * A downside is that even guesswork can yield good results.
+* The paper proposes an algorithm for generating questions from the MS-COCO dataset.
+    * Perform preprocessing steps such as splitting long sentences and changing indefinite determiners to definite ones.
+    * *Object*, *number*, *colour* and *location* questions can be generated by searching for nouns, numbers, colours and prepositions, respectively.
+    * The resulting dataset has ~120K questions across the above 4 semantic types.
+
+## Models
+
+* VIS+LSTM - explained above.
+* 2-VIS+BLSTM - Add the image features twice, at the beginning and at the end (using different linear transformations), and use a bidirectional LSTM.
+* IMG+BOW - Multinomial logistic regression on the image features (without dimensionality reduction) plus a bag-of-words question representation (averaged word vectors).
+* FULL - Simple average of the 3 models above.
+
+### Baseline
+
+* Includes models where the answer is guessed, where only image or only question features are used, or where image features are used along with prior knowledge of the object.
+* Also includes a KNN model where the system finds the nearest (image, question) pair in the training set and returns its answer.
+
+### Metrics
+
+* Accuracy
+* Wu-Palmer similarity measure
+
+## Observations
+
+* The VIS-LSTM model outperforms the baselines, while the FULL model benefits from averaging across all the models.
+* Some useful information seems to be lost when downsizing the VGG vectors.
+* Fine-tuning the word vectors helps with performance.
+* Normalising the CNN hidden image features to zero mean and unit variance leads to faster training.
+* The model does not perform well on questions involving spatial relations between multiple objects, or on counting objects when multiple objects are present.
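The VIS-LSTM steps listed in the post above can be sketched as a toy, untrained forward pass in plain Python. This is a minimal sketch under stated assumptions and is not the paper's implementation. It assumes a standard LSTM cell with input, forget and output gates, and it reads the answer distribution from the final hidden state. The class name `VisLSTM`, the helper functions, the tiny dimensions and the random initialisation are all hypothetical. Small made-up vectors stand in for the VGG image features and skip-gram word vectors.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class VisLSTM:
    """Toy VIS-LSTM forward pass: the projected image is fed as the first 'word'."""

    def __init__(self, img_dim: int, emb_dim: int, hidden_dim: int,
                 num_answers: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Linear map (dimensionality reduction) from image features to word-embedding size.
        self.W_img = rand_matrix(emb_dim, img_dim, rng)
        # One weight matrix and bias per LSTM gate, each acting on the concatenation [x; h].
        self.W: Dict[str, Matrix] = {
            k: rand_matrix(hidden_dim, emb_dim + hidden_dim, rng) for k in "ifog"
        }
        self.b: Dict[str, Vector] = {k: [0.0] * hidden_dim for k in "ifog"}
        # Softmax layer over the single-word answer vocabulary.
        self.W_out = rand_matrix(num_answers, hidden_dim, rng)

    def step(self, x: Sequence[float], h: Vector, c: Vector):
        xh = list(x) + list(h)
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], xh), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]       # input gate
        f = [sigmoid(v) for v in pre["f"]]       # forget gate
        o = [sigmoid(v) for v in pre["o"]]       # output gate
        cand = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def forward(self, img_feat: Sequence[float], word_vecs: Sequence[Sequence[float]]) -> Vector:
        # The image, projected to the word-embedding size, is the first element of the sequence.
        seq = [matvec(self.W_img, img_feat)] + [list(w) for w in word_vecs]
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            h, c = self.step(x, h, c)
        # Answer distribution from the final hidden state.
        return softmax(matvec(self.W_out, h))
```

For example, `VisLSTM(img_dim=8, emb_dim=4, hidden_dim=6, num_answers=5).forward([0.5] * 8, [[0.1] * 4, [0.2] * 4])` returns five answer probabilities that sum to 1. The image is simply the first element of `seq`. Under that view, the 2-VIS+BLSTM variant amounts to appending a second, differently projected image vector at the end and running the recurrence in both directions.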
diff --git a/...nsfer in Natural Language Generation Systems Using Recurrent Neural Networks.md b/...nsfer in Natural Language Generation Systems Using Recurrent Neural Networks.md
@@ -0,0 +1,43 @@
+---
+layout: post
+title: Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
+comments: True
+excerpt: The paper explores the problem of style transfer in natural language generation.
+tags: ['2016', 'ACL 2016', ACL, AI, NLG, NLP, Workshop]
+---
+
+## Introduction
+
+* [This workshop paper](https://aclweb.org/anthology/W/W16/W16-6010.pdf) explores the problem of style transfer in natural language generation (NLG).
+* One possible application would be rewriting technical articles in an easy-to-understand manner.
+
+## Challenges
+
+* Identifying relevant stylistic cues and using them to control text generation in NLG systems.
+* Absence of a large amount of training data.
+
+## Pitch
+
+* Using Recurrent Neural Networks (RNNs) to disentangle the style from the semantic content.
+* Autoencoder model with two components - one for learning style and another for learning content.
+* This allows the "style" component to be replaced while keeping the "content" component the same, resulting in style transfer.
+* One way to think about this: the encoder generates a 100-dimensional vector in which the first 50 entries correspond to the "style" component and the remaining 50 to the "content" component.
+* The proposal is that the loss function should be modified to include a cross-covariance term for ensuring disentanglement.
+* I think one way of doing this is to have two loss functions:
+    * The **first loss** function ensures that the input sentence is decoded properly into the target sentence. This loss is computed for each sentence.
+    * The **second loss** ensures that the first 50 entries across all the encoded representations are correlated. This loss operates at the batch level.
+    * The **total loss** is the weighted sum of these 2 losses.
+
+## Possible Datasets
+
+* [Complete works of Shakespeare](http://norvig.com/ngrams/shakespeare.txt)
+* [Wikipedia Kaggle dataset](https://www.kaggle.com/c/wikichallenge/data)
+* [Oxford Text Archive](https://ota.ox.ac.uk/)
+* Twitter data
+
+## Possible Metrics
+
+* Soundness - is the generated text entailed by the input sentence?
+* Coherence - is the text free of grammatical errors, with proper word usage, etc.?
+* Effectiveness - how effective was the style transfer?
+* Since some of the metrics are subjective, human evaluators also need to be employed.
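The weighted two-loss setup from the "Pitch" section of the post above can be made concrete with a short plain-Python sketch. This is a minimal sketch under assumptions and is not the paper's method. The per-sentence reconstruction losses are taken as already computed, so no RNN is shown. The batch-level term is assumed to be a cross-covariance (XCov-style) penalty: half the sum of squared covariances between each "style" unit and each "content" unit over the batch. That penalty pushes the two halves of the encoding towards being uncorrelated, which is one reading of the "cross-covariance term for ensuring disentanglement". The function names and the `weight` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def split_code(z: Vector, style_dim: int = 50) -> Tuple[List[float], List[float]]:
    """Split an encoding into its 'style' (first style_dim entries) and 'content' parts."""
    return list(z[:style_dim]), list(z[style_dim:])


def xcov_penalty(batch: List[Vector], style_dim: int = 50) -> float:
    """Half the sum of squared cross-covariances between style and content units over a batch."""
    n = len(batch)
    styles = [split_code(z, style_dim)[0] for z in batch]
    contents = [split_code(z, style_dim)[1] for z in batch]
    s_mean = [sum(col) / n for col in zip(*styles)]
    c_mean = [sum(col) / n for col in zip(*contents)]
    total = 0.0
    for i in range(len(s_mean)):
        for j in range(len(c_mean)):
            # Covariance between style unit i and content unit j across the batch.
            cov = sum((styles[k][i] - s_mean[i]) * (contents[k][j] - c_mean[j])
                      for k in range(n)) / n
            total += cov * cov
    return 0.5 * total


def total_loss(recon_losses: List[float], batch: List[Vector],
               weight: float = 1.0, style_dim: int = 50) -> float:
    """Mean per-sentence reconstruction loss plus the weighted batch-level cross-covariance term."""
    recon = sum(recon_losses) / len(recon_losses)
    return recon + weight * xcov_penalty(batch, style_dim)
```

With 100-dimensional encodings the default `style_dim=50` matches the split described in the post. The `weight` parameter trades off reconstruction quality against disentanglement. If the style and content halves never co-vary across the batch, the penalty is zero and only the reconstruction term remains.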