---
layout: post
title: Pointer Networks
comments: True
excerpt: The paper introduces a novel architecture that generates an output sequence such that the elements of the output sequence are discrete tokens corresponding to positions in the input sequence.
tags: ['2015', 'NIPS 2015', 'Seq2Seq', AI, NIPS, NLP, Softmax]
---

## Introduction

* The paper introduces a novel architecture that generates an output sequence whose elements are discrete tokens corresponding to positions in the input sequence.

* Such a problem cannot be solved using [Seq2Seq](https://gist.github.com/shagunsodhani/a2915921d7d0ac5cfd0e379025acfb9f) models or Neural Turing Machines, as the size of the output softmax is variable (it depends on the size of the input sequence).

* [Link to the paper](https://arxiv.org/abs/1506.03134)

## Architecture

* Traditional attention-based sequence-to-sequence models compute an attention vector at each step of the output decoder and use it to blend the encoder hidden states of the input into a single, consolidated context vector. This context vector is then used to compute a softmax over a fixed-size output vocabulary.

* In Pointer Nets, the attention scores (over all the tokens in the input sequence) are normalized and treated directly as the softmax output over the input tokens.

* So Pointer Net is a very simple modification of the attention model, as the sketch below illustrates.
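
A minimal sketch of this pointing mechanism, assuming PyTorch and additive (Bahdanau-style) attention; the module and variable names (`PointerAttention`, `W_enc`, `W_dec`, `v`) are illustrative and not taken from the paper's code:

```python
import torch
import torch.nn as nn


class PointerAttention(nn.Module):
    """Additive attention whose normalized scores are used directly as the
    output distribution over input positions (no context-vector blending)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq_len, hidden_dim) -- encoder outputs e_1 .. e_n
        # dec_state:  (batch, hidden_dim)          -- decoder state d_i
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                               # (batch, seq_len)
        # Softmax over *input positions*: this is the model's output, so the
        # output "vocabulary" grows and shrinks with the input length.
        return torch.softmax(scores, dim=-1)


# Usage: the distribution is over however many input tokens were encoded.
attn = PointerAttention(hidden_dim=16)
enc = torch.randn(2, 7, 16)   # 7 input tokens
dec = torch.randn(2, 16)
print(attn(enc, dec).shape)   # torch.Size([2, 7])
```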

## Application

* Any problem where the size of the output dictionary depends on the size of the input, which rules out a fixed-length softmax.

* For example, combinatorial problems such as finding the planar convex hull, where the size of the output depends on the size of the input. (A toy encoding of such a training pair is sketched below.)
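
A toy illustration (hypothetical data, not from the paper) of how a convex-hull training pair can be expressed as pointers into the input:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical input: a set of 2-D points fed to the encoder.
points = np.array([[0.1, 0.2], [0.9, 0.1], [0.5, 0.9], [0.4, 0.5], [0.2, 0.8]])

# Target output: indices *into the input sequence* tracing the hull, so the
# size of the output vocabulary equals the input length.
hull = ConvexHull(points)
target = hull.vertices.tolist()  # e.g. [0, 1, 2, 4], in counter-clockwise order
print(target)
```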

## Evaluation

* The paper considers the following 3 problems:

    * Convex Hull
    * Delaunay Triangulation
    * Travelling Salesman Problem (TSP)

* Since some of these problems are NP-hard, the paper considers approximate solutions wherever exact solutions are not feasible to compute.

* The authors use the exact same architecture and model parameters for all the instances of the 3 problems to show the generality of the model.

* The proposed Pointer Nets outperform LSTMs and LSTMs with attention, and generalise quite well to much larger sequences.

* Interestingly, the order in which the inputs are fed to the system affects its performance. The authors discuss this aspect in their subsequent paper titled [Order Matters: Sequence To Sequence for Sets](https://arxiv.org/pdf/1511.06391v4.pdf).