Added pointer networks paper
shagunsodhani committed Aug 27, 2017
1 parent 58ae3af commit f9ff24d
Showing 3 changed files with 47 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [Pointer Networks](https://shagunsodhani.in/papers-I-read/Pointer-Networks)
* [Learning to Compute Word Embeddings On the Fly](https://shagunsodhani.in/papers-I-read/Learning-to-Compute-Word-Embeddings-On-the-Fly)
* [R-NET - Machine Reading Comprehension with Self-matching Networks](https://shagunsodhani.in/papers-I-read/R-NET-Machine-Reading-Comprehension-with-Self-matching-Networks)
* [ReasoNet - Learning to Stop Reading in Machine Comprehension](https://shagunsodhani.in/papers-I-read/ReasoNet-Learning-to-Stop-Reading-in-Machine-Comprehension)
45 changes: 45 additions & 0 deletions site/_posts/2017-08-27-Pointer Networks.md
@@ -0,0 +1,45 @@
---
layout: post
title: Pointer Networks
comments: True
excerpt: The paper introduces a novel architecture that generates an output sequence such that the elements of the output sequence are discrete tokens corresponding to positions in the input sequence.
tags: ['2015', 'NIPS 2015', 'Seq2Seq', AI, NIPS, NLP, Softmax]
---

## Introduction

* The paper introduces a novel architecture that generates an output sequence such that the elements of the output sequence are discrete tokens corresponding to positions in the input sequence.

* Such a problem cannot be solved using [Seq2Seq](https://gist.github.com/shagunsodhani/a2915921d7d0ac5cfd0e379025acfb9f) or Neural Turing Machines, as the size of the output softmax is variable (it depends on the length of the input sequence); the sketch after this list makes the mismatch concrete.

* [Link to the paper](https://arxiv.org/abs/1506.03134)
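
To make the variable-softmax problem concrete, here is a minimal sketch (the library choice, names, and dot-product scoring are illustrative assumptions, not from the paper): a standard Seq2Seq head projects to an output vocabulary whose size is fixed when the layer is built, while a pointer-style head scores the n encoder states directly, so its output size tracks the input length.

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 64, 1000

# Standard Seq2Seq head: the softmax size is frozen at construction time.
out_proj = nn.Linear(hidden_dim, vocab_size)
dec_state = torch.randn(1, hidden_dim)
print(out_proj(dec_state).shape)  # torch.Size([1, 1000]) -- always 1000 classes

# Pointer-style head: score each of the n encoder states, so the "vocabulary"
# is the input itself and its size varies with the input length.
for n in (5, 50):
    enc_states = torch.randn(1, n, hidden_dim)
    scores = torch.bmm(enc_states, dec_state.unsqueeze(-1)).squeeze(-1)
    print(torch.softmax(scores, dim=-1).shape)  # torch.Size([1, n])
```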

## Architecture

* Traditional attention-based sequence-to-sequence models compute an attention distribution over the encoder states at each step of the output decoder and use it to blend the encoder states into a single, consolidated context vector. This context vector then feeds a fixed-size softmax over the output dictionary.

* In Pointer Nets, the attention weights (over all the tokens in the input sequence) are normalized and treated directly as the softmax output, i.e., as a probability distribution over positions in the input.

* Pointer Nets are thus a very simple modification of the attention model; one decoding step is sketched below.
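
A minimal sketch of one such decoding step, using the paper's additive scoring u_j = v^T tanh(W1 e_j + W2 d); the PyTorch framing, module names, and dimensions are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerAttention(nn.Module):
    """One Pointer Net decoding step: the attention distribution over the
    encoder states is itself the output distribution."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.W1 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # transforms encoder states e_j
        self.W2 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # transforms decoder state d
        self.v = nn.Linear(hidden_dim, 1, bias=False)            # maps each tanh(.) to a scalar score

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, n, hidden); dec_state: (batch, hidden)
        # u_j = v^T tanh(W1 e_j + W2 d), one score per input position j
        scores = self.v(torch.tanh(
            self.W1(enc_states) + self.W2(dec_state).unsqueeze(1))).squeeze(-1)
        # A conventional attention decoder would blend enc_states with these
        # scores into a context vector; a Pointer Net instead returns the
        # normalized scores as the output distribution over input positions.
        return F.softmax(scores, dim=-1)  # (batch, n)

attn = PointerAttention(hidden_dim=64)
enc, dec = torch.randn(2, 10, 64), torch.randn(2, 64)
print(attn(enc, dec).shape)  # torch.Size([2, 10]) -- one probability per input token
```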

## Application

* Any problem where the size of the output dictionary depends on the length of the input sequence, which rules out a fixed-size softmax.

* E.g., combinatorial problems such as computing the planar convex hull, where the size of the output depends on the size of the input; the sketch below shows how such targets are expressed as positions in the input.
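
For the convex hull case, a training pair can be built so that the target is a sequence of indices into the input points. A sketch using scipy (an assumed dependency; the paper's exact data-generation procedure may differ):

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points = rng.random((10, 2))  # input sequence: 10 points in the unit square

# The target is a sequence of *positions in the input*, not tokens from a
# fixed vocabulary: for 2-D inputs, ConvexHull.vertices lists the indices
# of the hull points in counterclockwise order.
hull_indices = ConvexHull(points).vertices
print(hull_indices)  # length varies from input to input
```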

## Evaluation

* The paper considers the following 3 problems:

* Convex Hull
* Delaunay triangulations
* Travelling Salesman Problem (TSP)

* Since some of the problems are NP-hard, the paper considers approximate solutions wherever exact solutions are not feasible to compute.

* The authors used the exact same architecture and model parameters for all instances of the 3 problems to show the generality of the model.

* The proposed Pointer Nets outperform LSTMs and LSTMs with attention and can generalise quite well to much longer sequences.

* Interestingly, the order in which the inputs are fed to the system affects its performance. The authors discuss this aspect in their subsequent paper, [Order Matters: Sequence To Sequence for Sets](https://arxiv.org/pdf/1511.06391v4.pdf).
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from 6d3c0d to 694ba0
