diff --git a/.gitignore b/.gitignore
old mode 100644
new mode 100755
diff --git a/README.md b/README.md
old mode 100644
new mode 100755
index 8fdf3c88..7515068e
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho
 
 ## List of papers
 
+* [Refining Source Representations with Relation Networks for Neural Machine Translation](https://shagunsodhani.in/papers-I-read/Refining-Source-Representations-with-Relation-Networks-for-Neural-Machine-Translation)
 * [Pointer Networks](https://shagunsodhani.in/papers-I-read/Pointer-Networks)
 * [Learning to Compute Word Embeddings On the Fly](https://shagunsodhani.in/papers-I-read/Learning-to-Compute-Word-Embeddings-On-the-Fly)
 * [R-NET - Machine Reading Comprehension with Self-matching Networks](https://shagunsodhani.in/papers-I-read/R-NET-Machine-Reading-Comprehension-with-Self-matching-Networks)
diff --git a/assets/BatchNormalization/eq1.png b/assets/BatchNormalization/eq1.png
old mode 100644
new mode 100755
diff --git a/assets/BatchNormalization/eq2.png b/assets/BatchNormalization/eq2.png
old mode 100644
new mode 100755
diff --git a/assets/FewThingsAboutML/BiasVarianceDiagram.png b/assets/FewThingsAboutML/BiasVarianceDiagram.png
old mode 100644
new mode 100755
diff --git a/assets/RNTN/MVRNN.png b/assets/RNTN/MVRNN.png
old mode 100644
new mode 100755
diff --git a/assets/RNTN/P1RNTN.png b/assets/RNTN/P1RNTN.png
old mode 100644
new mode 100755
diff --git a/assets/RNTN/P2RNTN.png b/assets/RNTN/P2RNTN.png
old mode 100644
new mode 100755
diff --git a/assets/RNTN/ParseTreeMVRNN.png b/assets/RNTN/ParseTreeMVRNN.png
old mode 100644
new mode 100755
diff --git a/assets/RNTN/RNN.png b/assets/RNTN/RNN.png
old mode 100644
new mode 100755
diff --git a/assets/RNTN/RNNModels.png b/assets/RNTN/RNNModels.png
old mode 100644
new mode 100755
diff --git a/site/404.html b/site/404.html
old mode 100644
new mode 100755
diff --git a/site/LICENSE.md b/site/LICENSE.md
old mode 100644
new mode 100755
diff --git a/site/README.md b/site/README.md
old mode 100644
new mode 100755
diff --git a/site/_config.yml b/site/_config.yml
old mode 100644
new mode 100755
diff --git a/site/_includes/comments.html b/site/_includes/comments.html
old mode 100644
new mode 100755
diff --git a/site/_includes/google_analytics.html b/site/_includes/google_analytics.html
old mode 100644
new mode 100755
diff --git a/site/_includes/head.html b/site/_includes/head.html
old mode 100644
new mode 100755
diff --git a/site/_includes/sidebar.html b/site/_includes/sidebar.html
old mode 100644
new mode 100755
diff --git a/site/_layouts/default.html b/site/_layouts/default.html
old mode 100644
new mode 100755
diff --git a/site/_layouts/page.html b/site/_layouts/page.html
old mode 100644
new mode 100755
diff --git a/site/_layouts/post.html b/site/_layouts/post.html
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-04-27-VQA Visual Question Answering.md b/site/_posts/2017-04-27-VQA Visual Question Answering.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-04-28-Simple Baseline for Visual Question Answering.md b/site/_posts/2017-04-28-Simple Baseline for Visual Question Answering.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-05-07-Conditional Similarity Networks.md b/site/_posts/2017-05-07-Conditional Similarity Networks.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-05-14-Making the V in VQA Matter - Elevating the Role of Image Understanding in Visual Question Answering.md b/site/_posts/2017-05-14-Making the V in VQA Matter - Elevating the Role of Image Understanding in Visual Question Answering.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-05-23-Neural Module Networks.md b/site/_posts/2017-05-23-Neural Module Networks.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-06-03-A Fast and Accurate Dependency Parser using Neural Networks.md b/site/_posts/2017-06-03-A Fast and Accurate Dependency Parser using Neural Networks.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-06-17-A Decomposable Attention Model for Natural Language Inference.md b/site/_posts/2017-06-17-A Decomposable Attention Model for Natural Language Inference.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-06-26-Two-Too Simple Adaptations of Word2Vec for Syntax Problems.md b/site/_posts/2017-06-26-Two-Too Simple Adaptations of Word2Vec for Syntax Problems.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-07-01-One Model To Learn Them All.md b/site/_posts/2017-07-01-One Model To Learn Them All.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-07-09-Ask Me Anything: Dynamic Memory Networks for Natural Language Processing.md b/site/_posts/2017-07-09-Ask Me Anything: Dynamic Memory Networks for Natural Language Processing.md
deleted file mode 100644
index 76a77847..00000000
--- a/site/_posts/2017-07-09-Ask Me Anything: Dynamic Memory Networks for Natural Language Processing.md
+++ /dev/null
@@ -1,97 +0,0 @@
----
-layout: post
-title: Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
-comments: True
-excerpt: Dynamic Memory Networks (DMN) is a neural network based general framework that can be used for tasks like sequence tagging, classification, sequence to sequence and question answering requiring transitive reasoning.
-tags: ['2016', AI, Attention, Machine Comprehension, NLP, POS, QA, Sentiment Analysis, SOTA]
----
-
-## Introduction
-
-* Dynamic Memory Networks (DMN) is a neural network based general framework that can be used for tasks like sequence tagging, classification, sequence to sequence and question answering requiring transitive reasoning.
-
-* The basic idea is that all these tasks can be modelled as question answering tasks, and a common architecture could be used to solve them.
-
-* [Link to the paper](https://arxiv.org/abs/1506.07285)
-
-## Architecture
-
-* DMN takes as input a document (sentence, story, article, etc.) and a question which is to be answered given the document.
-
-### Input Module
-
-* Concatenate all the sentences (or facts) in the document and encode them by feeding the word embeddings of the text to a GRU.
-
-* Each time a sentence ends, extract the hidden representation of the GRU till that point and use it as the encoded representation of the sentence.
-
-### Question Module
-
-* Similarly, feed the question to a GRU to obtain its representation.
-
-### Episodic Memory Module
-
-* Episodic memory consists of an attention mechanism and a recurrent network with which it updates its memory.
-
-* During each iteration, the network generates an episode *e* by attending over the representations of the sentences, the question and the previous memory.
-
-* The episodic memory is updated using the current episode and the previous memory.
-
-* Depending on the amount of supervision available, the network may perform multiple passes. E.g., in the bAbI dataset, some tasks specify how many passes are needed and which sentence should be attended to in each pass. For the others, a fixed number of passes is made.
-
-* Multiple passes allow the network to perform transitive inference.
-
-### Attention Mechanism
-
-* Given the input representation *c*, memory *m* and question *q*, produce a scalar score using a 2-layer feedforward network, which is used as the attention value.
-
-* A separate GRU encodes the input representation and weights it by the attention.
-
-* The final state of the GRU is fed to the answer module.
-
-### Answer Module
-
-* Use a GRU (initialized with the final state of the episodic module) and, at each timestep, feed it the question vector, the last hidden state of the same GRU and the previously predicted output.
-
-### Training
-
-* There are two possible losses:
-  * Cross-entropy loss of the predicted answer (all datasets)
-  * Cross-entropy loss of the attention supervision (for datasets like bAbI)
-
-## Experiments
-
-### Question Answering
-
-* bAbI Dataset
-
-* For most tasks, DMN either outperforms or performs as well as Memory Networks.
-
-* For tasks like answering with 2 or 3 supporting facts, DMN lags because of the limitations of RNNs in modelling long sentences.
-
-### Text Classification
-
-* Stanford Sentiment Treebank Dataset
-
-* DMN outperforms all the baselines for both binary and fine-grained sentiment analysis.
-
-### Sequence Tagging
-
-* Wall Street Journal Dataset
-
-* DMN achieves a state-of-the-art accuracy of 97.56%.
-
-## Observations
-
-* Multiple passes help in reasoning tasks but not so much for sentiment analysis or POS tagging.
-
-* Attention in the case of the 2-iteration DMN is more focused than attention in the 1-iteration DMN.
-
-* For the 2-iteration DMN, attention in the second iteration focuses only on relevant words, and less attention is paid to words that lose their relevance in the context of the entire document.
-
-## Notes
-
-* It would be interesting to put some mechanism in place to determine the number of episodes that should be generated before an answer is predicted. A naive way would be to predict the answer after each episode and check if the softmax score of the predicted answer is more than a threshold.
-
-* Alternatively, the softmax score and other information could be fed to a Reinforcement Learning (RL) agent which decides if the document should be read again. So every time an episode is generated, the state is passed to the RL agent, which decides if another iteration should be performed. If it decides to predict the answer and the correct answer is generated, the agent gets a large +ve reward, else a large -ve reward.
-
-* To discourage unnecessary iterations, a small -ve reward could be given every time the agent decides to perform another iteration.
\ No newline at end of file
diff --git a/site/_posts/2017-07-17-Principled Detection of Out of Distribution Examples in Neural Networks.md b/site/_posts/2017-07-17-Principled Detection of Out of Distribution Examples in Neural Networks.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-07-24-ReasoNet - Learning to Stop Reading in Machine Comprehension.md b/site/_posts/2017-07-24-ReasoNet - Learning to Stop Reading in Machine Comprehension.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-08-07-R-NET - Machine Reading Comprehension with Self-matching Networks.md b/site/_posts/2017-08-07-R-NET - Machine Reading Comprehension with Self-matching Networks.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-08-21-Learning to Compute Word Embeddings On the Fly.md b/site/_posts/2017-08-21-Learning to Compute Word Embeddings On the Fly.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-08-27-Pointer Networks.md b/site/_posts/2017-08-27-Pointer Networks.md
old mode 100644
new mode 100755
diff --git a/site/_posts/2017-09-22-Refining Source Representations with Relation Networks for Neural Machine Translation.md b/site/_posts/2017-09-22-Refining Source Representations with Relation Networks for Neural Machine Translation.md
new file mode 100644
index 00000000..b4d8ae06
--- /dev/null
+++ b/site/_posts/2017-09-22-Refining Source Representations with Relation Networks for Neural Machine Translation.md
@@ -0,0 +1,67 @@
+---
+layout: post
+title: Refining Source Representations with Relation Networks for Neural Machine Translation
+comments: True
+excerpt:
+tags: ['2017', 'Relational Network', 'Representation Learning', AI, NLP, NMT]
+---
+
+## Introduction
+
+* The paper introduces a Relation Network (RN) that refines the encoded representation of the given source document (or sentence).
+* This refined source representation can then be used in Neural Machine Translation (NMT) systems to counter the problem of RNNs forgetting old information.
+* [Link to the paper](https://arxiv.org/abs/1709.03980)
+
+## Limitations of existing NMT models
+
+* The RNN encoder-decoder architecture is the standard choice for NMT systems, but RNNs are prone to forgetting old information.
+* In NMT models, attention is modeled at the level of words, while phrases (instead of words) would be a better unit.
+* While NMT systems might be able to capture certain relationships between words, they are not explicitly designed to capture such information.
+
+## Contributions of the paper
+
+* Learn the relationships between the source words using the context (neighboring words).
+* Relation Networks (RNs) build pairwise relations between source words using the representations generated by the RNN. The RN would sit between the encoder and the attention layer of the encoder-decoder framework, thereby keeping the main architecture unaffected.
+
+## Relation Network
+
+* A neural network designed for relational reasoning.
+* Given a set of inputs *O = o1, ..., on*, the RN is formed as a composition of the inputs: RN(O) = f(sum(g(oi, oj))), where *f* and *g* are feed-forward networks used to learn the relations (a code sketch follows the component list below).
+* *g* learns how the objects are related, hence the name "relation".
+* **Components**:
+  * CNN Layer
+    * Extract information from the words surrounding the given word (context).
+    * The final output of this layer is the sequence of vectors for the different kernel widths.
+  * Graph Propagation (GP) Layer
+    * Connect all the words with each other in the form of a graph.
+    * Each output vector from the CNN corresponds to a node in the graph, and there is an edge between all possible pairs of nodes.
+    * The information flows between the nodes of the graph in a message-passing fashion (graph propagation) to obtain a new set of vectors for each node.
+  * Multi-Layer Perceptron (MLP) Layer
+    * The representation from the GP Layer is fed to the MLP layer.
+    * The layer uses residual connections from previous layers in the form of concatenation.
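+
+The following is a minimal PyTorch sketch of the pairwise composition above (*f*, *g*, and the sum over pairs). It is illustrative only, not the paper's exact model - the CNN, GP and MLP layers are omitted, and the layer sizes and the two-layer form of *g* are assumptions:
+
+```python
+import torch
+import torch.nn as nn
+
+class RelationNetwork(nn.Module):
+    """Sketch of RN(O) = f(sum over pairs (i, j) of g(o_i, o_j))."""
+
+    def __init__(self, obj_dim=256, hidden_dim=256, out_dim=256):  # sizes are assumptions
+        super().__init__()
+        # g learns how a pair of objects (e.g. two source-word states) is related.
+        self.g = nn.Sequential(
+            nn.Linear(2 * obj_dim, hidden_dim), nn.ReLU(),
+            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
+        )
+        # f maps the aggregated pairwise relations to the output representation.
+        self.f = nn.Sequential(nn.Linear(hidden_dim, out_dim), nn.ReLU())
+
+    def forward(self, objects):
+        # objects: (batch, n, obj_dim), e.g. RNN encoder states of n source words.
+        batch, n, d = objects.size()
+        o_i = objects.unsqueeze(2).expand(batch, n, n, d)
+        o_j = objects.unsqueeze(1).expand(batch, n, n, d)
+        pairs = torch.cat([o_i, o_j], dim=-1)       # all ordered pairs (o_i, o_j)
+        relations = self.g(pairs).sum(dim=(1, 2))   # sum over all n * n pairs
+        return self.f(relations)
+```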
+
+## Datasets
+
+* IWSLT Data - 44K sentences from the tourism and travel domain.
+* NIST Data - 1M Chinese-English parallel sentence pairs.
+
+## Models
+
+* MOSES - Open source translation system - http://www.statmt.org/moses/
+* NMT - Attention-based NMT
+* NMT+ - NMT with an improved decoder
+* TRANSFORMER - Google's self-attention based NMT model
+* RNMT+ - Relation Network integrated with NMT+
+
+## Evaluation Metric
+
+* Case-insensitive 4-gram BLEU score.
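+
+As a usage note (the paper does not name its scoring script, so this is an assumption, not the paper's setup), one way to compute a case-insensitive 4-gram BLEU score is with the sacrebleu package:
+
+```python
+# pip install sacrebleu
+from sacrebleu.metrics import BLEU
+
+hypotheses = ["the cat sat on the mat"]       # system outputs
+references = [["there is a cat on the mat"]]  # one stream of references
+
+# lowercase=True gives case-insensitive BLEU; the n-gram order is 4 by default.
+bleu = BLEU(lowercase=True)
+print(bleu.corpus_score(hypotheses, references).score)
+```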
+
+## Observations
+
+* As sentences become longer (more than 50 words), RNMT+ clearly outperforms the other baselines.
+* Qualitative evaluation shows that the RNMT+ model captures word alignment better than the NMT+ models.
+* Similarly, the NMT+ system tends to miss some information from the source sentence (more so for longer sentences). While both CNNs and RNNs are weak at capturing long-term dependencies, using the relation layer mitigates this issue to some extent.
diff --git a/site/_site b/site/_site
index 694ba0cd..ff50790b 160000
--- a/site/_site
+++ b/site/_site
@@ -1 +1 @@
-Subproject commit 694ba0cd2df7a146f947a37a3e615f29682109d3
+Subproject commit ff50790b67fa5b1ac34ff41c801a4704720ea81e
diff --git a/site/archieve.md b/site/archieve.md
old mode 100644
new mode 100755
diff --git a/site/atom.xml b/site/atom.xml
old mode 100644
new mode 100755
diff --git a/site/index.html b/site/index.html
old mode 100644
new mode 100755
diff --git a/site/index.html.1 b/site/index.html.1
old mode 100644
new mode 100755
diff --git a/site/public/apple-touch-icon-precomposed.png b/site/public/apple-touch-icon-precomposed.png
old mode 100644
new mode 100755
diff --git a/site/public/css/lanyon.css b/site/public/css/lanyon.css
old mode 100644
new mode 100755
diff --git a/site/public/css/poole.css b/site/public/css/poole.css
old mode 100644
new mode 100755
diff --git a/site/public/css/style.css b/site/public/css/style.css
old mode 100644
new mode 100755
diff --git a/site/public/css/syntax.css b/site/public/css/syntax.css
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/HELP-US-OUT.txt b/site/public/font-awesome-4.7.0/HELP-US-OUT.txt
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/css/font-awesome.css b/site/public/font-awesome-4.7.0/css/font-awesome.css
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/css/font-awesome.min.css b/site/public/font-awesome-4.7.0/css/font-awesome.min.css
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/fonts/FontAwesome.otf b/site/public/font-awesome-4.7.0/fonts/FontAwesome.otf
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.eot b/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.eot
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.svg b/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.svg
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.ttf b/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.ttf
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.woff b/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.woff
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.woff2 b/site/public/font-awesome-4.7.0/fonts/fontawesome-webfont.woff2
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/animated.less b/site/public/font-awesome-4.7.0/less/animated.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/bordered-pulled.less b/site/public/font-awesome-4.7.0/less/bordered-pulled.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/core.less b/site/public/font-awesome-4.7.0/less/core.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/fixed-width.less b/site/public/font-awesome-4.7.0/less/fixed-width.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/font-awesome.less b/site/public/font-awesome-4.7.0/less/font-awesome.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/icons.less b/site/public/font-awesome-4.7.0/less/icons.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/larger.less b/site/public/font-awesome-4.7.0/less/larger.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/list.less b/site/public/font-awesome-4.7.0/less/list.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/mixins.less b/site/public/font-awesome-4.7.0/less/mixins.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/path.less b/site/public/font-awesome-4.7.0/less/path.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/rotated-flipped.less b/site/public/font-awesome-4.7.0/less/rotated-flipped.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/screen-reader.less b/site/public/font-awesome-4.7.0/less/screen-reader.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/stacked.less b/site/public/font-awesome-4.7.0/less/stacked.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/less/variables.less b/site/public/font-awesome-4.7.0/less/variables.less
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_animated.scss b/site/public/font-awesome-4.7.0/scss/_animated.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_bordered-pulled.scss b/site/public/font-awesome-4.7.0/scss/_bordered-pulled.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_core.scss b/site/public/font-awesome-4.7.0/scss/_core.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_fixed-width.scss b/site/public/font-awesome-4.7.0/scss/_fixed-width.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_icons.scss b/site/public/font-awesome-4.7.0/scss/_icons.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_larger.scss b/site/public/font-awesome-4.7.0/scss/_larger.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_list.scss b/site/public/font-awesome-4.7.0/scss/_list.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_mixins.scss b/site/public/font-awesome-4.7.0/scss/_mixins.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_path.scss b/site/public/font-awesome-4.7.0/scss/_path.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_rotated-flipped.scss b/site/public/font-awesome-4.7.0/scss/_rotated-flipped.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_screen-reader.scss b/site/public/font-awesome-4.7.0/scss/_screen-reader.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_stacked.scss b/site/public/font-awesome-4.7.0/scss/_stacked.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/_variables.scss b/site/public/font-awesome-4.7.0/scss/_variables.scss
old mode 100644
new mode 100755
diff --git a/site/public/font-awesome-4.7.0/scss/font-awesome.scss b/site/public/font-awesome-4.7.0/scss/font-awesome.scss
old mode 100644
new mode 100755
diff --git a/site/tags.md b/site/tags.md
old mode 100644
new mode 100755