diff --git a/README.md b/README.md index d82fb306..87078344 100755 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho ## List of papers +* [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](https://shagunsodhani.com/papers-I-read/Set-Transformer-A-Framework-for-Attention-based-Permutation-Invariant-Neural-Networks) * [Measuring abstract reasoning in neural networks](https://shagunsodhani.com/papers-I-read/Measuring-Abstract-Reasoning-in-Neural-Networks) * [Hamiltonian Neural Networks](https://shagunsodhani.com/papers-I-read/Hamiltonian-Neural-Networks) * [Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations](https://shagunsodhani.com/papers-I-read/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations) diff --git a/site/_posts/2019-06-01-Relational Reinforcement Learning.md b/site/_posts/2019-06-01-Relational Reinforcement Learning.md index 32bfb929..18e6f747 100755 --- a/site/_posts/2019-06-01-Relational Reinforcement Learning.md +++ b/site/_posts/2019-06-01-Relational Reinforcement Learning.md @@ -3,7 +3,7 @@ layout: post title: Relational Reinforcement Learning comments: True excerpt: -tags: ['2018', 'Deep Reinforcement Learning', 'ICLR 2019', Reinforcement Learning', 'Relational Learning', AI, ICLR, RL, RRL] +tags: ['2018', 'Deep Reinforcement Learning', 'ICLR 2019', 'Reinforcement Learning', 'Relational Learning', AI, ICLR, RL, RRL] --- diff --git a/site/_posts/2019-07-18-Set Transformer A Framework for Attention-based Permutation-Invariant Neural Networks.md b/site/_posts/2019-07-18-Set Transformer A Framework for Attention-based Permutation-Invariant Neural Networks.md new file mode 100755 index 00000000..3a7698b6 --- /dev/null +++ b/site/_posts/2019-07-18-Set Transformer A Framework for Attention-based Permutation-Invariant Neural Networks.md @@ -0,0 +1,69 @@ +--- +layout: post +title: Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks +comments: True +excerpt: +tags: ['2018', 'ICML 2019', 'Relation Learning', 'Relational Learning', AI, ICML, Set] + +--- + +## Introduction + +* Consider problems where the input to the model is a set. In such problems (referred to as the set-input problems), the model should be invariant to the permutation of the data points. + +* In "set pooling" methods ([1](https://arxiv.org/abs/1606.02185), [2](https://arxiv.org/abs/1703.06114)), each data point (in the input set) is encoded using a feed-forward network and the resulting set of encoded representations are pooled using the "sum" operator. + +* This approach can be shown to be bot permutation-invariant and a universal function approximator. + +* The paper proposes an attention-based network module, called as the Set Transformer, which can model the interactions between the elements of an input set while being permutation invariant. + +* [Link to the paper](https://arxiv.org/abs/1810.00825) + +## Transformer + +* An attention function *Attn(Q, K, V) = (QKT)V* is used to map queries *Q* to output using key-value pairs *K, V*. + +* In case of multi-head attention, the key, query, and value are projected into *h* different vectors and attention is applied on all these vectors. The output is a linear transformation of the concatenation of all the vectors. + +## Set Transformer + +* 3 modules are introduced: MAB, SAB and ISAB. + +* Multihead Attention Block (MAB) is a module very similar to to the encoder in the Transformer, without the positional encoding and dropout. + +* Set Attention Block (SAB) is a module that takes as input a set and performs self-attention between the elements of the set to produce another set of the same size ie *SAB(X) = MAB(X, X)*. + +* The time complexity of the SAB operation is *O(n2)* where *n* is the number of elements in the set. It can be reduced to *O(m\*n)* by using Induced Set Attention Blocks (ISAB) with *m* induced point vectors (denoted as I). + +* *ISABm = MAB(X, MAB(I, X))*. + +* ISAB can be seen as performing a low-rank projection of inputs. + +* These modules can be used to model the interactions between data points in any given set. + +## Pooling by Multihead Attention (PMA) + +* Aggregation is performed by applying multi-head attention on a set of *k* seed vectors. + +* The interaction between the *k* outputs (from PMA) can be modeled by applying another SAB. + +* Thus the entire network is a stack of SABs and ISABs. Both the modules are permutation invariant and so is any network obtained by stacking them. + +## Experiments + +* Datasets include: + + * Predicting the maximum value from a set. + * Counting unique (Omniglot) characters from an image. + * Clustering with a mixture of Gaussians (synthetic points and CIFAR 100). + * Set Anomaly detection (celebA). + +* Generally, increasing *m* (the number of inducing datapoints) improve performance, to some extent. This is somewhat expected. + +* The paper considers various ablations of the proposed approach (like disabling attention in the encoder or pooling layer) and shows that attention mechanism is needed during both the stages. + +* The work has two main benefits over prior work: + + * Reducing the *O(n2)* complexity to *O(m\*n)* complexity. + + * Using self-attention mechanism for both encodings the inputs and for aggregating the encoded representations. \ No newline at end of file