Commit

Added GECA paper
shagunsodhani committed May 26, 2019
1 parent d5833c3 commit 05e9d77
Showing 3 changed files with 37 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
* [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
* [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
* [GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks](https://shagunsodhani.com/papers-I-read/GNN-Explainer-A-Tool-for-Post-hoc-Explanation-of-Graph-Neural-Networks)
@@ -0,0 +1,35 @@
---
layout: post
title: Good-Enough Compositional Data Augmentation
comments: True
excerpt:
tags: ['2019', 'Data Augmentation', 'Sequential models', AI, Compositionality, NLP]
---

## Introduction

* The paper introduces a simple data augmentation protocol that provides a good compositional inductive bias for sequential models.

* Synthetic examples are created by taking real sequences and replacing fragments in them with other fragments that appear in similar environments. This operation is referred to as GECA (Good-Enough Compositional Augmentation).

* The underlying idea is that if two fragments of training examples occur in some common environment, then any environment where the first fragment appears is also a valid environment for the second fragment.

* [Link to the paper](https://arxiv.org/abs/1904.09545)

## Approach

* Discover substitutable fragments (i.e., pairs of fragments that each co-occur with a common fragment) and use them to generate new sequences by swapping fragments.

* The current work uses a very simple criterion to decide whether fragments are substitutable: they should occur in at least one lexical environment that is exactly the same. A lexical environment is the k-word window around each span of the fragment.

* Though the idea can be motivated by work in generative syntax and distributional semantics, it is a heuristic rather than a law: it does not always hold when applied to real data.

* The authors frame this as a tradeoff: the shortage of training data is weighed against the relative frequency of mistakes introduced by the proposed data augmentation approach.
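
The substitution rule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation; it makes several simplifying assumptions: fragments are contiguous spans only (the phrase "each span of the fragment" suggests the paper also allows multi-span fragments), the environment is the `k` words immediately to the left and right of the span, and sentence boundaries are padded with the hypothetical markers `<s>` and `</s>`. All function names and the toy corpus are made up for illustration.

```python
from collections import defaultdict
from itertools import permutations

BOS, EOS = "<s>", "</s>"  # hypothetical boundary markers (an assumption of this sketch)


def fragments_with_environments(tokens, k=1, max_len=2):
    """Yield (fragment, environment) for each contiguous span of up to max_len words.

    The environment is the k-word window on each side of the span.
    Spans covering the whole sentence are skipped.
    """
    tokens = tuple(tokens)
    n = len(tokens)
    padded = (BOS,) * k + tokens + (EOS,) * k  # token i sits at padded[i + k]
    for start in range(n):
        for length in range(1, max_len + 1):
            end = start + length
            if end > n or length == n:
                break
            left = padded[start:start + k]
            right = padded[end + k:end + 2 * k]
            yield tokens[start:end], (left, right)


def substitutable_pairs(corpus, k=1, max_len=2):
    """Return ordered pairs (a, b) of fragments sharing at least one identical environment."""
    env_to_frags = defaultdict(set)
    for tokens in corpus:
        for frag, env in fragments_with_environments(tokens, k, max_len):
            env_to_frags[env].add(frag)
    pairs = set()
    for frags in env_to_frags.values():
        pairs.update(permutations(frags, 2))
    return pairs


def augment(corpus, k=1, max_len=2):
    """Generate new sentences by swapping one substitutable fragment at a time (single pass)."""
    corpus = [tuple(s) for s in corpus]
    seen = set(corpus)
    pairs = substitutable_pairs(corpus, k, max_len)
    generated = set()
    for tokens in corpus:
        for old, new in pairs:
            m = len(old)
            for i in range(len(tokens) - m + 1):
                if tokens[i:i + m] == old:
                    candidate = tokens[:i] + new + tokens[i + m:]
                    if candidate not in seen:
                        generated.add(candidate)
    return sorted(generated)


if __name__ == "__main__":
    toy_corpus = [
        "the cat sings".split(),
        "the dog sings".split(),
        "a dog sleeps".split(),
    ]
    for sentence in augment(toy_corpus):
        print(" ".join(sentence))
```

In the toy corpus, "cat" and "dog" share the environment ("the", "sings"), so the sketch treats them as interchangeable everywhere, which yields unseen sentences such as "a cat sleeps". The same mechanism can also produce invalid sentences on real data, which is the tradeoff the authors acknowledge.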

## Results

* The approach is evaluated on the SCAN dataset, where the model is trained on short sequences of English commands. The augmentation helps the baseline models, though this is not surprising given the nature of the SCAN dataset.

* More challenging tasks (for evaluating the proposed approach) are semantic parsing (where the query is represented in the form of λ-calculus or SQL) and low-resource language modeling. While the improvement in metrics is sometimes limited, the gains are consistent across different datasets.

* Given that the proposed approach is relatively simple and straightforward, it appears to be quite promising.
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from ce34bc to f92606
