Add new papers

shagunsodhani committed Apr 11, 2021
1 parent c3bb11f commit eef99dc
Showing 12 changed files with 602 additions and 1 deletion.
5 changes: 5 additions & 0 deletions README.md
@@ -4,6 +4,11 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [Zero-shot Learning by Generating Task-specific Adapters](https://shagunsodhani.com/papers-I-read/Zero-shot-Learning-by-Generating-Task-specific-Adapters)
* [HyperNetworks](https://shagunsodhani.com/papers-I-read/HyperNetworks)
* [Energy-based Models for Continual Learning](https://shagunsodhani.com/papers-I-read/Energy-based-Models-for-Continual-Learning)
* [GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism](https://shagunsodhani.com/papers-I-read/GPipe-Easy-Scaling-with-Micro-Batch-Pipeline-Parallelism)
* [Compositional Explanations of Neurons](https://shagunsodhani.com/papers-I-read/Compositional-Explanations-of-Neurons)
* [Design patterns for container-based distributed systems](https://shagunsodhani.com/papers-I-read/Design-patterns-for-container-based-distributed-systems)
* [Cassandra - a decentralized structured storage system](https://shagunsodhani.com/papers-I-read/Cassandra-a-decentralized-structured-storage-system)
* [CAP twelve years later - How the rules have changed](https://shagunsodhani.com/papers-I-read/CAP-twelve-years-later-How-the-rules-have-changed)
27 changes: 27 additions & 0 deletions site/_config_local.yml
@@ -0,0 +1,27 @@
# Permalinks
#
# Use of `relative_permalinks` ensures post links from the index work properly.
permalink: '/:title'
# relative_permalinks: true

# Setup
title: 'Papers I Read'
tagline: 'Notes and Summaries'
description: 'I am trying a new initiative - <i>A Paper A Week</i>. This blog will hold all the notes and summaries.'
# url: 'https://shagunsodhani.in/test'
baseurl: ''
paginate: 5
gems: [jekyll-paginate]

# About/contact
author:
  name: Shagun Sodhani
  url: https://shagunsodhani.in
  email: [email protected]

# Custom vars
version: 1.0.0
str_continue_reading: " Continue reading"

github:
  repo: https://github.com/shagunsodhani/papers-I-read
27 changes: 27 additions & 0 deletions site/_config_server.yml
@@ -0,0 +1,27 @@
# Permalinks
#
# Use of `relative_permalinks` ensures post links from the index work properly.
permalink: '/:title'
# relative_permalinks: true

# Setup
title: 'Papers I Read'
tagline: 'Notes and Summaries'
description: 'I am trying a new initiative - <i>A Paper A Week</i>. This blog will hold all the notes and summaries.'
# url: 'https://shagunsodhani.in/test'
baseurl: 'https://shagunsodhani.in/papers-I-read'
paginate: 5
gems: [jekyll-paginate]

# About/contact
author:
  name: Shagun Sodhani
  url: https://shagunsodhani.in
  email: [email protected]

# Custom vars
version: 1.0.0
str_continue_reading: " Continue reading"

github:
  repo: https://github.com/shagunsodhani/papers-I-read
13 changes: 13 additions & 0 deletions site/_oldposts/2013-12-31-whats-jekyll.md
@@ -0,0 +1,13 @@
---
layout: post
title: What's Jekyll?
comments: True
---

[Jekyll](http://jekyllrb.com) is a static site generator, an open-source tool for creating simple yet powerful websites of all shapes and sizes. From [the project's readme](https://github.com/mojombo/jekyll/blob/master/README.markdown):

> Jekyll is a simple, blog aware, static site generator. It takes a template directory [...] and spits out a complete, static website suitable for serving with Apache or your favorite web server. This is also the engine behind GitHub Pages, which you can use to host your project’s page or blog right here from GitHub.

It's an immensely useful tool and one we encourage you to use here with Lanyon.

Find out more by [visiting the project on GitHub](https://github.com/mojombo/jekyll).
123 changes: 123 additions & 0 deletions site/_oldposts/2014-01-01-example-content.md
@@ -0,0 +1,123 @@
---
layout: post
title: Example content
comments: True
---


<div class="message">
Howdy! This is an example blog post that shows several types of HTML content supported in this theme.
</div>

Cum sociis natoque penatibus et magnis <a href="#">dis parturient montes</a>, nascetur ridiculus mus. *Aenean eu leo quam.* Pellentesque ornare sem lacinia quam venenatis vestibulum. Sed posuere consectetur est at lobortis. Cras mattis consectetur purus sit amet fermentum.

> Curabitur blandit tempus porttitor. Nullam quis risus eget urna mollis ornare vel eu leo. Nullam id dolor id nibh ultricies vehicula ut id elit.

Etiam porta **sem malesuada magna** mollis euismod. Cras mattis consectetur purus sit amet fermentum. Aenean lacinia bibendum nulla sed consectetur.

## Inline HTML elements

HTML defines a long list of available inline tags, a complete list of which can be found on the [Mozilla Developer Network](https://developer.mozilla.org/en-US/docs/Web/HTML/Element).

- **To bold text**, use `<strong>`.
- *To italicize text*, use `<em>`.
- Abbreviations, like <abbr title="HyperText Markup Language">HTML</abbr> should use `<abbr>`, with an optional `title` attribute for the full phrase.
- Citations, like <cite>&mdash; Mark Otto</cite>, should use `<cite>`.
- <del>Deleted</del> text should use `<del>` and <ins>inserted</ins> text should use `<ins>`.
- Superscript <sup>text</sup> uses `<sup>` and subscript <sub>text</sub> uses `<sub>`.

Most of these elements are styled by browsers with few modifications on our part.

## Heading

Vivamus sagittis lacus vel augue rutrum faucibus dolor auctor. Duis mollis, est non commodo luctus, nisi erat porttitor ligula, eget lacinia odio sem nec elit. Morbi leo risus, porta ac consectetur ac, vestibulum at eros.

### Code

Cum sociis natoque penatibus et magnis dis `code element` montes, nascetur ridiculus mus.

{% highlight js %}
// Example can be run directly in your JavaScript console

// Create a function that takes two arguments and returns the sum of those arguments
var adder = new Function("a", "b", "return a + b");

// Call the function
adder(2, 6);
// > 8
{% endhighlight %}

Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa.

### Lists

Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Aenean lacinia bibendum nulla sed consectetur. Etiam porta sem malesuada magna mollis euismod. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.

* Praesent commodo cursus magna, vel scelerisque nisl consectetur et.
* Donec id elit non mi porta gravida at eget metus.
* Nulla vitae elit libero, a pharetra augue.

Donec ullamcorper nulla non metus auctor fringilla. Nulla vitae elit libero, a pharetra augue.

1. Vestibulum id ligula porta felis euismod semper.
2. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.
3. Maecenas sed diam eget risus varius blandit sit amet non magna.

Cras mattis consectetur purus sit amet fermentum. Sed posuere consectetur est at lobortis.

<dl>
<dt>HyperText Markup Language (HTML)</dt>
<dd>The language used to describe and define the content of a Web page</dd>

<dt>Cascading Style Sheets (CSS)</dt>
<dd>Used to describe the appearance of Web content</dd>

<dt>JavaScript (JS)</dt>
<dd>The programming language used to build advanced Web sites and applications</dd>
</dl>

Integer posuere erat a ante venenatis dapibus posuere velit aliquet. Morbi leo risus, porta ac consectetur ac, vestibulum at eros. Nullam quis risus eget urna mollis ornare vel eu leo.

### Tables

Aenean lacinia bibendum nulla sed consectetur. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

<table>
<thead>
<tr>
<th>Name</th>
<th>Upvotes</th>
<th>Downvotes</th>
</tr>
</thead>
<tfoot>
<tr>
<td>Totals</td>
<td>21</td>
<td>23</td>
</tr>
</tfoot>
<tbody>
<tr>
<td>Alice</td>
<td>10</td>
<td>11</td>
</tr>
<tr>
<td>Bob</td>
<td>4</td>
<td>3</td>
</tr>
<tr>
<td>Charlie</td>
<td>7</td>
<td>9</td>
</tr>
</tbody>
</table>

Nullam id dolor id nibh ultricies vehicula ut id elit. Sed posuere consectetur est at lobortis. Nullam quis risus eget urna mollis ornare vel eu leo.

-----

Want to see something else added? <a href="https://github.com/poole/poole/issues/new">Open an issue.</a>
41 changes: 41 additions & 0 deletions site/_oldposts/2014-01-02-introducing-lanyon.md
@@ -0,0 +1,41 @@
---
layout: post
title: Introducing Lanyon
comments: True
excerpt: Lanyon is an unassuming Jekyll theme that places content first by tucking away navigation in a hidden drawer.
tags: [Hello]
---

Lanyon is an unassuming [Jekyll](http://jekyllrb.com) theme that places content first by tucking away navigation in a hidden drawer. It's based on [Poole](http://getpoole.com), the Jekyll butler.

### Built on Poole

Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by [@mdo](https://twitter.com/mdo). Poole, and every theme built on it (like Lanyon here), includes the following:

* Complete Jekyll setup included (layouts, config, [404](/404), [RSS feed](/atom.xml), posts, and [example page](/about))
* Mobile friendly design and development
* Easily scalable text and component sizing with `rem` units in the CSS
* Support for a wide gamut of HTML elements
* Related posts (time-based, because Jekyll) below each post
* Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)

### Lanyon features

In addition to the features of Poole, Lanyon adds the following:

* Toggleable sliding sidebar (built with only CSS) via the **☰** link in the top corner
* Sidebar includes support for textual modules and a dynamically generated navigation with active link support
* Two orientations for content and sidebar, default (left sidebar) and [reverse](https://github.com/poole/lanyon#reverse-layout) (right sidebar), available via `<body>` classes
* [Eight optional color schemes](https://github.com/poole/lanyon#themes), available via `<body>` classes

[Head to the readme](https://github.com/poole/lanyon#readme) to learn more.

### Browser support

Lanyon is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.

### Download

Lanyon is developed on and hosted with GitHub. Head to the <a href="https://github.com/poole/lanyon">GitHub repository</a> for downloads, bug reports, and feature requests.

Thanks!
94 changes: 94 additions & 0 deletions site/_posts/2021-01-04-Compositional Explanations of Neurons.md
@@ -0,0 +1,94 @@
---
layout: post
title: Compositional Explanations of Neurons
comments: True
excerpt:
tags: ['2020', 'Natural Language Inference', 'NeurIPS 2020', AI, Compositionality, Explainability, Interpretability, NeurIPS, NLI]

---

## Introduction

* The paper describes a method to explain/interpret the representations learned by individual neurons in deep neural networks.

* The explanations are generated by searching for logical forms defined by a set of composition operators (like OR, AND, NOT) over primitive concepts (like water).

* [Link to the paper](https://arxiv.org/abs/2006.14032)

## Generating compositional explanations

* Given a neural network *f*, the goal is to explain the behavior of an individual neuron in this network in human-understandable terms.

* [Previous work](http://netdissect.csail.mit.edu/) builds on the idea that a good explanation is a description that identifies the inputs for which the neuron activates.

* Given a set of pre-defined atomic concepts $c \in C$ and a similarity measure $\delta(n, c)$ between concept $c$ and the activations of the $n^{th}$ neuron, the explanation for the $n^{th}$ neuron is the concept most similar to it, i.e., $\arg\max_{c \in C} \delta(n, c)$.

* For images, a concept could be represented as an image segmentation map. For example, the water concept can be represented by the segments of the images that show water.

* The similarity can be measured by first thresholding the neuron activations (to get a binary neuron mask) and then computing the IoU score (or Jaccard similarity) between the neuron mask and the concept mask (see the sketch after this list).

* One limitation of this approach is that the explanations are restricted to pre-defined concepts.

* The paper expands the set of candidate concepts by considering logical forms composed from the atomic concepts.

* In theory, the search space explodes exponentially with formula length. In practice, it is restricted to explanations with at most $N$ atomic concepts, and beam search is performed (instead of exhaustive search).
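
A minimal sketch of the scoring and beam search, assuming boolean NumPy masks over the probing inputs for both the neuron and the concepts; the function names, beam width, and exact operator handling are illustrative, not the paper's implementation:

{% highlight python %}
import numpy as np

def iou(neuron_mask, concept_mask):
    # Jaccard similarity between two boolean masks.
    union = np.logical_or(neuron_mask, concept_mask).sum()
    if union == 0:
        return 0.0
    return np.logical_and(neuron_mask, concept_mask).sum() / union

def best_explanation(neuron_acts, concepts, threshold, max_len=10, beam_size=10):
    """Beam search over logical forms (AND / OR / AND NOT) of atomic concepts.

    neuron_acts: float array of the neuron's activations over the probing inputs.
    concepts: dict mapping concept name -> boolean mask of the same shape.
    Returns (formula, iou_score) for the best formula found.
    """
    neuron_mask = neuron_acts > threshold  # binarize the neuron's activations
    score = lambda mask: iou(neuron_mask, mask)
    beam = sorted(concepts.items(), key=lambda c: score(c[1]), reverse=True)
    beam = beam[:beam_size]
    best = max(beam, key=lambda c: score(c[1]))
    for _ in range(max_len - 1):
        candidates = []
        for formula, mask in beam:
            for name, cmask in concepts.items():
                candidates += [
                    (f"({formula} AND {name})", mask & cmask),
                    (f"({formula} OR {name})", mask | cmask),
                    (f"({formula} AND NOT {name})", mask & ~cmask),
                ]
        beam = sorted(candidates, key=lambda c: score(c[1]), reverse=True)
        beam = beam[:beam_size]
        best = max([best] + beam, key=lambda c: score(c[1]))
    return best[0], score(best[1])
{% endhighlight %}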

## Setup

* **Image Classification Setup**

* Neurons from the final 512-unit convolutional layer of a ResNet-18 trained on the [Places365 dataset](https://ieeexplore.ieee.org/abstract/document/7968387).

* Probing for concepts from the [ADE20k scenes dataset](https://openaccess.thecvf.com/content_cvpr_2017/html/Zhou_Scene_Parsing_Through_CVPR_2017_paper.html), with atomic concepts defined by annotations in the [Broden dataset](http://netdissect.csail.mit.edu/).

* **NLI Setup**

* A BiLSTM baseline followed by MLP layers, trained on the [Stanford Natural Language Inference (SNLI) corpus](https://nlp.stanford.edu/projects/snli/).

* Probing the penultimate hidden layer (of the MLP component) for sentence-level explanations.

* Concepts are created using the 2000 most common words in the validation split of the SNLI dataset.

* Additional concepts are created based on the lexical overlap between premise and hypothesis (a sketch of this concept construction follows this list).
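
A rough sketch of this concept construction; the concept naming, the premise/hypothesis split, and the 50% overlap cutoff are assumptions for illustration, and the resulting masks plug into the IoU scorer above:

{% highlight python %}
from collections import Counter
import numpy as np

def build_nli_concepts(premises, hypotheses, vocab_size=2000):
    """Build boolean concept masks over (premise, hypothesis) pairs.

    One concept per frequent word (does it occur in the premise / the
    hypothesis?) plus a lexical-overlap concept. Details are illustrative.
    """
    tokenize = lambda s: set(s.lower().split())
    counts = Counter(w for s in premises + hypotheses for w in s.lower().split())
    vocab = [w for w, _ in counts.most_common(vocab_size)]

    concepts = {}
    for w in vocab:
        concepts[f"pre:{w}"] = np.array([w in tokenize(p) for p in premises])
        concepts[f"hyp:{w}"] = np.array([w in tokenize(h) for h in hypotheses])

    # Overlap concept: most hypothesis words also appear in the premise.
    concepts["overlap>50%"] = np.array([
        len(tokenize(h) & tokenize(p)) > 0.5 * max(len(tokenize(h)), 1)
        for p, h in zip(premises, hypotheses)
    ])
    return concepts
{% endhighlight %}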

## Do neurons learn compositional concepts?

* **Image Classification Setup**

* As $N$ increases, the mean IoU increases (i.e., the explanation quality improves), though the returns diminish beyond $N=10$.

* Manual inspection of 128 neurons and their length-10 explanations shows that 69% of the neurons learned some meaningful combination of concepts, while 31% learned some unrelated concepts.

* The meaningful combinations of concepts include:

* perceptual abstraction that is also lexically coherent (e.g., "skyscraper OR lighthouse OR water tower").

* perceptual abstraction that is not lexically coherent (e.g., "cradle OR autobus OR fire escape").

* specialized abstraction of the form L1 AND NOT L2 (e.g., "(water OR river) AND NOT blue").

* **NLI Setup**

* As $N$ increases, the mean IoU increases (as in the image classification setup) though the IoU keeps increasing past $N=30$.

* Many neurons correspond to lexical features. For example, some neurons are gender-sensitive or activate for verbs like sitting, eating or sleeping. Some neurons are activated when the lexical overlap between premise and hypothesis is high.

## Do interpretable neurons contribute to model accuracy?

* In the image classification setup, the more interpretable a neuron is, the more accurate the model is when that neuron is active.

* However, the opposite trend is seen in NLI models, i.e., the more interpretable neurons are less accurate.

* Key takeaway - interpretability (as measured by the paper) is not necessarily correlated with performance. Given a concept space, the identified behaviors may be correlated or anti-correlated with the model's performance.

## Targeting explanations to change model behavior

* The idea is to construct examples that activate (or inhibit) certain neurons, causing a change in the model's predictions.

* These adversarial examples are referred to as "copy-paste" adversarial examples.

* For example, the neuron corresponding to "(water OR river) AND (NOT blue)" is a major contributor to detecting the "swimming hole" class. An adversarial example is created by making the water blue. This prompts the model to predict "grotto" instead of "swimming hole."

* Similarly, in the NLI model, a neuron detects the word "nobody" in the hypothesis as highly indicative of contradiction. An adversarial example can be created by adding the word "nobody" to the hypothesis, prompting the model to predict contradiction while the true label should be neutral.

* These observations support the hypothesis that one can use explanations to create adversarial examples.
@@ -0,0 +1,50 @@
---
layout: post
title: GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism
comments: True
excerpt:
tags: ['2018', 'Distributed Computing', 'Model Parallelism', 'NeurIPS 2019', AI, Engineering, NeurIPS, Scale, Systems]

---

## Introduction

* The paper introduces GPipe, a pipeline parallelism library for scaling networks that can be expressed as a sequence of layers.

* [Link to the paper](https://arxiv.org/abs/1811.06965)

## Design

* Consider training a deep neural network with *L* layers using *K* accelerators (say GPUs).

* The *i<sup>th</sup>* layer has a *forward* function *f<sub>i</sub>*, a *backward* function *b<sub>i</sub>*, weights *w<sub>i</sub>*, and a cost *c<sub>i</sub>* (say, its memory footprint or computation time).

* GPipe partitions this network into *K* cells and places the *i<sup>th</sup>* cell on the *i<sup>th</sup>* accelerator. Output from the *i<sup>th</sup>* accelerator is passed to the *(i+1)<sup>th</sup>* accelerator as input.

* During the forward pass, the input batch (of size *N*) is divided into *M* equal micro-batches. These micro-batches are pipelined through the *K* accelerators one after another.

* During the backward pass, gradients are computed for each micro-batch. The gradients are accumulated and applied at the end of each mini-batch (see the sketch after this list).

* In batch normalization, the statistics are computed over each micro-batch (used during training) and mini-batch (used during evaluation).

* Micro-batching improves over the naive model parallelism approach by reducing the underutilization of resources (due to the network's sequential dependencies).
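
A schematic PyTorch sketch of the micro-batching and gradient accumulation, assuming the model has already been split into per-device `nn.Sequential` partitions; it processes micro-batches one after another, so it shows the bookkeeping but not the cross-accelerator overlap that gives GPipe its speedup:

{% highlight python %}
import torch

def gpipe_like_step(partitions, devices, optimizer, loss_fn, batch, targets, M):
    """One training step: split the mini-batch into M micro-batches, run each
    through the K partitions (one per device), and accumulate gradients.

    Schematic only: micro-batches are processed sequentially here, whereas
    GPipe pipelines them so different accelerators work concurrently.
    """
    optimizer.zero_grad()
    for x, y in zip(batch.chunk(M), targets.chunk(M)):
        for stage, device in zip(partitions, devices):
            x = stage(x.to(device))            # output feeds the next partition
        loss = loss_fn(x, y.to(x.device)) / M  # average over micro-batches
        loss.backward()                        # gradients accumulate in .grad
    optimizer.step()                           # apply once per mini-batch
{% endhighlight %}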

## Performance Optimization

* GPipe supports re-materialization (i.e., activation checkpointing): during the forward pass, only the output activations at partition boundaries are stored.

* During the backward pass, the forward function is recomputed at each accelerator. This trades a lower memory footprint for increased computation time (a checkpointing sketch follows this list).

* One potential downside is that partitioning can introduce some idle time per accelerator (referred to as the bubble overhead). However, with a sufficiently large number of micro-batches (more than 4 times the number of partitions), the bubble overhead is negligible.
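
PyTorch ships the same memory/compute trade-off as activation checkpointing; a minimal sketch with toy layers (the layer sizes and segment count are arbitrary stand-ins):

{% highlight python %}
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a deep sequential network.
model = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(8)])
inputs = torch.randn(32, 64, requires_grad=True)

# Store only the activations at the 4 segment boundaries during the forward
# pass; each segment's internal activations are recomputed during backward.
out = checkpoint_sequential(model, 4, inputs)
out.sum().backward()
{% endhighlight %}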

## Performance Analysis

* Two different types of model architectures are compared: AmoebaNet convolutional model and Transformer sequence-to-sequence model.

* For AmoebaNet, the size of the largest trainable model (on a single 8GB Cloud TPU v2) increases from 82M to 318M parameters. Further, a 1.8-billion-parameter model can be trained on 8 accelerators (a 25x improvement in size using GPipe).

* For Transformers, GPipe scales the model size to 83.9B parameters with 128 partitions (a 298x improvement in size compared to a single accelerator).

* Since the computation is evenly distributed across transformer layers, the training throughput scales almost linearly with the number of devices.

* Quantitative experiments on ImageNet and multilingual machine translation show that models can be effectively trained using GPipe.