From 5bc0ac5e7e93a5252550e461b940f178b0699f63 Mon Sep 17 00:00:00 2001
From: Shagun Sodhani <sshagunsodhani@gmail.com>
Date: Sun, 12 Feb 2023 14:02:47 -0500
Subject: [PATCH] Add toolformer paper

---
 README.md                                     | 442 +++++++++---------
 ...odels Can Teach Themselves to Use Tools.md |   2 +-
 site/_site                                    |   2 +-
 3 files changed, 224 insertions(+), 222 deletions(-)

diff --git a/README.md b/README.md
index bcd610bf..d8f5e39c 100755
--- a/README.md
+++ b/README.md
@@ -3,223 +3,225 @@
 I am trying a new initiative - a-paper-a-week. This repository will hold all those papers and related summaries and notes.
 
 ## List of papers
-* [Hints for Computer System Design](https://shagunsodhani.com/papers-I-read/Hints-for-Computer-System-Design)
-* [Synthesized Policies for Transfer and Adaptation across Tasks and Environments](https://shagunsodhani.com/papers-I-read/Synthesized-Policies-for-Transfer-and-Adaptation-across-Tasks-and-Environments)
-* [Deep Neural Networks for YouTube Recommendations](https://shagunsodhani.com/papers-I-read/Deep-Neural-Networks-for-YouTube-Recommendations)
-* [The Tail at Scale](https://shagunsodhani.com/papers-I-read/The-Tail-at-Scale)
-* [Practical Lessons from Predicting Clicks on Ads at Facebook](https://shagunsodhani.com/papers-I-read/Practical-Lessons-from-Predicting-Clicks-on-Ads-at-Facebook)
-* [Ad Click Prediction - a View from the Trenches](https://shagunsodhani.com/papers-I-read/Ad-Click-Prediction-a-View-from-the-Trenches)
-* [Anatomy of Catastrophic Forgetting - Hidden Representations and Task Semantics](https://shagunsodhani.com/papers-I-read/Anatomy-of-Catastrophic-Forgetting-Hidden-Representations-and-Task-Semantics)
-* [When Do Curricula Work?](https://shagunsodhani.com/papers-I-read/When-Do-Curricula-Work)
-* [Continual learning with hypernetworks](https://shagunsodhani.com/papers-I-read/Continual-learning-with-hypernetworks)
-* [Zero-shot Learning by Generating Task-specific Adapters](https://shagunsodhani.com/papers-I-read/Zero-shot-Learning-by-Generating-Task-specific-Adapters)
-* [HyperNetworks](https://shagunsodhani.com/papers-I-read/HyperNetworks)
-* [Energy-based Models for Continual Learning](https://shagunsodhani.com/papers-I-read/Energy-based-Models-for-Continual-Learning)
-* [GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism](https://shagunsodhani.com/papers-I-read/GPipe-Easy-Scaling-with-Micro-Batch-Pipeline-Parallelism)
-* [Compositional Explanations of Neurons](https://shagunsodhani.com/papers-I-read/Compositional-Explanations-of-Neurons)
-* [Design patterns for container-based distributed systems](https://shagunsodhani.com/papers-I-read/Design-patterns-for-container-based-distributed-systems)
-* [Cassandra - a decentralized structured storage system](https://shagunsodhani.com/papers-I-read/Cassandra-a-decentralized-structured-storage-system)
-* [CAP twelve years later - How the rules have changed](https://shagunsodhani.com/papers-I-read/CAP-twelve-years-later-How-the-rules-have-changed)
-* [Consistency Tradeoffs in Modern Distributed Database System Design](https://shagunsodhani.com/papers-I-read/Consistency-Tradeoffs-in-Modern-Distributed-Database-System-Design)
-* [Exploring Simple Siamese Representation Learning](https://shagunsodhani.com/papers-I-read/Exploring-Simple-Siamese-Representation-Learning)
-* [Data Management for Internet-Scale Single-Sign-On](https://shagunsodhani.com/papers-I-read/Data-Management-for-Internet-Scale-Single-Sign-On)
-* [Searching for Build Debt - Experiences Managing Technical Debt at Google](https://shagunsodhani.com/papers-I-read/Searching-for-Build-Debt-Experiences-Managing-Technical-Debt-at-Google)
-* [One Solution is Not All You Need - Few-Shot Extrapolation via Structured MaxEnt RL](https://shagunsodhani.com/papers-I-read/One-Solution-is-Not-All-You-Need-Few-Shot-Extrapolation-via-Structured-MaxEnt-RL)
-* [Learning Explanations That Are Hard To Vary](https://shagunsodhani.com/papers-I-read/Learning-Explanations-That-Are-Hard-To-Vary)
-* [Remembering for the Right Reasons - Explanations Reduce Catastrophic Forgetting](https://shagunsodhani.com/papers-I-read/Remembering-for-the-Right-Reasons-Explanations-Reduce-Catastrophic-Forgetting)
-* [A Foliated View of Transfer Learning](https://shagunsodhani.com/papers-I-read/A-Foliated-View-of-Transfer-Learning)
-* [Harvest, Yield, and Scalable Tolerant Systems](https://shagunsodhani.com/papers-I-read/Harvest,-Yield,-and-Scalable-Tolerant-Systems)
-* [MONet - Unsupervised Scene Decomposition and Representation](https://shagunsodhani.com/papers-I-read/MONet-Unsupervised-Scene-Decomposition-and-Representation)
-* [Revisiting Fundamentals of Experience Replay](https://shagunsodhani.com/papers-I-read/Revisiting-Fundamentals-of-Experience-Replay)
-* [Deep Reinforcement Learning and the Deadly Triad](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-and-the-Deadly-Triad)
-* [Alpha Net: Adaptation with Composition in Classifier Space](https://shagunsodhani.com/papers-I-read/Alpha-Net-Adaptation-with-Composition-in-Classifier-Space)
-* [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://shagunsodhani.com/papers-I-read/Outrageously-Large-Neural-Networks-The-Sparsely-Gated-Mixture-of-Experts-Layer)
-* [Gradient Surgery for Multi-Task Learning](https://shagunsodhani.com/papers-I-read/Gradient-Surgery-for-Multi-Task-Learning)
-* [GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks](https://shagunsodhani.com/papers-I-read/GradNorm-Gradient-Normalization-for-Adaptive-Loss-Balancing-in-Deep-Multitask-Networks)
-* [TaskNorm: Rethinking Batch Normalization for Meta-Learning](https://shagunsodhani.com/papers-I-read/TASKNORM-Rethinking-Batch-Normalization-for-Meta-Learning)
-* [Averaging Weights leads to Wider Optima and Better Generalization](https://shagunsodhani.com/papers-I-read/Averaging-Weights-leads-to-Wider-Optima-and-Better-Generalization)
-* [Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions](https://shagunsodhani.com/papers-I-read/Decentralized-Reinforcement-Learning-Global-Decision-Making-via-Local-Economic-Transactions)
-* [When to use parametric models in reinforcement learning?](https://shagunsodhani.com/papers-I-read/When-to-use-parametric-models-in-reinforcement-learning)
-* [Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Network-Randomization-A-Simple-Technique-for-Generalization-in-Deep-Reinforcement-Learning)
-* [On the Difficulty of Warm-Starting Neural Network Training](https://shagunsodhani.com/papers-I-read/On-the-Difficulty-of-Warm-Starting-Neural-Network-Training)
-* [Supervised Contrastive Learning](https://shagunsodhani.com/papers-I-read/Supervised-Contrastive-Learning)
-* [CURL - Contrastive Unsupervised Representations for Reinforcement Learning](https://shagunsodhani.com/papers-I-read/CURL-Contrastive-Unsupervised-Representations-for-Reinforcement-Learning)
-* [Competitive Training of Mixtures of Independent Deep Generative Models](https://shagunsodhani.com/papers-I-read/Competitive-Training-of-Mixtures-of-Independent-Deep-Generative-Models)
-* [What Does Classifying More Than 10,000 Image Categories Tell Us?](https://shagunsodhani.com/papers-I-read/What-Does-Classifying-More-Than-10,000-Image-Categories-Tell-Us)
-* [mixup - Beyond Empirical Risk Minimization](https://shagunsodhani.com/papers-I-read/mixup-Beyond-Empirical-Risk-Minimization)
-* [ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators](https://shagunsodhani.com/papers-I-read/ELECTRA-Pre-training-Text-Encoders-as-Discriminators-Rather-Than-Generators)
-* [Gradient based sample selection for online continual learning](https://shagunsodhani.com/papers-I-read/Gradient-based-sample-selection-for-online-continual-learning)
-* [Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One](https://shagunsodhani.com/papers-I-read/Your-Classifier-is-Secretly-an-Energy-Based-Model,-and-You-Should-Treat-it-Like-One)
-* [Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges](https://shagunsodhani.com/papers-I-read/Massively-Multilingual-Neural-Machine-Translation-in-the-Wild-Findings-and-Challenges)
-* [Observational Overfitting in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Observational-Overfitting-in-Reinforcement-Learning)
-* [Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML](https://shagunsodhani.com/papers-I-read/Rapid-Learning-or-Feature-Reuse-Towards-Understanding-the-Effectiveness-of-MAML)
-* [Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour](https://shagunsodhani.com/papers-I-read/Accurate-Large-Minibatch-SGD-Training-ImageNet-in-1-Hour)
-* [Superposition of many models into one](https://shagunsodhani.com/papers-I-read/Superposition-of-many-models-into-one)
-* [Towards a Unified Theory of State Abstraction for MDPs](https://shagunsodhani.com/papers-I-read/Towards-a-Unified-Theory-of-State-Abstraction-for-MDPs)
-* [ALBERT - A Lite BERT for Self-supervised Learning of Language Representations](https://shagunsodhani.com/papers-I-read/ALBERT-A-Lite-BERT-for-Self-supervised-Learning-of-Language-Representations)
-* [Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://shagunsodhani.com/papers-I-read/Mastering-Atari,-Go,-Chess-and-Shogi-by-Planning-with-a-Learned-Model)
-* [Contrastive Learning of Structured World Models](https://shagunsodhani.com/papers-I-read/Contrastive-Learning-of-Structured-World-Models)
-* [Gossip based Actor-Learner Architectures for Deep RL](https://shagunsodhani.com/papers-I-read/Gossip-based-Actor-Learner-Architectures-for-Deep-RL)
-* [How to train your MAML](https://shagunsodhani.com/papers-I-read/How-to-train-your-MAML)
-* [PHYRE - A New Benchmark for Physical Reasoning](https://shagunsodhani.com/papers-I-read/PHYRE-A-New-Benchmark-for-Physical-Reasoning)
-* [Large Memory Layers with Product Keys](https://shagunsodhani.com/papers-I-read/Large-Memory-Layers-with-Product-Keys)
-* [Abductive Commonsense Reasoning](https://shagunsodhani.com/papers-I-read/Abductive-Commonsense-Reasoning)
-* [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-in-a-Handful-of-Trials-using-Probabilistic-Dynamics-Models)
-* [Assessing Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Assessing-Generalization-in-Deep-Reinforcement-Learning)
-* [Quantifying Generalization in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Quantifying-Generalization-in-Reinforcement-Learning)
-* [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](https://shagunsodhani.com/papers-I-read/Set-Transformer-A-Framework-for-Attention-based-Permutation-Invariant-Neural-Networks)
-* [Measuring abstract reasoning in neural networks](https://shagunsodhani.com/papers-I-read/Measuring-Abstract-Reasoning-in-Neural-Networks)
-* [Hamiltonian Neural Networks](https://shagunsodhani.com/papers-I-read/Hamiltonian-Neural-Networks)
-* [Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations](https://shagunsodhani.com/papers-I-read/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations)
-* [Meta-Reinforcement Learning of Structured Exploration Strategies](https://shagunsodhani.com/papers-I-read/Meta-Reinforcement-Learning-of-Structured-Exploration-Strategies)
-* [Relational Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Relational-Reinforcement-Learning)
-* [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
-* [Multiple Model-Based Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Multiple-Model-Based-Reinforcement-Learning)
-* [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
-* [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
-* [GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks](https://shagunsodhani.com/papers-I-read/GNN-Explainer-A-Tool-for-Post-hoc-Explanation-of-Graph-Neural-Networks)
-* [To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks](https://shagunsodhani.com/papers-I-read/To-Tune-or-Not-to-Tune-Adapting-Pretrained-Representations-to-Diverse-Tasks)
-* [Model Primitive Hierarchical Lifelong Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Model-Primitive-Hierarchical-Lifelong-Reinforcement-Learning)
-* [TuckER - Tensor Factorization for Knowledge Graph Completion](https://shagunsodhani.com/papers-I-read/TuckER-Tensor-Factorization-for-Knowledge-Graph-Completion)
-* [Linguistic Knowledge as Memory for Recurrent Neural Networks](https://shagunsodhani.com/papers-I-read/Linguistic-Knowledge-as-Memory-for-Recurrent-Neural-Networks)
-* [Diversity is All You Need - Learning Skills without a Reward Function](https://shagunsodhani.com/papers-I-read/Diversity-is-All-You-Need-Learning-Skills-without-a-Reward-Function)
-* [Modular meta-learning](https://shagunsodhani.com/papers-I-read/Modular-meta-learning)
-* [Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies](https://shagunsodhani.com/papers-I-read/Hierarchical-RL-Using-an-Ensemble-of-Proprioceptive-Periodic-Policies)
-* [Efficient Lifelong Learningi with A-GEM](https://shagunsodhani.com/papers-I-read/Efficient-Lifelong-Learning-with-A-GEM)
-* [Pre-training Graph Neural Networks with Kernels](https://shagunsodhani.com/papers-I-read/Pre-training-Graph-Neural-Networks-with-Kernels)
-* [Smooth Loss Functions for Deep Top-k Classification](https://shagunsodhani.com/papers-I-read/Smooth-Loss-Functions-for-Deep-Top-k-Classification)
-* [Hindsight Experience Replay](https://shagunsodhani.com/papers-I-read/Hindsight-Experience-Replay)
-* [Representation Tradeoffs for Hyperbolic Embeddings](https://shagunsodhani.com/papers-I-read/Representation-Tradeoffs-for-Hyperbolic-Embeddings)
-* [Learned Optimizers that Scale and Generalize](https://shagunsodhani.com/papers-I-read/Learned-Optimizers-that-Scale-and-Generalize)
-* [One-shot Learning with Memory-Augmented Neural Networks](https://shagunsodhani.com/papers-I-read/One-shot-Learning-with-Memory-Augmented-Neural-Networks)
-* [BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop](https://shagunsodhani.com/papers-I-read/BabyAI-First-Steps-Towards-Grounded-Language-Learning-With-a-Human-In-the-Loop)
-* [Poincaré Embeddings for Learning Hierarchical Representations](https://shagunsodhani.com/papers-I-read/Poincare-Embeddings-for-Learning-Hierarchical-Representations)
-* [When Recurrent Models Don’t Need To Be Recurrent](https://shagunsodhani.com/papers-I-read/When-Recurrent-Models-Don-t-Need-To-Be-Recurrent)
-* [HoME - a Household Multimodal Environment](https://shagunsodhani.com/papers-I-read/HoME-a-Household-Multimodal-Environment)
-* [Emergence of Grounded Compositional Language in Multi-Agent Populations](https://shagunsodhani.com/papers-I-read/Emergence-of-Grounded-Compositional-Language-in-Multi-Agent-Populations)
-* [A Semantic Loss Function for Deep Learning with Symbolic Knowledge](https://shagunsodhani.com/papers-I-read/A-Semantic-Loss-Function-for-Deep-Learning-with-Symbolic-Knowledge)
-* [Hierarchical Graph Representation Learning with Differentiable Pooling](https://shagunsodhani.com/papers-I-read/Hierarchical-Graph-Representation-Learning-with-Differentiable-Pooling)
-* [Imagination-Augmented Agents for Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Imagination-Augmented-Agents-for-Deep-Reinforcement-Learning)
-* [Kronecker Recurrent Units](https://shagunsodhani.com/papers-I-read/Kronecker-Recurrent-Units)
-* [Learning Independent Causal Mechanisms](https://shagunsodhani.com/papers-I-read/Learning-Independent-Causal-Mechanisms)
-* [Memory-based Parameter Adaptation](https://shagunsodhani.com/papers-I-read/Memory-Based-Parameter-Adaption)
-* [Born Again Neural Networks](https://shagunsodhani.com/papers-I-read/Born-Again-Neural-Networks)
-* [Net2Net-Accelerating Learning via Knowledge Transfer](https://shagunsodhani.com/papers-I-read/Net2Net-Accelerating-Learning-via-Knowledge-Transfer)
-* [Learning to Count Objects in Natural Images for Visual Question Answering](https://shagunsodhani.com/papers-I-read/Learning-to-Count-Objects-in-Natural-Images-for-Visual-Question-Answering)
-* [Neural Message Passing for Quantum Chemistry](https://shagunsodhani.com/papers-I-read/Neural-Message-Passing-for-Quantum-Chemistry)
-* [Unsupervised Learning by Predicting Noise](https://shagunsodhani.com/papers-I-read/Unsupervised-Learning-By-Predicting-Noise)
-* [The Lottery Ticket Hypothesis - Training Pruned Neural Networks](https://shagunsodhani.com/papers-I-read/The-Lottery-Ticket-Hypothesis-Training-Pruned-Neural-Networks)
-* [Cyclical Learning Rates for Training Neural Networks](https://shagunsodhani.com/papers-I-read/Cyclical-Learning-Rates-for-Training-Neural-Networks)
-* [Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Improving-Information-Extraction-by-Acquiring-External-Evidence-with-Reinforcement-Learning)
-* [An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks](https://shagunsodhani.com/papers-I-read/An-Empirical-Investigation-of-Catastrophic-Forgetting-in-Gradient-Based-Neural-Networks)
-* [Learning an SAT Solver from Single-Bit Supervision](https://shagunsodhani.com/papers-I-read/Learning-a-SAT-Solver-from-Single-Bit-Supervision)
-* [Neural Relational Inference for Interacting Systems](https://shagunsodhani.com/papers-I-read/Neural-Relational-Inference-for-Interacting-Systems)
-* [Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks](https://shagunsodhani.com/papers-I-read/Stylistic-Transfer-in-Natural-Language-Generation-Systems-Using-Recurrent-Neural-Networks)
-* [Get To The Point: Summarization with Pointer-Generator Networks](https://shagunsodhani.com/papers-I-read/Get-To-The-Point-Summarization-with-Pointer-Generator-Networks)
-* [StarSpace - Embed All The Things!](https://shagunsodhani.com/papers-I-read/StarSpace-Embed-All-The-Things)
-* [Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory](https://shagunsodhani.com/papers-I-read/Emotional-Chatting-Machine-Emotional-Conversation-Generation-with-Internal-and-External-Memory)
-* [Exploring Models and Data for Image Question Answering](https://shagunsodhani.com/papers-I-read/Exploring-Models-and-Data-for-Image-Question-Answering)
-* [How transferable are features in deep neural networks](https://shagunsodhani.com/papers-I-read/How-transferable-are-features-in-deep-neural-networks)
-* [Distilling the Knowledge in a Neural Network](https://shagunsodhani.com/papers-I-read/Distilling-the-Knowledge-in-a-Neural-Network)
-* [Revisiting Semi-Supervised Learning with Graph Embeddings](https://shagunsodhani.com/papers-I-read/Revisiting-Semi-Supervised-Learning-with-Graph-Embeddings)
-* [Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension](https://shagunsodhani.com/papers-I-read/Two-Stage-Synthesis-Networks-for-Transfer-Learning-in-Machine-Comprehension)
-* [Higher-order organization of complex networks](https://shagunsodhani.com/papers-I-read/Higher-order-organization-of-complex-networks)
-* [Network Motifs - Simple Building Blocks of Complex Networks](https://shagunsodhani.com/papers-I-read/Network-Motifs-Simple-Building-Blocks-of-Complex-Networks)
-* [Word Representations via Gaussian Embedding](https://shagunsodhani.com/papers-I-read/Word-Representations-via-Gaussian-Embedding)
-* [HARP - Hierarchical Representation Learning for Networks](https://shagunsodhani.com/papers-I-read/HARP-Hierarchical-Representation-Learning-for-Networks)
-* [Swish - a Self-Gated Activation Function](https://shagunsodhani.com/papers-I-read/Swish-A-self-gated-activation-function)
-* [Reading Wikipedia to Answer Open-Domain Questions](https://shagunsodhani.com/papers-I-read/Reading-Wikipedia-to-Answer-Open-Domain-Questions)
-* [Task-Oriented Query Reformulation with Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Task-Oriented-Query-Reformulation-with-Reinforcement-Learning)
-* [Refining Source Representations with Relation Networks for Neural Machine Translation](https://shagunsodhani.com/papers-I-read/Refining-Source-Representations-with-Relation-Networks-for-Neural-Machine-Translation)
-* [Pointer Networks](https://shagunsodhani.com/papers-I-read/Pointer-Networks)
-* [Learning to Compute Word Embeddings On the Fly](https://shagunsodhani.com/papers-I-read/Learning-to-Compute-Word-Embeddings-On-the-Fly)
-* [R-NET - Machine Reading Comprehension with Self-matching Networks](https://shagunsodhani.com/papers-I-read/R-NET-Machine-Reading-Comprehension-with-Self-matching-Networks)
-* [ReasoNet - Learning to Stop Reading in Machine Comprehension](https://shagunsodhani.com/papers-I-read/ReasoNet-Learning-to-Stop-Reading-in-Machine-Comprehension)
-* [Principled Detection of Out-of-Distribution Examples in Neural Networks](https://shagunsodhani.com/papers-I-read/Principled-Detection-of-Out-of-Distribution-Examples-in-Neural-Networks)
-* [Ask Me Anything: Dynamic Memory Networks for Natural Language Processing](https://shagunsodhani.com/papers-I-read/Ask-Me-Anything-Dynamic-Memory-Networks-for-Natural-Language-Processing)
-* [One Model To Learn Them All](https://shagunsodhani.com/papers-I-read/One-Model-To-Learn-Them-All)
-* [Two/Too Simple Adaptations of Word2Vec for Syntax Problems](https://shagunsodhani.com/papers-I-read/Two-Too-Simple-Adaptations-of-Word2Vec-for-Syntax-Problems)
-* [A Decomposable Attention Model for Natural Language Inference](https://shagunsodhani.com/papers-I-read/A-Decomposable-Attention-Model-for-Natural-Language-Inference)
-* [A Fast and Accurate Dependency Parser using Neural Networks](https://shagunsodhani.com/papers-I-read/A-Fast-and-Accurate-Dependency-Parser-using-Neural-Networks)
-* [Neural Module Networks](https://shagunsodhani.com/papers-I-read/Neural-Module-Networks)
-* [Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering](https://shagunsodhani.com/papers-I-read/Making-the-V-in-VQA-Matter-Elevating-the-Role-of-Image-Understanding-in-Visual-Question-Answering)
-* [Conditional Similarity Networks](https://shagunsodhani.com/papers-I-read/Conditional-Similarity-Networks)
-* [Simple Baseline for Visual Question Answering](https://shagunsodhani.com/papers-I-read/Simple-Baseline-for-Visual-Question-Answering)
-* [VQA: Visual Question Answering](https://shagunsodhani.com/papers-I-read/VQA-Visual-Question-Answering)
-* [Learning to Generate Reviews and Discovering Sentiment](https://gist.github.com/shagunsodhani/634dbe1aa678188399254bb3d0078e1d)
-* [Seeing the Arrow of Time](https://gist.github.com/shagunsodhani/828d8de0034a350d97738bbedadc9373)
-* [End-to-end optimization of goal-driven and visually grounded dialogue systems](https://gist.github.com/shagunsodhani/bbbc739e6815ab6217e0cf0a8f706786)
-* [GuessWhat?! Visual object discovery through multi-modal dialogue](https://gist.github.com/shagunsodhani/2418238e6aefd7b1e8c922cda9e10488)
-* [Semantic Parsing via Paraphrasing](https://gist.github.com/shagunsodhani/93c96d7dd0488d0d00bd7078889dd6f6)
-* [Traversing Knowledge Graphs in Vector Space](https://gist.github.com/shagunsodhani/e8e6213906ec2642f27b1aca3a6201c6)
-* [PPDB: The Paraphrase Database](https://gist.github.com/shagunsodhani/fa1f387f084355dfafdf7550b1899af6)
-* [NewsQA: A Machine Comprehension Dataset](https://gist.github.com/shagunsodhani/c47f0d5c1dfe60ce5da0dd8241e506ea)
-* [A Persona-Based Neural Conversation Model](https://gist.github.com/shagunsodhani/8ad464e7d0ea4c7c6ed5189ac4e44095)
-* [“Why Should I Trust You?” Explaining the Predictions of Any Classifier](https://gist.github.com/shagunsodhani/bd744ab6c17a2289ca139ea586d1d65e)
-* [Conditional Generative Adversarial Nets](https://gist.github.com/shagunsodhani/5d726334de3014defeeb701099a3b4b3)
-* [Addressing the Rare Word Problem in Neural Machine Translation](https://gist.github.com/shagunsodhani/a18fe14b74c7292129c6c5ecb37f33b5)
-* [Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models](https://gist.github.com/shagunsodhani/d32e665b27696ce0436c79174a136410)
-* [Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](https://gist.github.com/shagunsodhani/6ca136088f58d24f7b08056ec8b97595)
-* [Improving Word Representations via Global Context and Multiple Word Prototypes](https://gist.github.com/shagunsodhani/1be86a9bcbd7f120ce55994dcd932bbf)
-* [Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation](https://gist.github.com/shagunsodhani/9dccec626e68e495fd4577ecdca36b7b)
-* [Skip-Thought Vectors](https://gist.github.com/shagunsodhani/4a4eb32de8cabf21bda9a4ada15c46e8)
-* [Deep Convolutional Generative Adversarial Nets](https://gist.github.com/shagunsodhani/aa79796c70565e3761e86d0f932a3de5)
-* [Generative Adversarial Nets](https://gist.github.com/shagunsodhani/1f9dc0444142be8bd8a7404a226880eb)
-* [A Roadmap towards Machine Intelligence](https://gist.github.com/shagunsodhani/9928673525b1713c2d41fd0fac38f81f)
-* [Smart Reply: Automated Response Suggestion for Email](https://gist.github.com/shagunsodhani/da411f15b71ed6a664f9d5ac46409b42)
-* [Convolutional Neural Network For Sentence Classification](https://gist.github.com/shagunsodhani/9ae6d2364c278c97b1b2f4ec53255c56)
-* [Conditional Image Generation with PixelCNN Decoders](https://gist.github.com/shagunsodhani/3cc7066ce7de051d769908b8fab11990)
-* [Pixel Recurrent Neural Networks](https://gist.github.com/shagunsodhani/e741ebd5ba0e0fc0f49d7836e30891a7)
-* [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://gist.github.com/shagunsodhani/f48da7f77418aa22751ffed115779126)
-* [Bag of Tricks for Efficient Text Classification](https://gist.github.com/shagunsodhani/432746f15889f7f4a798bf7f9ec4b7d8)
-* [GloVe: Global Vectors for Word Representation](https://gist.github.com/shagunsodhani/efea5a42d17e0fcf18374df8e3e4b3e8)
-* [SimRank: A Measure of Structural-Context Similarity](https://gist.github.com/shagunsodhani/6329486212643fd61f58a5a3eb5abb3c)
-* [How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation](https://gist.github.com/shagunsodhani/f05748b6339ceff26420ceecfc79d58d)
-* [Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge](https://gist.github.com/shagunsodhani/004d803bc021f579d4aa3b24cec5b994)
-* [WikiReading : A Novel Large-scale Language Understanding Task over Wikipedia](https://gist.github.com/shagunsodhani/2788ac9dbcac5523cb8b2d0a3d70f2d2)
-* [WikiQA: A challenge dataset for open-domain question answering](https://gist.github.com/shagunsodhani/7cf3677ff2b0028a33e6702fbd260bc5)
-* [Teaching Machines to Read and Comprehend](https://gist.github.com/shagunsodhani/a863eb099bb7a1ab4831cd37bffffb04)
-* [Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems](https://gist.github.com/shagunsodhani/5e7c40f61c18502eec2809e5cf1ead6b)
-* [Recurrent Neural Network Regularization](https://gist.github.com/shagunsodhani/d66245692b276cd0b6dcbaf43e4211db)
-* [Deep Math: Deep Sequence Models for Premise Selection](https://gist.github.com/shagunsodhani/d8387256f2bb08f39509600f9d7db498)
-* [A Neural Conversational Model](https://gist.github.com/shagunsodhani/ec6835964df0e49fdef0459c8b334b94)
-* [Key-Value Memory Networks for Directly Reading Documents](https://gist.github.com/shagunsodhani/a5e0baa075b4a917c0a69edc575772a8)
-* [Advances In Optimizing Recurrent Networks](https://gist.github.com/shagunsodhani/75dc31e3c7999ad4a1edf4f289deaa88)
-* [Query Regression Networks for Machine Comprehension](https://gist.github.com/shagunsodhani/93caa283af3c151372f4be86ed4c4b99)
-* [Sequence to Sequence Learning with Neural Networks](https://gist.github.com/shagunsodhani/a2915921d7d0ac5cfd0e379025acfb9f)
-* [The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training](https://gist.github.com/shagunsodhani/e3608ccf262d6e5a6b537128c917c92https://gist.github.com/shagunsodhani/bbbc739e6815ab6217e0cf0a8f706786c)
-* [Question Answering with Subgraph Embeddings](https://gist.github.com/shagunsodhani/b65e299ff5f79a4f9da4a2e9281a0676)
-* [Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks](https://gist.github.com/shagunsodhani/12691b76addf149a224c24ab64b5bdcc)
-* [Visualizing Large-scale and High-dimensional Data](https://gist.github.com/shagunsodhani/6c267cf6122399e9be36491a2f510641)
-* [Visualizing Data using t-SNE](https://gist.github.com/shagunsodhani/2153e01d026712ac94a2b4928a2dbf3e)
-* [Curriculum Learning](https://gist.github.com/shagunsodhani/7e4e1c9817c46e3cb1932f62aac8806b)
-* [End-To-End Memory Networks](https://gist.github.com/shagunsodhani/17881da05d9ee1f6539b2baa8067a6ef)
-* [Memory Networks](https://gist.github.com/shagunsodhani/c7a03a47b3d709e7c592fa7011b0f33e)
-* [Learning To Execute](https://gist.github.com/shagunsodhani/b44b29b86cdfe1b6bae4286253f76350)
-* [Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud](https://gist.github.com/shagunsodhani/1bb05a7134c27cffa1e2f57dc6b1c136)
-* [Large Scale Distributed Deep Networks](https://gist.github.com/shagunsodhani/5733fffe6b1a268998bd93f29ec9fbeb)
-* [Efficient Estimation of Word Representations in Vector Space](https://gist.github.com/shagunsodhani/176a283e2c158a75a0a6)
-* [Regularization and variable selection via the elastic net](https://gist.github.com/shagunsodhani/1cd5d136c8ca30432de5)
-* [Fractional Max-Pooling](https://gist.github.com/shagunsodhani/ccfe3134f46fd3738aa0)
-* [TAO: Facebook’s Distributed Data Store for the Social Graph](https://gist.github.com/shagunsodhani/1c91987c2a4a098fa9f1)
-* [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://gist.github.com/shagunsodhani/4441216a298df0fe6ab0)
-* [The Unified Logging Infrastructure for Data Analytics at Twitter](https://gist.github.com/shagunsodhani/0083f8a2d276e026b15c)
-* [A Few Useful Things to Know about Machine Learning](https://gist.github.com/shagunsodhani/5c2cdfc269bf8aa50b72)
-* [Hive – A Petabyte Scale Data Warehouse Using Hadoop](https://gist.github.com/shagunsodhani/b0651ade0dc39aeb7cfd)
-* [Kafka: a Distributed Messaging System for Log Processing](https://medium.com/@shagun/notes-about-kafka-cc6c1b5c5025)
-* [Power-law distributions in Empirical data](https://github.com/shagunsodhani/powerlaw/blob/master/paper/README.md)
-* [Pregel: A System for Large-Scale Graph Processing](https://gist.github.com/shagunsodhani/af9677bdc79bb34be698)
-* [GraphX: Unifying Data-Parallel and Graph-Parallel Analytics](https://gist.github.com/shagunsodhani/c72bc1928aeef40280c9)
-* [Pig Latin: A Not-So-Foreign Language for Data Processing](https://medium.com/@shagun/pig-latin-e840ac23db93)
-* [Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing](https://medium.com/@shagun/resilient-distributed-datasets-97c28c3a9411)
-* [MapReduce: Simplified Data Processing on Large Clusters](https://medium.com/@shagun/mapreduce-1c88f8a7c3d2)
-* [BigTable: A Distributed Storage System for Structured Data](https://medium.com/@shagun/bigtable-bf580262f030)
-* [Spark SQL: Relational Data Processing in Spark](https://medium.com/@shagun/spark-sql-68a6fac271fe)
-* [Spark: Cluster Computing with Working Sets](https://medium.com/@shagun/spark-8ca626d55d21)
-* [Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture](https://medium.com/@shagun/fast-data-in-the-era-of-big-data-e6208e6d3575)
-* [Scaling Memcache at Facebook](https://medium.com/@shagun/scaling-memcache-at-facebook-1ba77d71c082)
-* [Dynamo: Amazon’s Highly Available Key-value Store](https://medium.com/@shagun/dynamo-9665c22a1ddb)
-* [f4 : Facebook's Warm BLOB Storage System](https://medium.com/@shagun/f4-cba2f141cb0c)
-* [A Theoretician’s Guide to the Experimental Analysis of Algorithms](https://medium.com/@shagun/dos-and-dont-s-of-research-fe33322c7aff)
-* [Cuckoo Hashing](https://medium.com/@shagun/cuckoo-hashing-eb160dfab804)
-* [Never Ending Learning](https://medium.com/@shagun/never-ending-learning-e7b78006e713)
+
+- [Toolformer - Language Models Can Teach Themselves to Use Tools](https://shagunsodhani.com/papers-I-read/Toolformer-Language-Models-Can-Teach-Themselves-to-Use-Tools)
+- [Hints for Computer System Design](https://shagunsodhani.com/papers-I-read/Hints-for-Computer-System-Design)
+- [Synthesized Policies for Transfer and Adaptation across Tasks and Environments](https://shagunsodhani.com/papers-I-read/Synthesized-Policies-for-Transfer-and-Adaptation-across-Tasks-and-Environments)
+- [Deep Neural Networks for YouTube Recommendations](https://shagunsodhani.com/papers-I-read/Deep-Neural-Networks-for-YouTube-Recommendations)
+- [The Tail at Scale](https://shagunsodhani.com/papers-I-read/The-Tail-at-Scale)
+- [Practical Lessons from Predicting Clicks on Ads at Facebook](https://shagunsodhani.com/papers-I-read/Practical-Lessons-from-Predicting-Clicks-on-Ads-at-Facebook)
+- [Ad Click Prediction - a View from the Trenches](https://shagunsodhani.com/papers-I-read/Ad-Click-Prediction-a-View-from-the-Trenches)
+- [Anatomy of Catastrophic Forgetting - Hidden Representations and Task Semantics](https://shagunsodhani.com/papers-I-read/Anatomy-of-Catastrophic-Forgetting-Hidden-Representations-and-Task-Semantics)
+- [When Do Curricula Work?](https://shagunsodhani.com/papers-I-read/When-Do-Curricula-Work)
+- [Continual learning with hypernetworks](https://shagunsodhani.com/papers-I-read/Continual-learning-with-hypernetworks)
+- [Zero-shot Learning by Generating Task-specific Adapters](https://shagunsodhani.com/papers-I-read/Zero-shot-Learning-by-Generating-Task-specific-Adapters)
+- [HyperNetworks](https://shagunsodhani.com/papers-I-read/HyperNetworks)
+- [Energy-based Models for Continual Learning](https://shagunsodhani.com/papers-I-read/Energy-based-Models-for-Continual-Learning)
+- [GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism](https://shagunsodhani.com/papers-I-read/GPipe-Easy-Scaling-with-Micro-Batch-Pipeline-Parallelism)
+- [Compositional Explanations of Neurons](https://shagunsodhani.com/papers-I-read/Compositional-Explanations-of-Neurons)
+- [Design patterns for container-based distributed systems](https://shagunsodhani.com/papers-I-read/Design-patterns-for-container-based-distributed-systems)
+- [Cassandra - a decentralized structured storage system](https://shagunsodhani.com/papers-I-read/Cassandra-a-decentralized-structured-storage-system)
+- [CAP twelve years later - How the rules have changed](https://shagunsodhani.com/papers-I-read/CAP-twelve-years-later-How-the-rules-have-changed)
+- [Consistency Tradeoffs in Modern Distributed Database System Design](https://shagunsodhani.com/papers-I-read/Consistency-Tradeoffs-in-Modern-Distributed-Database-System-Design)
+- [Exploring Simple Siamese Representation Learning](https://shagunsodhani.com/papers-I-read/Exploring-Simple-Siamese-Representation-Learning)
+- [Data Management for Internet-Scale Single-Sign-On](https://shagunsodhani.com/papers-I-read/Data-Management-for-Internet-Scale-Single-Sign-On)
+- [Searching for Build Debt - Experiences Managing Technical Debt at Google](https://shagunsodhani.com/papers-I-read/Searching-for-Build-Debt-Experiences-Managing-Technical-Debt-at-Google)
+- [One Solution is Not All You Need - Few-Shot Extrapolation via Structured MaxEnt RL](https://shagunsodhani.com/papers-I-read/One-Solution-is-Not-All-You-Need-Few-Shot-Extrapolation-via-Structured-MaxEnt-RL)
+- [Learning Explanations That Are Hard To Vary](https://shagunsodhani.com/papers-I-read/Learning-Explanations-That-Are-Hard-To-Vary)
+- [Remembering for the Right Reasons - Explanations Reduce Catastrophic Forgetting](https://shagunsodhani.com/papers-I-read/Remembering-for-the-Right-Reasons-Explanations-Reduce-Catastrophic-Forgetting)
+- [A Foliated View of Transfer Learning](https://shagunsodhani.com/papers-I-read/A-Foliated-View-of-Transfer-Learning)
+- [Harvest, Yield, and Scalable Tolerant Systems](https://shagunsodhani.com/papers-I-read/Harvest,-Yield,-and-Scalable-Tolerant-Systems)
+- [MONet - Unsupervised Scene Decomposition and Representation](https://shagunsodhani.com/papers-I-read/MONet-Unsupervised-Scene-Decomposition-and-Representation)
+- [Revisiting Fundamentals of Experience Replay](https://shagunsodhani.com/papers-I-read/Revisiting-Fundamentals-of-Experience-Replay)
+- [Deep Reinforcement Learning and the Deadly Triad](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-and-the-Deadly-Triad)
+- [Alpha Net: Adaptation with Composition in Classifier Space](https://shagunsodhani.com/papers-I-read/Alpha-Net-Adaptation-with-Composition-in-Classifier-Space)
+- [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://shagunsodhani.com/papers-I-read/Outrageously-Large-Neural-Networks-The-Sparsely-Gated-Mixture-of-Experts-Layer)
+- [Gradient Surgery for Multi-Task Learning](https://shagunsodhani.com/papers-I-read/Gradient-Surgery-for-Multi-Task-Learning)
+- [GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks](https://shagunsodhani.com/papers-I-read/GradNorm-Gradient-Normalization-for-Adaptive-Loss-Balancing-in-Deep-Multitask-Networks)
+- [TaskNorm: Rethinking Batch Normalization for Meta-Learning](https://shagunsodhani.com/papers-I-read/TASKNORM-Rethinking-Batch-Normalization-for-Meta-Learning)
+- [Averaging Weights leads to Wider Optima and Better Generalization](https://shagunsodhani.com/papers-I-read/Averaging-Weights-leads-to-Wider-Optima-and-Better-Generalization)
+- [Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions](https://shagunsodhani.com/papers-I-read/Decentralized-Reinforcement-Learning-Global-Decision-Making-via-Local-Economic-Transactions)
+- [When to use parametric models in reinforcement learning?](https://shagunsodhani.com/papers-I-read/When-to-use-parametric-models-in-reinforcement-learning)
+- [Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Network-Randomization-A-Simple-Technique-for-Generalization-in-Deep-Reinforcement-Learning)
+- [On the Difficulty of Warm-Starting Neural Network Training](https://shagunsodhani.com/papers-I-read/On-the-Difficulty-of-Warm-Starting-Neural-Network-Training)
+- [Supervised Contrastive Learning](https://shagunsodhani.com/papers-I-read/Supervised-Contrastive-Learning)
+- [CURL - Contrastive Unsupervised Representations for Reinforcement Learning](https://shagunsodhani.com/papers-I-read/CURL-Contrastive-Unsupervised-Representations-for-Reinforcement-Learning)
+- [Competitive Training of Mixtures of Independent Deep Generative Models](https://shagunsodhani.com/papers-I-read/Competitive-Training-of-Mixtures-of-Independent-Deep-Generative-Models)
+- [What Does Classifying More Than 10,000 Image Categories Tell Us?](https://shagunsodhani.com/papers-I-read/What-Does-Classifying-More-Than-10,000-Image-Categories-Tell-Us)
+- [mixup - Beyond Empirical Risk Minimization](https://shagunsodhani.com/papers-I-read/mixup-Beyond-Empirical-Risk-Minimization)
+- [ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators](https://shagunsodhani.com/papers-I-read/ELECTRA-Pre-training-Text-Encoders-as-Discriminators-Rather-Than-Generators)
+- [Gradient based sample selection for online continual learning](https://shagunsodhani.com/papers-I-read/Gradient-based-sample-selection-for-online-continual-learning)
+- [Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One](https://shagunsodhani.com/papers-I-read/Your-Classifier-is-Secretly-an-Energy-Based-Model,-and-You-Should-Treat-it-Like-One)
+- [Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges](https://shagunsodhani.com/papers-I-read/Massively-Multilingual-Neural-Machine-Translation-in-the-Wild-Findings-and-Challenges)
+- [Observational Overfitting in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Observational-Overfitting-in-Reinforcement-Learning)
+- [Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML](https://shagunsodhani.com/papers-I-read/Rapid-Learning-or-Feature-Reuse-Towards-Understanding-the-Effectiveness-of-MAML)
+- [Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour](https://shagunsodhani.com/papers-I-read/Accurate-Large-Minibatch-SGD-Training-ImageNet-in-1-Hour)
+- [Superposition of many models into one](https://shagunsodhani.com/papers-I-read/Superposition-of-many-models-into-one)
+- [Towards a Unified Theory of State Abstraction for MDPs](https://shagunsodhani.com/papers-I-read/Towards-a-Unified-Theory-of-State-Abstraction-for-MDPs)
+- [ALBERT - A Lite BERT for Self-supervised Learning of Language Representations](https://shagunsodhani.com/papers-I-read/ALBERT-A-Lite-BERT-for-Self-supervised-Learning-of-Language-Representations)
+- [Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://shagunsodhani.com/papers-I-read/Mastering-Atari,-Go,-Chess-and-Shogi-by-Planning-with-a-Learned-Model)
+- [Contrastive Learning of Structured World Models](https://shagunsodhani.com/papers-I-read/Contrastive-Learning-of-Structured-World-Models)
+- [Gossip based Actor-Learner Architectures for Deep RL](https://shagunsodhani.com/papers-I-read/Gossip-based-Actor-Learner-Architectures-for-Deep-RL)
+- [How to train your MAML](https://shagunsodhani.com/papers-I-read/How-to-train-your-MAML)
+- [PHYRE - A New Benchmark for Physical Reasoning](https://shagunsodhani.com/papers-I-read/PHYRE-A-New-Benchmark-for-Physical-Reasoning)
+- [Large Memory Layers with Product Keys](https://shagunsodhani.com/papers-I-read/Large-Memory-Layers-with-Product-Keys)
+- [Abductive Commonsense Reasoning](https://shagunsodhani.com/papers-I-read/Abductive-Commonsense-Reasoning)
+- [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-in-a-Handful-of-Trials-using-Probabilistic-Dynamics-Models)
+- [Assessing Generalization in Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Assessing-Generalization-in-Deep-Reinforcement-Learning)
+- [Quantifying Generalization in Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Quantifying-Generalization-in-Reinforcement-Learning)
+- [Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks](https://shagunsodhani.com/papers-I-read/Set-Transformer-A-Framework-for-Attention-based-Permutation-Invariant-Neural-Networks)
+- [Measuring abstract reasoning in neural networks](https://shagunsodhani.com/papers-I-read/Measuring-Abstract-Reasoning-in-Neural-Networks)
+- [Hamiltonian Neural Networks](https://shagunsodhani.com/papers-I-read/Hamiltonian-Neural-Networks)
+- [Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations](https://shagunsodhani.com/papers-I-read/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations)
+- [Meta-Reinforcement Learning of Structured Exploration Strategies](https://shagunsodhani.com/papers-I-read/Meta-Reinforcement-Learning-of-Structured-Exploration-Strategies)
+- [Relational Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Relational-Reinforcement-Learning)
+- [Good-Enough Compositional Data Augmentation](https://shagunsodhani.com/papers-I-read/Good-Enough-Compositional-Data-Augmentation)
+- [Multiple Model-Based Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Multiple-Model-Based-Reinforcement-Learning)
+- [Towards a natural benchmark for continual learning](https://shagunsodhani.com/papers-I-read/Towards-a-natural-benchmark-for-continual-learning)
+- [Meta-Learning Update Rules for Unsupervised Representation Learning](https://shagunsodhani.com/papers-I-read/Meta-Learning-Update-Rules-for-Unsupervised-Representation-Learning)
+- [GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks](https://shagunsodhani.com/papers-I-read/GNN-Explainer-A-Tool-for-Post-hoc-Explanation-of-Graph-Neural-Networks)
+- [To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks](https://shagunsodhani.com/papers-I-read/To-Tune-or-Not-to-Tune-Adapting-Pretrained-Representations-to-Diverse-Tasks)
+- [Model Primitive Hierarchical Lifelong Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Model-Primitive-Hierarchical-Lifelong-Reinforcement-Learning)
+- [TuckER - Tensor Factorization for Knowledge Graph Completion](https://shagunsodhani.com/papers-I-read/TuckER-Tensor-Factorization-for-Knowledge-Graph-Completion)
+- [Linguistic Knowledge as Memory for Recurrent Neural Networks](https://shagunsodhani.com/papers-I-read/Linguistic-Knowledge-as-Memory-for-Recurrent-Neural-Networks)
+- [Diversity is All You Need - Learning Skills without a Reward Function](https://shagunsodhani.com/papers-I-read/Diversity-is-All-You-Need-Learning-Skills-without-a-Reward-Function)
+- [Modular meta-learning](https://shagunsodhani.com/papers-I-read/Modular-meta-learning)
+- [Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies](https://shagunsodhani.com/papers-I-read/Hierarchical-RL-Using-an-Ensemble-of-Proprioceptive-Periodic-Policies)
+- [Efficient Lifelong Learningi with A-GEM](https://shagunsodhani.com/papers-I-read/Efficient-Lifelong-Learning-with-A-GEM)
+- [Pre-training Graph Neural Networks with Kernels](https://shagunsodhani.com/papers-I-read/Pre-training-Graph-Neural-Networks-with-Kernels)
+- [Smooth Loss Functions for Deep Top-k Classification](https://shagunsodhani.com/papers-I-read/Smooth-Loss-Functions-for-Deep-Top-k-Classification)
+- [Hindsight Experience Replay](https://shagunsodhani.com/papers-I-read/Hindsight-Experience-Replay)
+- [Representation Tradeoffs for Hyperbolic Embeddings](https://shagunsodhani.com/papers-I-read/Representation-Tradeoffs-for-Hyperbolic-Embeddings)
+- [Learned Optimizers that Scale and Generalize](https://shagunsodhani.com/papers-I-read/Learned-Optimizers-that-Scale-and-Generalize)
+- [One-shot Learning with Memory-Augmented Neural Networks](https://shagunsodhani.com/papers-I-read/One-shot-Learning-with-Memory-Augmented-Neural-Networks)
+- [BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop](https://shagunsodhani.com/papers-I-read/BabyAI-First-Steps-Towards-Grounded-Language-Learning-With-a-Human-In-the-Loop)
+- [Poincaré Embeddings for Learning Hierarchical Representations](https://shagunsodhani.com/papers-I-read/Poincare-Embeddings-for-Learning-Hierarchical-Representations)
+- [When Recurrent Models Don’t Need To Be Recurrent](https://shagunsodhani.com/papers-I-read/When-Recurrent-Models-Don-t-Need-To-Be-Recurrent)
+- [HoME - a Household Multimodal Environment](https://shagunsodhani.com/papers-I-read/HoME-a-Household-Multimodal-Environment)
+- [Emergence of Grounded Compositional Language in Multi-Agent Populations](https://shagunsodhani.com/papers-I-read/Emergence-of-Grounded-Compositional-Language-in-Multi-Agent-Populations)
+- [A Semantic Loss Function for Deep Learning with Symbolic Knowledge](https://shagunsodhani.com/papers-I-read/A-Semantic-Loss-Function-for-Deep-Learning-with-Symbolic-Knowledge)
+- [Hierarchical Graph Representation Learning with Differentiable Pooling](https://shagunsodhani.com/papers-I-read/Hierarchical-Graph-Representation-Learning-with-Differentiable-Pooling)
+- [Imagination-Augmented Agents for Deep Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Imagination-Augmented-Agents-for-Deep-Reinforcement-Learning)
+- [Kronecker Recurrent Units](https://shagunsodhani.com/papers-I-read/Kronecker-Recurrent-Units)
+- [Learning Independent Causal Mechanisms](https://shagunsodhani.com/papers-I-read/Learning-Independent-Causal-Mechanisms)
+- [Memory-based Parameter Adaptation](https://shagunsodhani.com/papers-I-read/Memory-Based-Parameter-Adaption)
+- [Born Again Neural Networks](https://shagunsodhani.com/papers-I-read/Born-Again-Neural-Networks)
+- [Net2Net-Accelerating Learning via Knowledge Transfer](https://shagunsodhani.com/papers-I-read/Net2Net-Accelerating-Learning-via-Knowledge-Transfer)
+- [Learning to Count Objects in Natural Images for Visual Question Answering](https://shagunsodhani.com/papers-I-read/Learning-to-Count-Objects-in-Natural-Images-for-Visual-Question-Answering)
+- [Neural Message Passing for Quantum Chemistry](https://shagunsodhani.com/papers-I-read/Neural-Message-Passing-for-Quantum-Chemistry)
+- [Unsupervised Learning by Predicting Noise](https://shagunsodhani.com/papers-I-read/Unsupervised-Learning-By-Predicting-Noise)
+- [The Lottery Ticket Hypothesis - Training Pruned Neural Networks](https://shagunsodhani.com/papers-I-read/The-Lottery-Ticket-Hypothesis-Training-Pruned-Neural-Networks)
+- [Cyclical Learning Rates for Training Neural Networks](https://shagunsodhani.com/papers-I-read/Cyclical-Learning-Rates-for-Training-Neural-Networks)
+- [Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Improving-Information-Extraction-by-Acquiring-External-Evidence-with-Reinforcement-Learning)
+- [An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks](https://shagunsodhani.com/papers-I-read/An-Empirical-Investigation-of-Catastrophic-Forgetting-in-Gradient-Based-Neural-Networks)
+- [Learning an SAT Solver from Single-Bit Supervision](https://shagunsodhani.com/papers-I-read/Learning-a-SAT-Solver-from-Single-Bit-Supervision)
+- [Neural Relational Inference for Interacting Systems](https://shagunsodhani.com/papers-I-read/Neural-Relational-Inference-for-Interacting-Systems)
+- [Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks](https://shagunsodhani.com/papers-I-read/Stylistic-Transfer-in-Natural-Language-Generation-Systems-Using-Recurrent-Neural-Networks)
+- [Get To The Point: Summarization with Pointer-Generator Networks](https://shagunsodhani.com/papers-I-read/Get-To-The-Point-Summarization-with-Pointer-Generator-Networks)
+- [StarSpace - Embed All The Things!](https://shagunsodhani.com/papers-I-read/StarSpace-Embed-All-The-Things)
+- [Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory](https://shagunsodhani.com/papers-I-read/Emotional-Chatting-Machine-Emotional-Conversation-Generation-with-Internal-and-External-Memory)
+- [Exploring Models and Data for Image Question Answering](https://shagunsodhani.com/papers-I-read/Exploring-Models-and-Data-for-Image-Question-Answering)
+- [How transferable are features in deep neural networks](https://shagunsodhani.com/papers-I-read/How-transferable-are-features-in-deep-neural-networks)
+- [Distilling the Knowledge in a Neural Network](https://shagunsodhani.com/papers-I-read/Distilling-the-Knowledge-in-a-Neural-Network)
+- [Revisiting Semi-Supervised Learning with Graph Embeddings](https://shagunsodhani.com/papers-I-read/Revisiting-Semi-Supervised-Learning-with-Graph-Embeddings)
+- [Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension](https://shagunsodhani.com/papers-I-read/Two-Stage-Synthesis-Networks-for-Transfer-Learning-in-Machine-Comprehension)
+- [Higher-order organization of complex networks](https://shagunsodhani.com/papers-I-read/Higher-order-organization-of-complex-networks)
+- [Network Motifs - Simple Building Blocks of Complex Networks](https://shagunsodhani.com/papers-I-read/Network-Motifs-Simple-Building-Blocks-of-Complex-Networks)
+- [Word Representations via Gaussian Embedding](https://shagunsodhani.com/papers-I-read/Word-Representations-via-Gaussian-Embedding)
+- [HARP - Hierarchical Representation Learning for Networks](https://shagunsodhani.com/papers-I-read/HARP-Hierarchical-Representation-Learning-for-Networks)
+- [Swish - a Self-Gated Activation Function](https://shagunsodhani.com/papers-I-read/Swish-A-self-gated-activation-function)
+- [Reading Wikipedia to Answer Open-Domain Questions](https://shagunsodhani.com/papers-I-read/Reading-Wikipedia-to-Answer-Open-Domain-Questions)
+- [Task-Oriented Query Reformulation with Reinforcement Learning](https://shagunsodhani.com/papers-I-read/Task-Oriented-Query-Reformulation-with-Reinforcement-Learning)
+- [Refining Source Representations with Relation Networks for Neural Machine Translation](https://shagunsodhani.com/papers-I-read/Refining-Source-Representations-with-Relation-Networks-for-Neural-Machine-Translation)
+- [Pointer Networks](https://shagunsodhani.com/papers-I-read/Pointer-Networks)
+- [Learning to Compute Word Embeddings On the Fly](https://shagunsodhani.com/papers-I-read/Learning-to-Compute-Word-Embeddings-On-the-Fly)
+- [R-NET - Machine Reading Comprehension with Self-matching Networks](https://shagunsodhani.com/papers-I-read/R-NET-Machine-Reading-Comprehension-with-Self-matching-Networks)
+- [ReasoNet - Learning to Stop Reading in Machine Comprehension](https://shagunsodhani.com/papers-I-read/ReasoNet-Learning-to-Stop-Reading-in-Machine-Comprehension)
+- [Principled Detection of Out-of-Distribution Examples in Neural Networks](https://shagunsodhani.com/papers-I-read/Principled-Detection-of-Out-of-Distribution-Examples-in-Neural-Networks)
+- [Ask Me Anything: Dynamic Memory Networks for Natural Language Processing](https://shagunsodhani.com/papers-I-read/Ask-Me-Anything-Dynamic-Memory-Networks-for-Natural-Language-Processing)
+- [One Model To Learn Them All](https://shagunsodhani.com/papers-I-read/One-Model-To-Learn-Them-All)
+- [Two/Too Simple Adaptations of Word2Vec for Syntax Problems](https://shagunsodhani.com/papers-I-read/Two-Too-Simple-Adaptations-of-Word2Vec-for-Syntax-Problems)
+- [A Decomposable Attention Model for Natural Language Inference](https://shagunsodhani.com/papers-I-read/A-Decomposable-Attention-Model-for-Natural-Language-Inference)
+- [A Fast and Accurate Dependency Parser using Neural Networks](https://shagunsodhani.com/papers-I-read/A-Fast-and-Accurate-Dependency-Parser-using-Neural-Networks)
+- [Neural Module Networks](https://shagunsodhani.com/papers-I-read/Neural-Module-Networks)
+- [Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering](https://shagunsodhani.com/papers-I-read/Making-the-V-in-VQA-Matter-Elevating-the-Role-of-Image-Understanding-in-Visual-Question-Answering)
+- [Conditional Similarity Networks](https://shagunsodhani.com/papers-I-read/Conditional-Similarity-Networks)
+- [Simple Baseline for Visual Question Answering](https://shagunsodhani.com/papers-I-read/Simple-Baseline-for-Visual-Question-Answering)
+- [VQA: Visual Question Answering](https://shagunsodhani.com/papers-I-read/VQA-Visual-Question-Answering)
+- [Learning to Generate Reviews and Discovering Sentiment](https://gist.github.com/shagunsodhani/634dbe1aa678188399254bb3d0078e1d)
+- [Seeing the Arrow of Time](https://gist.github.com/shagunsodhani/828d8de0034a350d97738bbedadc9373)
+- [End-to-end optimization of goal-driven and visually grounded dialogue systems](https://gist.github.com/shagunsodhani/bbbc739e6815ab6217e0cf0a8f706786)
+- [GuessWhat?! Visual object discovery through multi-modal dialogue](https://gist.github.com/shagunsodhani/2418238e6aefd7b1e8c922cda9e10488)
+- [Semantic Parsing via Paraphrasing](https://gist.github.com/shagunsodhani/93c96d7dd0488d0d00bd7078889dd6f6)
+- [Traversing Knowledge Graphs in Vector Space](https://gist.github.com/shagunsodhani/e8e6213906ec2642f27b1aca3a6201c6)
+- [PPDB: The Paraphrase Database](https://gist.github.com/shagunsodhani/fa1f387f084355dfafdf7550b1899af6)
+- [NewsQA: A Machine Comprehension Dataset](https://gist.github.com/shagunsodhani/c47f0d5c1dfe60ce5da0dd8241e506ea)
+- [A Persona-Based Neural Conversation Model](https://gist.github.com/shagunsodhani/8ad464e7d0ea4c7c6ed5189ac4e44095)
+- [“Why Should I Trust You?” Explaining the Predictions of Any Classifier](https://gist.github.com/shagunsodhani/bd744ab6c17a2289ca139ea586d1d65e)
+- [Conditional Generative Adversarial Nets](https://gist.github.com/shagunsodhani/5d726334de3014defeeb701099a3b4b3)
+- [Addressing the Rare Word Problem in Neural Machine Translation](https://gist.github.com/shagunsodhani/a18fe14b74c7292129c6c5ecb37f33b5)
+- [Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models](https://gist.github.com/shagunsodhani/d32e665b27696ce0436c79174a136410)
+- [Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](https://gist.github.com/shagunsodhani/6ca136088f58d24f7b08056ec8b97595)
+- [Improving Word Representations via Global Context and Multiple Word Prototypes](https://gist.github.com/shagunsodhani/1be86a9bcbd7f120ce55994dcd932bbf)
+- [Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation](https://gist.github.com/shagunsodhani/9dccec626e68e495fd4577ecdca36b7b)
+- [Skip-Thought Vectors](https://gist.github.com/shagunsodhani/4a4eb32de8cabf21bda9a4ada15c46e8)
+- [Deep Convolutional Generative Adversarial Nets](https://gist.github.com/shagunsodhani/aa79796c70565e3761e86d0f932a3de5)
+- [Generative Adversarial Nets](https://gist.github.com/shagunsodhani/1f9dc0444142be8bd8a7404a226880eb)
+- [A Roadmap towards Machine Intelligence](https://gist.github.com/shagunsodhani/9928673525b1713c2d41fd0fac38f81f)
+- [Smart Reply: Automated Response Suggestion for Email](https://gist.github.com/shagunsodhani/da411f15b71ed6a664f9d5ac46409b42)
+- [Convolutional Neural Network For Sentence Classification](https://gist.github.com/shagunsodhani/9ae6d2364c278c97b1b2f4ec53255c56)
+- [Conditional Image Generation with PixelCNN Decoders](https://gist.github.com/shagunsodhani/3cc7066ce7de051d769908b8fab11990)
+- [Pixel Recurrent Neural Networks](https://gist.github.com/shagunsodhani/e741ebd5ba0e0fc0f49d7836e30891a7)
+- [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://gist.github.com/shagunsodhani/f48da7f77418aa22751ffed115779126)
+- [Bag of Tricks for Efficient Text Classification](https://gist.github.com/shagunsodhani/432746f15889f7f4a798bf7f9ec4b7d8)
+- [GloVe: Global Vectors for Word Representation](https://gist.github.com/shagunsodhani/efea5a42d17e0fcf18374df8e3e4b3e8)
+- [SimRank: A Measure of Structural-Context Similarity](https://gist.github.com/shagunsodhani/6329486212643fd61f58a5a3eb5abb3c)
+- [How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation](https://gist.github.com/shagunsodhani/f05748b6339ceff26420ceecfc79d58d)
+- [Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge](https://gist.github.com/shagunsodhani/004d803bc021f579d4aa3b24cec5b994)
+- [WikiReading : A Novel Large-scale Language Understanding Task over Wikipedia](https://gist.github.com/shagunsodhani/2788ac9dbcac5523cb8b2d0a3d70f2d2)
+- [WikiQA: A challenge dataset for open-domain question answering](https://gist.github.com/shagunsodhani/7cf3677ff2b0028a33e6702fbd260bc5)
+- [Teaching Machines to Read and Comprehend](https://gist.github.com/shagunsodhani/a863eb099bb7a1ab4831cd37bffffb04)
+- [Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems](https://gist.github.com/shagunsodhani/5e7c40f61c18502eec2809e5cf1ead6b)
+- [Recurrent Neural Network Regularization](https://gist.github.com/shagunsodhani/d66245692b276cd0b6dcbaf43e4211db)
+- [Deep Math: Deep Sequence Models for Premise Selection](https://gist.github.com/shagunsodhani/d8387256f2bb08f39509600f9d7db498)
+- [A Neural Conversational Model](https://gist.github.com/shagunsodhani/ec6835964df0e49fdef0459c8b334b94)
+- [Key-Value Memory Networks for Directly Reading Documents](https://gist.github.com/shagunsodhani/a5e0baa075b4a917c0a69edc575772a8)
+- [Advances In Optimizing Recurrent Networks](https://gist.github.com/shagunsodhani/75dc31e3c7999ad4a1edf4f289deaa88)
+- [Query Regression Networks for Machine Comprehension](https://gist.github.com/shagunsodhani/93caa283af3c151372f4be86ed4c4b99)
+- [Sequence to Sequence Learning with Neural Networks](https://gist.github.com/shagunsodhani/a2915921d7d0ac5cfd0e379025acfb9f)
+- [The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training](https://gist.github.com/shagunsodhani/e3608ccf262d6e5a6b537128c917c92https://gist.github.com/shagunsodhani/bbbc739e6815ab6217e0cf0a8f706786c)
+- [Question Answering with Subgraph Embeddings](https://gist.github.com/shagunsodhani/b65e299ff5f79a4f9da4a2e9281a0676)
+- [Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks](https://gist.github.com/shagunsodhani/12691b76addf149a224c24ab64b5bdcc)
+- [Visualizing Large-scale and High-dimensional Data](https://gist.github.com/shagunsodhani/6c267cf6122399e9be36491a2f510641)
+- [Visualizing Data using t-SNE](https://gist.github.com/shagunsodhani/2153e01d026712ac94a2b4928a2dbf3e)
+- [Curriculum Learning](https://gist.github.com/shagunsodhani/7e4e1c9817c46e3cb1932f62aac8806b)
+- [End-To-End Memory Networks](https://gist.github.com/shagunsodhani/17881da05d9ee1f6539b2baa8067a6ef)
+- [Memory Networks](https://gist.github.com/shagunsodhani/c7a03a47b3d709e7c592fa7011b0f33e)
+- [Learning To Execute](https://gist.github.com/shagunsodhani/b44b29b86cdfe1b6bae4286253f76350)
+- [Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud](https://gist.github.com/shagunsodhani/1bb05a7134c27cffa1e2f57dc6b1c136)
+- [Large Scale Distributed Deep Networks](https://gist.github.com/shagunsodhani/5733fffe6b1a268998bd93f29ec9fbeb)
+- [Efficient Estimation of Word Representations in Vector Space](https://gist.github.com/shagunsodhani/176a283e2c158a75a0a6)
+- [Regularization and variable selection via the elastic net](https://gist.github.com/shagunsodhani/1cd5d136c8ca30432de5)
+- [Fractional Max-Pooling](https://gist.github.com/shagunsodhani/ccfe3134f46fd3738aa0)
+- [TAO: Facebook’s Distributed Data Store for the Social Graph](https://gist.github.com/shagunsodhani/1c91987c2a4a098fa9f1)
+- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://gist.github.com/shagunsodhani/4441216a298df0fe6ab0)
+- [The Unified Logging Infrastructure for Data Analytics at Twitter](https://gist.github.com/shagunsodhani/0083f8a2d276e026b15c)
+- [A Few Useful Things to Know about Machine Learning](https://gist.github.com/shagunsodhani/5c2cdfc269bf8aa50b72)
+- [Hive – A Petabyte Scale Data Warehouse Using Hadoop](https://gist.github.com/shagunsodhani/b0651ade0dc39aeb7cfd)
+- [Kafka: a Distributed Messaging System for Log Processing](https://medium.com/@shagun/notes-about-kafka-cc6c1b5c5025)
+- [Power-law distributions in Empirical data](https://github.com/shagunsodhani/powerlaw/blob/master/paper/README.md)
+- [Pregel: A System for Large-Scale Graph Processing](https://gist.github.com/shagunsodhani/af9677bdc79bb34be698)
+- [GraphX: Unifying Data-Parallel and Graph-Parallel Analytics](https://gist.github.com/shagunsodhani/c72bc1928aeef40280c9)
+- [Pig Latin: A Not-So-Foreign Language for Data Processing](https://medium.com/@shagun/pig-latin-e840ac23db93)
+- [Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing](https://medium.com/@shagun/resilient-distributed-datasets-97c28c3a9411)
+- [MapReduce: Simplified Data Processing on Large Clusters](https://medium.com/@shagun/mapreduce-1c88f8a7c3d2)
+- [BigTable: A Distributed Storage System for Structured Data](https://medium.com/@shagun/bigtable-bf580262f030)
+- [Spark SQL: Relational Data Processing in Spark](https://medium.com/@shagun/spark-sql-68a6fac271fe)
+- [Spark: Cluster Computing with Working Sets](https://medium.com/@shagun/spark-8ca626d55d21)
+- [Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture](https://medium.com/@shagun/fast-data-in-the-era-of-big-data-e6208e6d3575)
+- [Scaling Memcache at Facebook](https://medium.com/@shagun/scaling-memcache-at-facebook-1ba77d71c082)
+- [Dynamo: Amazon’s Highly Available Key-value Store](https://medium.com/@shagun/dynamo-9665c22a1ddb)
+- [f4 : Facebook's Warm BLOB Storage System](https://medium.com/@shagun/f4-cba2f141cb0c)
+- [A Theoretician’s Guide to the Experimental Analysis of Algorithms](https://medium.com/@shagun/dos-and-dont-s-of-research-fe33322c7aff)
+- [Cuckoo Hashing](https://medium.com/@shagun/cuckoo-hashing-eb160dfab804)
+- [Never Ending Learning](https://medium.com/@shagun/never-ending-learning-e7b78006e713)
diff --git a/site/_posts/2023-02-10-Toolformer - Language Models Can Teach Themselves to Use Tools.md b/site/_posts/2023-02-10-Toolformer - Language Models Can Teach Themselves to Use Tools.md
index 9c78963c..bcd511dd 100755
--- a/site/_posts/2023-02-10-Toolformer - Language Models Can Teach Themselves to Use Tools.md	
+++ b/site/_posts/2023-02-10-Toolformer - Language Models Can Teach Themselves to Use Tools.md	
@@ -25,7 +25,7 @@ tags:
 
 - Starting with a language model, M, the goal is to enable the language model to use tools by invoking API calls.
 
-- An API call is denoted by the tuple $c = (api\_name, api\_input)$. It can be linearized as $e(c) = [api\_name(api\_input)]$ or as $e(c, r) = [api\_name(api\_input) -> r]$ where $r$ denotes the result of the API.
+- An API call is denoted by the tuple $c = (api-name, api-input)$. It can be linearized as $e(c) = [api-name(api-input)]$ or as $e(c, r) = [api-name(api-input) -> r]$ where $r$ denotes the result of the API.
 
 - The given dataset of plain text, $C$, is converted into a dataset $C*$ augmented with the API calls using a three-step process.
 
diff --git a/site/_site b/site/_site
index 628da57f..595a66b8 160000
--- a/site/_site
+++ b/site/_site
@@ -1 +1 @@
-Subproject commit 628da57f4f5b407fc586415c802145416dfaf0e3
+Subproject commit 595a66b8361d6a240aafa6bb4450f0133b6a7a96