Given an image and a free-form, open-ended, natural-language question (about the image), produce the answer for the image.
2-channel model (using vision and language models) followed by a softmax over the (K = 1000) most frequent answers.
Given two sentences a and b, the model has to predict whether they have an “entailment”, “neutral” or “contradiction” relationship.
The current trend in deep learning is to design, train and fine-tune a separate model for each problem.
Though multi-task models have been explored, they have been trained on problems from the same domain only, and no competitive multi-task, multi-modal models have been proposed.
The paper explores the possibility of such a unified deep learning model that can solve different tasks across multiple domains by training concurrently on them.
Small, modality-specific subnetworks (called modality nets) should be used to map input data to a joint representation space and back.
The joint representation is to be of variable size.
Different tasks from the same domain share the modality net.
MultiModel networks should use computational blocks from different domains even if they are not specifically designed for the task at hand.
The MultiModel network consists of a few small modality nets, an encoder, an I/O mixer and an autoregressive decoder.
Encoder and decoder use the following computational blocks:
Convolutional Block
Attention Block
Mixture-of-Experts (MoE) Block
For further details, refer to the original paper.
The encoder consists of 6 conv blocks with a MoE block in the middle.
The I/O mixer consists of an attention block and 2 conv blocks.
The decoder consists of 4 blocks of convolution and attention, with a MoE block in the middle.
Modality Nets
Language Data
Input is the sequence of tokens ending in a termination token.
This sequence is mapped to the correct dimensionality using a learned embedding.
For output, the network takes the decoded output and performs a learned linear mapping followed by a softmax.
Image and Categorical Data
Uses residual convolution blocks.
Similar to the exit flow of the Xception network.
Audio Data
WSJ speech corpus
ImageNet dataset
COCO image captioning dataset
WSJ parsing dataset
WMT English-German translation corpus
German-English translation
WMT English-French translation corpus
German-French translation
The experimental section is not very rigorous, with many details skipped (these would probably be added later).
While MultiModel does not beat the state-of-the-art models, it does outperform some recent models.
The jointly trained model performs similarly to single-task models on tasks with a lot of data and sometimes outperforms single-task models on tasks with less data (like parsing).
Interestingly, jointly training the model on the parsing task and the ImageNet task improves the performance of the parsing task, even though the two tasks are seemingly unrelated.
Another experiment was done to evaluate the effect of components (like MoE) on tasks (like ImageNet) which do not explicitly need them. It was observed that the performance either went down or remained the same when the MoE component was removed. This indicates that mixing different components does help to improve performance over multiple tasks.
But this observation is not conclusive, as a different combination of, say, the encoder (that does not use MoE) could achieve better performance than one that does. The paper does not explore such possibilities.
Dynamic Memory Network (DMN) is a neural-network-based general framework that can be used for tasks like sequence tagging, classification, sequence-to-sequence, and question answering requiring transitive reasoning.
The basic idea is that all these tasks can be modelled as a question answering task in general, and a common architecture could be used for solving them.
Concatenate all the sentences (or facts) in the document and encode them by feeding the word embeddings of the text to a GRU.
Each time a sentence ends, extract the hidden representation of the GRU up to that point and use it as the encoded representation of the sentence.
The episodic memory consists of an attention mechanism and a recurrent network with which it updates its memory.
During each iteration, the network generates an episode e by attending over the representations of the sentences, the question and the previous memory.
The episodic memory is updated using the current episode and the previous memory.
Depending on the amount of supervision available, the network may perform multiple passes. E.g., in the bAbI dataset, some tasks specify how many passes would be needed and which sentence should be attended to in each pass. For the others, a fixed number of passes are made.
Multiple passes allow the network to perform transitive inference.
Given the input representation c, memory m and question q, produce a scalar score using a 2-layer feedforward network, to use as the attention mechanism (a sketch follows below).
A separate GRU encodes the input representation, with its updates weighted by the attention.
The final state of the GRU is fed to the answer module.
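A minimal sketch of such an attention gate, assuming a feature vector built from simple elementwise interactions between the sentence representations c, the question q and the memory m (the paper's feature set is richer; names are illustrative):

```python
import torch
import torch.nn as nn

class EpisodicAttention(nn.Module):
    """Scores each sentence representation against the question and the
    current memory with a 2-layer feedforward network (DMN-style gate)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, c, q, m):
        # c: (num_sentences, hidden_dim); q, m: (hidden_dim,)
        q, m = q.expand_as(c), m.expand_as(c)
        # Assumed interaction features; the paper uses a larger set.
        z = torch.cat([c * q, c * m, (c - q).abs(), (c - m).abs()], dim=-1)
        return torch.softmax(self.scorer(z).squeeze(-1), dim=0)
```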
bAbI Dataset
For most tasks, DMN either outperforms Memory Networks or performs as well as them.
For tasks like answering with 2 or 3 supporting facts, DMN lags because of the limitation of RNNs in modelling long sentences.
Stanford Sentiment Treebank Dataset
DMN outperforms all the baselines for both binary and fine-grained sentiment analysis.
Wall Street Journal Dataset
DMN achieves a state-of-the-art accuracy of 97.56%.
Multiple passes help in reasoning tasks but not so much for sentiment/POS tagging.
Attention in the case of the 2-iteration DMN is more focused than attention in the 1-iteration DMN.
For the 2-iteration DMN, attention in the second iteration focuses only on relevant words, and less attention is paid to words that lose their relevance in the context of the entire document.
It would be interesting to put some mechanism in place to determine the number of episodes that should be generated before an answer is predicted. A naive way would be to predict the answer after each episode and check if the softmax score of the predicted answer is above a threshold.
Alternatively, the softmax score and other information could be fed to a Reinforcement Learning (RL) agent which decides if the document should be read again. So every time an episode is generated, the state is passed to the RL agent, which decides if another iteration should be performed. If it decides to predict the answer and the correct answer is generated, the agent gets a large positive reward, else a large negative reward.
To discourage unnecessary iterations, a small negative reward could be given every time the agent decides to perform another iteration.
Given a pre-trained neural network, which is trained using data from some distribution P (referred to as in-distribution data), the task is to detect examples coming from a distribution Q which is different from P (referred to as out-of-distribution data).
For example, if a digit-recognizer neural network is trained using MNIST images, out-of-distribution examples would be images of animals.
Neural networks can make high-confidence predictions even in such cases where the input is unrecognisable or irrelevant.
The paper proposes ODIN, which can detect such out-of-distribution examples without changing the pre-trained model itself.
Uses 2 major techniques:
Temperature scaling - the softmax classifier for the classification network can be written as:
pi(x, T) = exp(fi(x) / T) / sumj exp(fj(x) / T)
where x is the input, pi is the softmax probability and T is the temperature scaling parameter.
Input perturbation - add small perturbations to the input (image) before feeding it into the network:
x_perturbed = x - ε * sign(-∇x log p_ŷ(x, T))
where ε is the perturbation magnitude (a sketch combining both techniques follows below).
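A minimal sketch combining both techniques, assuming a pre-trained classifier `model`; the temperature and perturbation values are illustrative defaults, not tuned settings:

```python
import torch
import torch.nn.functional as F

def odin_score(model, x, temperature=1000.0, epsilon=0.0012):
    """Returns the max temperature-scaled softmax score of the perturbed
    input; low scores suggest an out-of-distribution example."""
    x = x.detach().clone().requires_grad_(True)
    log_probs = F.log_softmax(model(x) / temperature, dim=-1)
    # Gradient of the log-probability of the predicted class w.r.t. x.
    log_probs.max(dim=-1).values.sum().backward()
    # Perturb in the direction that increases the softmax score.
    x_perturbed = x - epsilon * torch.sign(-x.grad)
    with torch.no_grad():
        probs = F.softmax(model(x_perturbed) / temperature, dim=-1)
    return probs.max(dim=-1).values
```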
Code is available on GitHub.
Models
In-Distribution Datasets
Out-of-Distribution Datasets
Metrics
ODIN outperforms the baseline across all datasets and all models by a good margin.
In the domain of machine comprehension, making multiple passes over the given document is an effective technique to extract the relation between the given passage, question and answer.
Unlike previous approaches, which perform a fixed number of passes over the passage, the Reasoning Network (ReasoNet) uses reinforcement learning (RL) to decide how many times a document should be read.
Every time the document is read, ReasoNet determines whether the document should be read again or the termination state has been reached. If the termination state is reached, the answer module is triggered to generate the answer.
Since the termination state is discrete and not connected to the final output, an RL approach is used.
CNN, DailyMail Dataset
SQuAD
Graph Reachability Dataset
Memory (M) - Comprises the vector representations of the document and the question (encoded using a GRU or other RNNs).
Attention - The attention vector (xt) is a function of the current internal state st and the external memory M. The state and memory are passed through FC layers and fed to a similarity function.
Internal State (st) - Vector representation of the question state, computed by an RNN using the previous internal state and the attention vector xt.
Termination Gate (Tt) - Uses a logistic regression model to generate a random binary variable using the current internal state st.
Reinforcement Learning - For the RL setting, the reward at time t is rt = 1 if Tt = 1 and the answer is correct; otherwise rt = 0.
Workflow - Given a passage p, query q and answer a:
Extract the memory using p.
Extract the initial hidden state using q.
ReasoNet executes all possible episodes that can be enumerated by setting an upper limit on the number of passes.
These episodes generate actions and answers that are used to train the ReasoNet.
Result
CNN, DailyMail Corpus
SQuAD
Graph Reachability Dataset
ReasoNet - Standard ReasoNet as described above.
ReasoNet-Last - Uses the prediction from the final step, Tmax.
ReasoNet > ReasoNet-Last > Deep LSTM Reader
ReasoNet converges faster than ReasoNet-Last, indicating that the termination gate is useful.
Notes
R-NET is an end-to-end trained neural network model for machine comprehension.
It starts by matching the question and the given passage (using a gated attention-based RNN) to obtain a question-aware passage representation.
Next, it uses a self-matching attention mechanism to refine the passage representation by matching the passage against itself.
Lastly, it uses pointer networks to determine the positions of the answer in the passage.
SQuAD
MS-MARCO
Question / Passage Encoder
Gated Attention-based RNN
Given the question and passage representations, a sentence-pair representation is generated via soft alignment of the words in the question and in the passage.
The newly added gate captures the relation between the question and the current passage word, as only some parts of the passage are relevant for answering the given question.
Self-Matching Attention
The passage representation obtained so far would not capture most of the context.
So the current representation is matched against itself, so as to collect evidence from the entire passage and encode the evidence relevant to the current passage word and question.
Output Layer
Use a pointer network (initialized using attention pooling over the question representation) to predict the position of the answer.
The loss function is the sum of the negative log probabilities of the start and end positions.
Results
R-NET is ranked second on the SQuAD leaderboard as of 7th August 2017 and achieves the best published results on the MS-MARCO dataset.
Ideas like sentence ranking, using syntax information, performing multi-hop inference and augmenting the question dataset (using a seq2seq network) did not help in improving the performance.
Word-based language models suffer from the problem of rare or Out-of-Vocabulary (OOV) words.
Learning representations for OOV words directly on the end task often results in poor representations.
The alternative is to replace all the rare words with a single, unique representation (loss of information) or to use character-level models to obtain word representations (these tend to miss the semantic relationships).
The paper proposes to learn a network that can predict the representations of words using auxiliary data (referred to as definitions), such as dictionary definitions, Wikipedia infoboxes, the spelling of the word, etc.
The auxiliary data encoders are trained jointly with the end task to ensure that the word representations align with the requirements of the end task.
Given a rare word w, let d(w) = <x1, x2, …> denote its definition, where the xi are words.
d(w) is fed to a definition reader network f (an LSTM) and its last state is used as the definition embedding ed(w) (a sketch follows below).
In case w has multiple definitions, the embeddings are combined using mean pooling.
The approach can be extended to in-vocabulary words as well, by using the definition embeddings of such words to update their original embeddings.
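A minimal sketch of such a definition reader, with mean pooling over multiple definitions (class and argument names are illustrative):

```python
import torch
import torch.nn as nn

class DefinitionReader(nn.Module):
    """Runs an LSTM over the words of each definition of a rare word and
    uses the last hidden state as the definition embedding; multiple
    definitions are combined with mean pooling."""
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, definitions):
        # definitions: list of 1-D LongTensors, one token sequence per definition.
        embeddings = []
        for d in definitions:
            _, (h, _) = self.lstm(self.embed(d).unsqueeze(0))
            embeddings.append(h.squeeze())  # last hidden state = e_d(w)
        return torch.stack(embeddings).mean(dim=0)
```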
The proposed approach was tested on the following tasks:
For all the tasks, models using both spelling and dictionary (SD) outperformed models using just one.
Multi-token words like “San Francisco” are not accounted for as of now.
The model does not handle rare words which appear in the definitions themselves and just replaces them with a generic unknown-word token.
The paper introduces a novel architecture that generates an output sequence whose elements are discrete tokens corresponding to positions in the input sequence.
Such a problem cannot be solved using Seq2Seq models or Neural Turing Machines, as the size of the output softmax is variable (it depends on the size of the input sequence).
Traditional attention-based sequence-to-sequence models compute an attention vector for each step of the output decoder and use it to blend the encoder states of the input into a single, consolidated context vector, which is then used to compute a fixed-size softmax.
In Pointer Nets, the attention vector (over all the tokens in the input sequence) is normalized and treated directly as the softmax output over the input tokens.
So the Pointer Net is a very simple modification of the attention model (a sketch follows below).
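A minimal sketch of this modification, using additive (Bahdanau-style) attention scores directly as the output distribution over input positions:

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """The softmax over attention scores *is* the model output: a
    distribution over positions of the input sequence."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, encoder_states, decoder_state):
        # encoder_states: (seq_len, hidden_dim); decoder_state: (hidden_dim,)
        scores = self.v(torch.tanh(self.W_enc(encoder_states) +
                                   self.W_dec(decoder_state))).squeeze(-1)
        # No blending into a context vector: normalize and return directly.
        return torch.softmax(scores, dim=0)  # (seq_len,)
```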
It applies to any problem where the size of the output depends on the size of the input, which rules out a fixed-length softmax.
E.g., combinatorial problems such as the planar convex hull, where the size of the output depends on the size of the input.
The paper considers the following 3 problems:
Since some of the problems are NP-hard, the paper considers approximate solutions wherever the exact solutions are not feasible to compute.
The authors used the exact same architecture and model parameters for all the instances of the 3 problems, to show the generality of the model.
The proposed Pointer Nets outperform LSTMs and LSTMs with attention, and can generalise quite well to much larger sequences.
Interestingly, the order in which the inputs are fed to the system affects its performance. The authors discussed this aspect in their subsequent paper titled Order Matters: Sequence To Sequence for Sets.
Raw - The original query is fed to the search engine without any modification.
Pseudo-Relevance Feedback (PRF-TFIDF) - The query is expanded using the top-N TF-IDF terms.
PRF-Relevance Model (PRF-RM) - The probability of adding token t to the query q0 is given by P(t|q0) = (1 − λ)P′(t|q0) + λ Σd P(d)P(t|d)P(q0|d).
Reward - The paper uses Recall@K as the reward when training the RL-based models, with the argument that the “metric has shown to be effective in improving the other metrics as well”, without any justification though.
SL-Oracle - A classifier that perfectly selects terms that will increase performance, based on the supervised learning approach.
The paper presents a new machine comprehension dataset for question answering in a real-life setting (say, when interacting with Cortana/Siri).
Existing machine comprehension (MC) datasets are either too small or synthetic (with a distribution different from that of real questions posted by humans). MARCO questions are sampled from real, anonymized user queries.
Most datasets provide a comparatively small and clean context to answer the question. In MARCO, the context documents (which may or may not contain the answer) are extracted using Bing from real-world documents. As such, the questions and the context documents are noisy.
In general, the answers to the questions are restricted to an entity or a text span within the document. In the case of MARCO, the human judges are encouraged to generate complete sentences as answers.
The first release consists of 100K questions, with the aim of releasing 1M questions in future releases.
All questions are tagged with segment information.
A subset of questions has multiple answers and another subset has no answers at all.
Each record in the dataset contains the following information:
Metrics
Among generative models, Memory Networks performed better than seq2seq.
In the cloze-style test, ReasoNet achieved an accuracy of approx. 59% while the Attention Sum Reader achieved an accuracy of approx. 55%.
Current QA systems (including the ones using memory and attention) derive their power from supervised data and are very different from how humans do reasoning.
The ImageNet dataset pushed the state-of-the-art performance on object classification to beyond human accuracy. Similar was the case with the speech recognition dataset from DARPA, which led to the advancement of speech recognition. Having a large, diverse dataset of human-like questions is a fundamental requirement for advancing the field, and the paper aims to provide just the right kind of dataset.
The paper presents a new activation function called Swish, with formulation f(x) = x.sigmoid(x), and its parameterised version called Swish-β, where f(x, β) = x.sigmoid(β.x) and β is a trainable parameter (a sketch follows below).
The paper shows that Swish is consistently able to outperform ReLU and other activation functions over a variety of datasets (CIFAR, ImageNet, WMT2014), though only by small margins in some cases.
Smooth, non-monotonic function.
Swish-β can be thought of as a smooth function that interpolates between a (scaled) linear function and ReLU.
Uses a self-gating mechanism (that is, it uses its own value to gate itself). Gating generally uses multiple scalar inputs, but since self-gating uses a single scalar input, it can be used to replace activation functions, which are generally pointwise.
Being unbounded on the x > 0 side, it avoids the saturation (near-zero gradients) that slows down training.
Being bounded below induces a kind of regularization effect, as large negative inputs are forgotten.
Since the Swish function is smooth, the output landscape and the loss landscape are also smooth. A smooth landscape should be more traversable and less sensitive to initialization and learning rates.
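A minimal sketch of Swish-β:

```python
import torch

def swish(x, beta=1.0):
    """Swish-β: x * sigmoid(beta * x). beta can also be made a learnable
    parameter; beta -> 0 recovers a scaled linear function (x / 2) and
    beta -> infinity approaches ReLU."""
    return x * torch.sigmoid(beta * x)
```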
HARP is an architecture to learn low-dimensional node embeddings by compressing the input graph into smaller graphs.
Given a graph G = (V, E), compute a series of successively smaller (coarser) graphs G0, …, GL. Learn the node representations on GL and successively refine the embeddings for the larger graphs in the series.
The architecture is independent of the algorithms used to embed the nodes or to refine the node representations.
Graph coarsening technique that preserves global structure:
Collapse edges and stars to preserve first- and second-order proximity.
Edge collapsing - select a subset of E such that no two edges are incident on the same vertex, and merge the endpoints of each selected edge into a single node, merging their edges as well.
Star collapsing - given a star structure, collapse pairs of neighbouring nodes (of the central node).
In practice, first apply star collapsing, followed by edge collapsing.
Extending node representations from the coarser graph to the finer graph:
Let's say node1 and node2 were merged into node12 during coarsening. First copy the representation of node12 into node1 and node2.
Additionally, if hierarchical softmax was used, extend the binary tree such that node12 is replaced by 2 child nodes, node1 and node2.
The time complexity of HARP + DeepWalk is O(number of walks * |V|), while that of HARP + LINE is O(number of iterations * |E|).
The asymptotic complexity remains the same as the HARP-less version in the two cases.
The multi-label classification task shows that HARP improves all the node embedding techniques, with gains of up to 14%.
Symmetric Similarity
The resulting loss function can be interpreted as pushing the means closer while encouraging the two Gaussians to be more concentrated.
Asymmetric Similarity
A network motif is defined as “a pattern of inter-connections occurring in complex networks in numbers that are significantly higher than those in randomized networks”.
In the practical setting, given an input network, we first create randomized networks which have the same single-node characteristics (like the number of incoming and outgoing edges) as the input network.
The patterns that occur at a much higher frequency in the input graph (than in the randomized graphs) are reported as motifs.
More specifically, motifs are the patterns for which the probability of appearing in a randomized network an equal or greater number of times than in the real network is lower than a cutoff value (say 0.01).
Real-life networks exhibit properties like the “small world” property (the majority of nodes are within a distance of fewer than 7 hops from each other) and the “scale-free” property (the fraction of nodes having k edges decays as a power law).
Motifs are one such structural property, exhibited by networks in biochemistry, neurobiology, ecology and engineering. Further, the motifs shared by graphs from different domains differ, which hints at the usefulness of motifs as a fundamental structural property of a graph, one that relates to the process of its evolution.
The paper presents a generalized framework for graph clustering (clusters of network motifs) on the basis of higher-order connectivity patterns.
Given a motif M, the framework aims to find a cluster of a set of nodes S such that the nodes of S participate in many instances of M and avoid cutting instances of M (where only a subset of the nodes of an instance of M appears in S).
Mathematically, the aim is to minimise the motif conductance metric, given as cutM(S, S’) / min[volM(S), volM(S’)], where S’ is the complement of S, cutM(S, S’) is the number of instances of M which have at least one node in both S and S’, and volM(S) is the number of nodes in instances of M that belong only to S.
Solving the above problem exactly is computationally infeasible, and an approximate solution is proposed using eigenvalues and matrices.
The approximate solution is easy to implement, efficient, and guaranteed to find clusters that are at most a quadratic factor away from the optimal.
Given the network and a motif M, form the motif adjacency matrix WM, where WM(i, j) is the number of instances of M that contain both i and j.
Compute the spectral ordering of the nodes from the normalized motif Laplacian matrix.
Compute the prefix set of the spectral ordering with the smallest motif conductance (a sketch follows below).
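A minimal NumPy sketch of these three steps for the triangle motif, assuming an undirected 0/1 adjacency matrix in which every node participates in at least one triangle (isolated nodes should be filtered first); the sweep computes conductance on the motif adjacency matrix WM as a proxy for counting cut motif instances:

```python
import numpy as np

def motif_spectral_cluster(A):
    """Triangle-motif clustering: motif adjacency, spectral ordering from
    the normalized motif Laplacian, then a sweep cut over prefixes."""
    # Step 1: W[i, j] = number of triangles containing both i and j.
    W = (A @ A) * A
    d = W.sum(axis=1)
    # Step 2: spectral ordering via the second eigenvector of the
    # normalized motif Laplacian I - D^{-1/2} W D^{-1/2}.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L)
    order = np.argsort(D_inv_sqrt @ eigvecs[:, 1])
    # Step 3: sweep over prefixes, keeping the one with smallest conductance.
    total, cut, vol = d.sum(), 0.0, 0.0
    in_S = np.zeros(len(A), dtype=bool)
    best_phi, best_k = np.inf, 1
    for k, v in enumerate(order[:-1], start=1):
        # Moving v into S adds its edges to the outside and removes
        # its edges to the inside from the cut.
        cut += W[v, ~in_S].sum() - W[v, in_S].sum()
        in_S[v] = True
        vol += d[v]
        phi = cut / min(vol, total - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return order[:best_k]
```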
The method is applicable to directed, undirected and weighted graphs (it allows for negative edge weights as well).
In case the motif is not known beforehand, the framework can be used to compute significant motifs.
The proposed framework unifies two fundamental tools of network science (motif analysis and network partitioning), comes with worst-case guarantees for the approximations employed, and can be extended to identify higher-order modular organization of networks.
The problem is the following:
We have a domain DS for which we have a labelled dataset of question-answer pairs, and another domain DT for which we do not have any labelled data.
We use the data from domain DS to train SynNet and use it to generate synthetic question-answer pairs for domain DT.
Now we can train a machine comprehension model M on DS and fine-tune it using the synthetic data for DT.
Works in two stages:
For training, map the words to their GloVe embeddings and pass them through a Bi-LSTM. Next, pass them through two FC layers followed by a softmax layer.
Given an input paragraph and a candidate answer, the Question Synthesis network generates the question one word at a time.
Map each word in the paragraph to its GloVe embedding. After the word vector, append a ‘1’ if the word was part of the candidate answer, else append a ‘0’.
Feed this to a Bi-LSTM network (encoder-decoder), where the decoder conditions on the representation generated by the encoder as well as on the question tokens generated so far. Decoding is stopped when the “END” token is produced.
The paragraph may contain some named entities or rare words which do not appear in the softmax vocabulary. To account for such words, a copying mechanism is also incorporated.
At each time step, a Pointer Network (CP) and a Vocabulary Predictor (VP) are used to generate probability distributions for the next word, and a Latent Predictor Network is used to decide which of the two networks would be used for the prediction.
At inference time, greedy decoding is used, where the most likely predictor is chosen and then the most likely word from that predictor is chosen.
Data Regularization - There is a need to alternate between mini-batches from the source and target domains while fine-tuning the MC model.
At inference time, the fine-tuned MC model is used to get the distributions P(i=start) and P(i=end) (corresponding to the likelihood of choosing word i as the starting or ending word of the answer) for all the words, and DP is used to find the optimal answer span.
Checkpoint Averaging - Use the different checkpointed models to average the answer likelihoods before running DP.
Using the synthetically generated dataset helps to gain a 2% improvement in terms of F-score (from SQuAD -> NewsQA). Using checkpointed models further improves the performance to an overall 46.6% F-score, which closes the gap with respect to the performance of a model trained on NewsQA itself (~52.3% F-score).
The paper presents a semi-supervised learning framework for graphs where the node embeddings are used to jointly predict both the class labels and the neighbourhood context. Usually, graph embeddings are learnt in an unsupervised manner and cannot leverage the supervising signal coming from the labelled data.
The framework is called Planetoid (Predicting Labels And Neighbors with Embeddings Transductively Or Inductively from Data).
Given a graph G = (V, E), with xL and xU as the feature vectors of the labelled and unlabelled nodes and yL as the labels of the labelled nodes, the problem is to learn a mapping (classifier) f: x -> y.
There are two settings possible:
Transductive - Predictions are made only for those nodes which are already observed in the graph at training time.
Inductive - Predictions are made for nodes whether or not they have been observed in the graph at training time.
The general semi-supervised learning loss would be LS + λLU, where LS is the supervised learning loss and LU is the unsupervised learning loss.
The unsupervised loss is a variant of the skip-gram loss with negative edge sampling.
More specifically, first a random walk sequence S is sampled. Then either a positive edge is sampled from S (within a given context distance) or a negative edge is sampled.
The label information is injected by using the label as a context and minimising the distance between positive edges (edges where the nodes have the same label) while maximising the distance between negative edges (edges where the nodes have different labels).
Two separate fully connected networks are applied over the node features and the node embeddings.
These 2 representations are then concatenated and fed to a softmax classifier to predict the class label.
In the inductive setting, it is difficult to obtain the node embeddings at test time. One naive approach is to retrain the network to obtain the embeddings of the previously unobserved nodes, but that is inefficient.
Instead, the embedding of a node x is parameterized as a function of its input feature vector and is learnt by applying a fully connected neural network to the node feature vector.
This provides a simple way to extend the original approach to the inductive setting.
The proposed approach is evaluated in 3 settings (text classification, distantly supervised entity extraction and entity classification), and it consistently outperforms approaches that use just node features or node embeddings.
The key takeaway is that joint training in the semi-supervised setting has several benefits over the unsupervised setting, and that using the graph context (in terms of node embeddings) is much more effective than using a graph-Laplacian-based regularization term.
Unsupervised text embeddings can be generalized across different tasks, but they have weaker predictive power (as compared to end-to-end trained deep learning methods) for any particular task. On the other hand, the deep learning techniques are expensive and need a large amount of supervised data and a large number of parameters to tune.
The paper introduces Predictive Text Embedding (PTE) - a semi-supervised approach which learns an effective low-dimensional representation using a large amount of unsupervised data and a small amount of supervised data.
The work can be extended to general information networks as well, since classic techniques like MDS, Isomap, Laplacian Eigenmaps, etc. do not scale well to large graphs.
Further, this model can be applied to heterogeneous networks, unlike the previous works LINE and DeepWalk, which work on homogeneous networks only.
The paper proposes 3 different kinds of networks:
All 3 graphs are integrated into one heterogeneous text network.
First, the authors extend their previous work, LINE, to heterogeneous bipartite text networks, as explained:
Given a bipartite graph G = (VA ∪ VB, E), where VA and VB are disjoint sets of vertices, the conditional probability of va (in set VA) being generated by vb (in set VB) is given as the softmax score between the embeddings of va and vb, normalised by the sum of exponentials of the dot products between vb and all nodes in VA.
The second-order proximity can be determined by the conditional distributions p(.|vj).
The objective to be minimised is the KL divergence between the conditional distribution p(.|vj) and the empirical distribution p^(.|vj) (given as wij / degj).
Now, the 3 individual networks can all be interpreted as bipartite networks, so the node representations of all 3 individual networks are obtained as described above.
For the word-label network, since the training data is sparse, one could either train the unlabelled networks first and then the labelled network, or they could all be trained jointly.
For the case of joint training, the edges are sampled from the 3 networks alternately.
For the fine-tuning case, the edges are first sampled from the unlabelled networks and then from the labelled network.
Once the word embeddings are obtained, the text embeddings may be obtained by simply averaging the word embeddings.
Baseline Models
For long documents, PTE (joint) outperforms CNN and the other PTE variants, and is around 10 times faster than the CNN model.
For short documents, PTE (joint) does not always outperform the CNN model, probably because word sense ambiguity is more relevant in short documents.
In machine learning, it is common to train a single large model (with a large number of parameters) or an ensemble of multiple smaller models using the same dataset.
While such large models help to improve the performance of the system, they also make it difficult and computationally expensive to deploy the system.
The paper proposes to transfer the knowledge from such “cumbersome” models into a single, “simpler” model which is more suitable for deployment. This transfer of knowledge is referred to as “distillation”.
Train the cumbersome model using the given training data in the usual way.
Train the simpler, distilled model using the class probabilities (from the cumbersome model) as soft targets. Thus, the simpler model is trained to generalise the same way as the cumbersome model.
If the soft targets have high entropy, they provide much more information than the hard targets, and the gradient (between training examples) varies less.
One approach is to minimise the L2 difference between the logits produced by the cumbersome model and the simpler model. This approach was pursued by Buciluǎ et al.
The paper proposes a more general solution, which they name “distillation”. The temperature of the final softmax is raised until the cumbersome model produces a suitably soft set of targets (from the final softmax layer). These soft targets are then used to train the simpler model.
It also shows that the proposed approach is, in fact, a more general case of the first approach.
In the simplest setting, the soft targets are obtained by running the cumbersome model with a high value of temperature, and the same temperature is used when training the simpler model. The temperature is set to 1 when making predictions using the simpler model.
It helps to add an auxiliary objective function which corresponds to the cross-entropy loss with the correct labels. This second objective function should be given a much lower weight though. Further, the magnitude of the gradients from the soft targets needs to be scaled by multiplying with the square of the temperature (a sketch follows below).
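A minimal sketch of this combined objective (the T and alpha values are illustrative, not the paper's):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.1):
    """Cross-entropy against the teacher's temperature-softened targets,
    plus a smaller-weighted cross-entropy against the hard labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    # Multiply by T^2 so the soft-target gradients keep the same scale
    # as the hard-target gradients when T changes.
    soft_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return (1 - alpha) * soft_loss + alpha * hard_loss
```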
The paper reports favourable results on the distillation task in the following domains:
Image Classification (on the MNIST dataset)
Automatic Speech Recognition (ASR)
An extra experiment is performed where the baseline model is trained using both hard targets and soft targets alternately. Further, only 3% of the total dataset is used.
The model using hard targets overfits and has poor test accuracy, while the model using soft targets does not overfit and gets much better test accuracy. This shows the regularizing effect of soft targets.
Training ensembles of specialists for very large datasets (JFT dataset - an internal dataset at Google)
The experiment shows that while training a single large model would take a lot of time, the performance of the model can be improved by learning a small number of specialised networks (which are faster to train).
Though it is yet to be shown that the knowledge of such specialist models can be distilled back into a single model.
When neural networks are trained on images, they tend to learn the same kind of features in the first layer (corresponding to Gabor filters or colour blobs). The first-layer features are “general”, irrespective of the task/optimizer, etc.
The final-layer features tend to be “specific” in the sense that they strongly depend on the task.
The paper studies the transition of the generality property across the layers of the network. This could be useful in the domain of transfer learning, where features are reused across tasks.
The degree of generality of a set of features, learned on task A, is defined as the extent to which these features can be used for another task B.
Randomly split the 1000 ImageNet classes into 2 groups (corresponding to tasks A and B). Each group has 500 classes and half the total number of examples.
Two 8-layer convolutional networks are trained on the two datasets and labelled baseA and baseB respectively.
Now choose a layer number n from {1, 2, …, 7}.
For each layer n, train the following two networks:
If AnB performs well, the nth-layer features are “general”.
In another setting, the transferred layers are also fine-tuned (BnB+ and AnB+).
The ImageNet dataset contains a hierarchy of classes, which allows for creating the datasets A and B with high and low similarity.
For n = {1, 2}, the performance of the BnB model is the same as that of the baseB model. For n = {3, 4, 5, 6}, the performance of the BnB model is worse.
This indicates the presence of “fragile co-adaptation” between features on successive layers, where features interact with each other in a complex way and cannot be easily separated across layers. This is more prominent across the middle layers and less across the first and the last layers.
For the AnB model, the performance matches baseB for n = {1, 2}. Beyond that, the performance begins to drop.
Transfer learning of features followed by fine-tuning gives better results than training the network from scratch.
Instead of using transferred weights in BnB and AnB, the first n layers were initialized randomly.
The performance falls for layers 1 and 2, and further drops to a near-random level for layers 3 and beyond.
Another interesting insight is that even for dissimilar tasks, transferring features is better than using random features.
Problem Statement: Given an image, answer a given question about the image.
Assumptions:
The paper proposes the ECM (Emotional Chatting Machine), which can generate both semantically and emotionally appropriate responses in a dialogue setting.
More specifically, given an input utterance or dialogue and the desired emotional category of the response, ECM is to generate an appropriate response that conforms to the given emotional category.
Much of the recent, deep-learning-based work on conversational agents has focused on the use of the encoder-decoder framework, where the input utterance (a given sequence of words) is mapped to a response utterance (a target sequence of words). This is the so-called seq2seq family of models.
The ECM model sits within this framework and introduces 3 new components:
Loss function
Metric
ECM achieves a perplexity of 65.9 and an emotional accuracy of 0.773.
Based on human evaluations, ECM statistically outperforms the seq2seq baselines on both naturalness (likeliness of the response being generated by a human) and emotion accuracy.
Notes
The paper describes a general-purpose neural embedding model where different types of entities (described in terms of discrete features) are embedded in a common vector space.
A similarity function is learnt to compare these entities in a meaningful way and score their similarity. The definition of the similarity function could depend on the downstream task where the embeddings are used.
Each entity is described as a set of discrete features. For example, for the recommendation use case, a user may be described as a bag of the movies they have liked. For the search use case, a document may be described as a bag of the words it is made up of.
Given a dataset and a task at hand, generate a set of positive samples E+ = (a, b) such that a is the input to the task (from the dataset) and b is the expected label (answer/entity) for the given task.
Similarly, generate another set of negative samples E- = (a, bi-) such that bi- is one of the incorrect labels (answers/entities) for the given task. The incorrect entity can be sampled randomly from the set of candidate entities. Multiple incorrect samples can be generated for each positive example; these incorrect samples are indexed by i.
For example, in the case of a supervised learning problem like document classification, a would be one of the documents (probably described in terms of words), b the correct label and bi- one of the randomly sampled labels from the set of all the labels (excluding the correct label).
In the case of collaborative filtering, a would be the user (either described as a discrete entity like a user ID or in terms of the items purchased so far), b the next item the user purchases and bi- one of the randomly sampled items from the set of all the items.
A similarity function is chosen to compare the representations of entities of types a and b. The paper considered cosine similarity and the inner product, and observed that cosine similarity works better in the case of a large number of entities.
A loss function compares the similarity between the positive pair (a, b) and the negative pairs (a, bi-). The paper considered the margin ranking loss and the negative log loss of softmax, and reported that the margin ranking loss works better (a sketch follows below).
The norm of the embeddings is capped at 1.
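A minimal sketch of the margin ranking loss with cosine similarity over one positive pair and k sampled negatives (the margin value is illustrative):

```python
import torch
import torch.nn.functional as F

def starspace_loss(a_emb, b_emb, b_neg_embs, margin=0.2):
    """Hinge loss that pushes sim(a, b) above sim(a, b_i^-) by a margin.
    a_emb, b_emb: (d,); b_neg_embs: (k, d) sampled negative entities."""
    pos = F.cosine_similarity(a_emb, b_emb, dim=-1)
    neg = F.cosine_similarity(a_emb.unsqueeze(0), b_neg_embs, dim=-1)
    return torch.clamp(margin - pos + neg, min=0).sum()
```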
The same model architecture is applied to a variety of tasks, including multi-class classification, multi-label classification, collaborative filtering, content-based recommendation, link prediction, information retrieval, word embeddings and sentence embeddings.
The model provides a strong baseline on all the tasks and performs on par with much more complicated and task-specific networks.
Sequence-to-sequence models have made abstractive summarization viable, but they still suffer from issues like out-of-vocabulary words and repetitive sentences.
The paper proposes to overcome these limitations by using a hybrid Pointer-Generator network (to copy words from the source text) and a coverage vector that keeps track of content that has already been summarized, so as to discourage repetition.
It is a hybrid model between the sequence-to-sequence network and the Pointer Network, such that when generating a word, the model decides whether the word would be generated using the softmax vocabulary (sequence-to-sequence) or using the source vocabulary (Pointer Network).
Since the model can choose a word from the source vocabulary, the issue of out-of-vocabulary words is handled.
The model maintains a coverage vector, which is the sum of the attention distributions over all previous decoder timesteps.
This coverage vector is fed as an input to the attention mechanism.
A coverage loss is added to prevent the model from repeatedly attending to the same word.
The idea is to capture how much coverage different words have already received from the attention mechanism (a sketch follows below).
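A minimal sketch of the copy/generate blend and the coverage penalty, assuming the generator's vocabulary distribution has already been padded with zeros up to the extended vocabulary (function and argument names are illustrative):

```python
import torch

def final_distribution(p_gen, vocab_dist, attention, source_ids):
    """Blend the decoder's vocabulary softmax with the copy distribution.
    vocab_dist: (extended_vocab,) padded generator softmax;
    attention: (src_len,); source_ids: (src_len,) positions of the
    source words in the extended vocabulary."""
    final = p_gen * vocab_dist
    # Scatter the copy probability mass onto the source-word slots.
    return final.scatter_add(0, source_ids, (1 - p_gen) * attention)

def coverage_loss(attention, coverage):
    """Penalize re-attending: sum of the elementwise minimum between the
    current attention and the coverage (sum of past attentions)."""
    return torch.minimum(attention, coverage).sum()
```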
The model, when evaluated on the CNN/Daily Mail summarization task, outperforms the state of the art by at least 2 ROUGE points, though it still does not outperform the lead-3 baseline.
The lead-3 baseline uses the first 3 sentences as the summary of the article, which should be a strong baseline given that the dataset consists of news articles.
The model is initially trained without coverage and then fine-tuned with the coverage loss.
During training, the model first learns how to copy words and then how to generate words (pgen starts from 0.3 and converges to 0.53).
During testing, the model strongly prefers copying over generating (pgen = 0.17).
Further, whenever the model is at the beginning of a sentence or at the join between stitched-together fragments, it prefers to generate a word instead of copying one from the source text.
The overall model is very simple, neat and interpretable, and also performs well in practice.
The paper presents the Neural Relational Inference (NRI) model, which can infer the underlying interactions in a dynamical system in an unsupervised manner, using just the observational data in terms of the trajectories.
For instance, consider a simulated system where the particles are connected to each other by springs. The observational data does not explicitly specify which particles are connected to each other and only contains information like the position and velocity of each particle at different timesteps.
The task is to explicitly infer the interaction structure (in this example, which pairs of particles are connected to each other) while learning the dynamical model of the system itself.
The model consists of an encoder that encodes the given trajectories into an interaction graph and a decoder that decodes the dynamical model given the interaction graph.
The model starts by assuming that a fully connected interaction graph exists between the objects in the system.
For this latent graph z, zij denotes the (discrete) edge type between objects vi and vj, with the assumption that there are K edge types.
The object vi has a feature vector xit associated with it at time t. This feature vector captures information like location and velocity.
A Graph Neural Network (GNN) acts on the fully connected latent graph z, performs message passing from node to node via the edges, and predicts the discrete label for each edge.
The GNN architecture may itself use MLPs or ConvNets and returns a factorised distribution over the edge types, qφ(z|x).
The decoder is another GNN (with separate parameters for each edge type) that predicts the future dynamics of the system and returns pθ(x|z).
The model is trained as a VAE by maximising the ELBO: E_qφ(z|x)[log pθ(x|z)] − KL[qφ(z|x) || pθ(z)]
pθ(z) is the prior, which is assumed to be a uniform distribution over the edge types.
Instead of predicting the dynamics of the system for just the next timestep, the paper chooses to predict multiple steps (10) into the future. This ensures that the interactions can have a significant effect on the dynamics of the system.
Given the dynamical system, run the encoder to obtain qφ(z|x).
Sample zij from qφ(z|x).
Run the decoder to predict the future dynamics for the next T timesteps.
Optimise the ELBO loss.
Note that since the latent variables (edge labels) are discrete in this case, the sampling is done from a continuous approximation of the discrete distribution, and the reparameterization trick is applied over this continuous approximation to get (biased) gradients (a sketch follows below).
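A minimal sketch of this sampling step, using the Gumbel-softmax (concrete) relaxation available in PyTorch:

```python
import torch
import torch.nn.functional as F

def sample_edge_types(edge_logits, tau=0.5, hard=True):
    """Draws differentiable, approximately one-hot samples over the K edge
    types for every pair (i, j). edge_logits: (num_edges, K) unnormalized
    log-probabilities from the encoder; tau is the relaxation temperature."""
    return F.gumbel_softmax(edge_logits, tau=tau, hard=hard, dim=-1)
```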
Experiments are performed using simulated systems, like particles connected by springs, phase-coupled oscillators and charged particles, and using real-world data, like the CMU Motion Capture database and NBA tracking data.
The NRI system effectively predicts the dynamics of the systems and is able to reconstruct the ground-truth interaction graph (for the simulated systems).
The paper presents NeuroSAT, a message-passing neural network that is trained to predict if a given SAT problem can be solved. As a side effect of training, the model also learns how to solve the SAT problem itself, without any extra supervision.
Given an expression in propositional logic, the task is to predict if there exists an assignment of the variables that makes the expression true.
The expression itself can be written as a conjunction of disjunctions (“and” over “or”), where each conjunct is called a clause and each (possibly negated) variable within a clause is called a literal.
Invariants
The variables, the clauses, or the literals (within the clauses) can be permuted.
Every occurrence of a variable can be negated.
Given the SAT problem, create an undirected graph of the literals, their negations and the clauses they belong to.
Put an edge between every literal and the clauses to which it belongs, and another kind of edge between every literal and its negation.
Perform message passing between the nodes to obtain vector representations corresponding to each node. Specifically, first each clause receives a message from its neighbours (literals) and updates its embedding. Then every literal receives a message from its neighbours (both literals and clauses) and updates its embedding.
After T iterations, the nodes vote to decide the prediction of the model as a whole.
The model is trained end-to-end using the cross-entropy loss between the logit and the true label.
Permutation invariance is ensured because the network operates only on the graph topology (with shared update functions across nodes and edges), and negation invariance is ensured by treating all literals the same.
The most interesting aspect of this work is that even though the model was trained only to predict if the SAT problem can be satisfied, it is actually possible to extract the satisfying assignment from the classifier.
In the early iterations, all the nodes vote “unsolvable” with low confidence. Then a few nodes start voting “solvable”, and then a phase transition happens where most of the nodes start voting “solvable” with high confidence.
The model never becomes highly confident that a problem is “unsolvable” and almost never guesses “solvable” on an “unsolvable” problem. So, in some sense, the model is looking for the combination of literals that actually solves the problem.
The authors found that the 2-dimensional PCA projections of the literal embeddings are initially mixed up but become more and more linearly separable as the phase transition happens.
Based on this insight, the authors propose to obtain cluster centres C1 and C2, partition the variables according to the cluster centres, and then try the assignments from both partitions.
This alone provides a satisfying solution in over 70% of the cases, even though there is no explicit supervising signal about how to solve the problem.
The other strengths of the paper include:
Generalizing to longer and more difficult SAT problems (than those seen during training).
Generalizing to other kinds of search problems, like graph colouring, clique detection, etc. (over small random graphs).
The paper also reports that by adding a supervising signal about which clauses in the given expression are unsatisfiable, it is possible to decode the literals which prove the “unsatisfiability” of an expression at test time. Though not a lot of details have been provided about this part, they would probably be covered in the next iteration of the paper.
Catastrophic forgetting refers to the phenomenon where, when a learning system is trained on two tasks in succession, it may forget how to perform the first task.
The paper investigates this behaviour for different activation functions, in the presence and absence of dropout.
For each experiment, two tasks are defined - an “old” task and a “new” task.
The network is first trained on the “old” task until the validation set error has not improved for the last 100 epochs.
The “best” performing model is then trained on the “new” task until the combined error on the “old” and the “new” validation datasets has not improved in the last 100 epochs.
All the tasks used the same model architecture - 2 hidden layers followed by a softmax layer.
Models were trained using SGD, with or without dropout.
For each combination of model, activation and training mechanism, a random hyperparameter search was performed over a set of 25 hyperparameter configurations.
In terms of the relationship between the “old” and the “new” tasks, three kinds of settings are considered:
The tasks are very similar, but the input is processed in a different format. For this setting, the MNIST dataset was used, with a different permutation of pixels for the “old” and the “new” task.
The tasks are similar but not exactly the same. For this setting, the task was to predict the sentiment of reviews across 2 different product categories.
In the last setting, 2 dissimilar tasks were used. One task was to predict the sentiment of reviews and the other was to perform classification over the MNIST dataset (reduced to 2 classes).
Using dropout improved the overall validation performance for all the models on all the tasks.
Using dropout also increased the size of the optimal model across all the activations, indicating that maybe the increased size of the model could explain the increased resistance to forgetting. It would have been interesting to check if dropout always selected the largest model possible given the set of hyperparameters.
On the dissimilar tasks, dropout improved the performance while reducing the model size, so it might have other properties as well that help to prevent forgetting.
As compared to the choice of training technique, the activation function has a less consistent effect on the resistance to forgetting. The paper recommends performing cross-validation for the choice of the activation function. If that is not feasible, the maxout activation function with dropout could be used.
Information Extraction - Given a query to be answered and an external search engine, information extraction entails the task of issuing search queries, extracting information from new sources and reconciling the extracted values until we are sufficiently confident about them.
The paper proposes the use of Reinforcement Learning (RL) to solve this task.
The information extraction task is modelled as a Markov Decision Process (MDP) <S, A, T, R>.
A Deep Q-Network (DQN) is used.
Conventional wisdom says that, when training neural networks, the learning rate should monotonically decrease. This insight forms the basis of the different types of adaptive learning rate methods.
Counter to this expected behaviour, the paper demonstrates that using a cyclical learning rate (CLR), varying between a minimum and a maximum value, helps to train the neural network faster without requiring fine-tuning of the learning rate.
The paper also provides a simple approach to estimate the lower and upper bounds for CLR.
Difficulty in minimizing the loss arises from saddle points and not from local minima. [Ref]
Increasing the learning rate allows for rapid traversal of saddle points.
Alternatively, the optimal learning rate is expected to lie between the bounds of the CLR, so the learning rate would always be close to the optimal learning rate.
Cycle Length = Number of iterations until the learning rate returns to the initial value = 2 * step_size.
step_size should be set to 2-10 times the number of iterations in an epoch.
Estimating the CLR boundary values:
Run the model for several epochs while increasing the learning rate between the allowed low and high values.
Plot accuracy vs learning rate and note the learning rate value at which the accuracy starts to fall.
This gives good candidate values for the upper and lower bounds. Alternatively, the lower bound could be set to 1/3 or 1/4 of the upper bound. But it is difficult to judge if the model has run for a sufficient number of epochs in the first place (a sketch of the triangular schedule follows below).
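A minimal sketch of the triangular CLR policy (this formulation follows the paper's reference implementation; variable names are illustrative):

```python
import numpy as np

def triangular_clr(iteration, step_size, base_lr, max_lr):
    """Linearly ramps the learning rate from base_lr to max_lr over
    step_size iterations and back down, repeating every 2 * step_size
    iterations."""
    cycle = np.floor(1 + iteration / (2 * step_size))
    x = np.abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```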
Empirical evidence indicates that, at training time, neural networks need to be of significantly larger size than necessary.
The paper proposes a hypothesis called the lottery ticket hypothesis to explain this behaviour.
The idea is the following - successful training of a neural network depends on a lucky random initialization of a subcomponent of the network. Such components are referred to as lottery tickets.
Larger networks are more likely to have these lottery tickets and hence are easier to train.
Various aspects of the hypothesis are explored empirically.
Two tasks are considered - MNIST and XOR.
For each task, the paper considers networks of different sizes and empirically shows that larger networks are more likely to converge (or have better performance) for a fixed number of epochs, as compared to the smaller networks.
Given a large, trained network, some weights (or units) of the network are pruned and the resulting network is reset to its initial random weights.
The resulting network is the lottery ticket, in the sense that when the pruned network is trained, it is more likely to converge than an otherwise randomly initialised network of the same size. Further, it is more likely to match the original, larger network in terms of performance.
The paper explores different aspects of this experiment:
The size of the pruned network affects the speed of convergence when training the lottery ticket.
If only the architecture or only the initial weights of the lottery ticket are used, the resulting network tends to converge more slowly and achieves a lower level of performance.
The paper includes some more interesting experiments. For instance, the distribution of the initializations among the weights that survived the pruning suggests that weights which are small before training tend to remain small after training.
One interesting experiment would be to show the performance of the pruned network before resetting its weights and retraining. This performance should be compared with the performance of the initial large network and the performance of the lottery ticket after training.
Overall, the experiments are not sufficient to conclude anything about the correctness of the hypothesis, but the proposition itself is very interesting and could enhance our understanding of how neural networks work.
+Convolutional Neural Networks are extremely good feature extractors in the sense that features extracted for one task (say image classification) can be easily transferred to another task (say image segmentation).
+Existing unsupervised approaches do not aim to learn discriminative features and supervised approaches for discriminative features do not scale well.
+The paper presents an approach to learn features in an unsupervised setting by using a set of target representations called as Noise As Target (NAT) which acts as a kind of proxy supervising signal.
+In the setting of the problem where we are learning both the features and the target representation, a trivial solution would be the one where all the input images map to the same target and are assigned the same representation. No discriminative features are learned in this case.
+To avoid such situations, a set of k predefined target representations are chosen and each image is mapped to one of these k representations (based on the features).
+There is an assumption that k > n so that each image is assigned a different target.
+One simple choice of target representation is the standard one-hot vector, which implies that all the classes (and, by extension, the associated images) are orthogonal and equidistant from each other. But this is not a reasonable approximation as not all image pairs are equally similar or dissimilar.
+Instead, the target vectors are uniformly sampled from a d-dimensional unit sphere, where d is the dimensionality of the feature representation. That is, the idea is to map the features to the manifold of the d-dimensional L2 sphere, using the k predefined representations as a discrete approximation of the manifold.
+Since each data point (image) is mapped to a new point on the manifold, the algorithm is suited for online training as well.
+For training, the number of targets k is set equal to the number of images n, and an assignment matrix P is learned to ensure that the mapping between images and targets is 1-to-1.
+The resulting optimisation problem can be solved using the Hungarian algorithm, but at a high cost of O(n^3). An optimisation is to take a batch of b images and update only the square sub-matrix PB of dimension b x b (formed by the images in the batch and their currently assigned targets), as sketched below. This reduces the overall complexity to O(nb^2).
+Other optimisation techniques that are common in supervised learning, like batch norm, are used in this setting as well.
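+A sketch of the batched re-assignment step, assuming scipy's Hungarian solver and L2-normalized features (names are illustrative):
+
+    import numpy as np
+    from scipy.optimize import linear_sum_assignment
+
+    def sample_targets(n, d, rng=np.random.default_rng(0)):
+        # Targets sampled uniformly from the d-dimensional unit sphere.
+        t = rng.normal(size=(n, d))
+        return t / np.linalg.norm(t, axis=1, keepdims=True)
+
+    def reassign_batch(feats, targets):
+        # Solve the b x b matching between a batch of features and its
+        # current targets: O(b^3) per batch instead of O(n^3) overall.
+        cost = -feats @ targets.T  # maximise dot products = minimise cost
+        _, cols = linear_sum_assignment(cost)
+        return targets[cols]       # re-permuted targets for this batch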
+Used AlexNet with NATs to train the unsupervised model.
+An MLP is trained on these features to learn the classifier.
+Standard preprocessing techniques like random cropping/flipping are used.
+Dataset
+ +ImageNet for training the AlexNet architecture with the proposed approach.
+Pascal VOC 2007 for transfer learning experiments.
+Baselines
+ +Unsupervised approaches like autoencoder, GAN, BiGAN
+Self-supervised
+SOTA models using handcrafted features (SIFT with Fisher Vectors).
+Using squared loss instead of softmax does not deteriorate the performance too much.
+The authors compare the effect of using discrete vs continuous target representations for transfer learning. For the discrete representation, elements of the canonical basis of a k-dimensional space (k=1000, 10000, 100000) are used. Experiments demonstrate that d-dimensional continuous vectors perform much better than the discrete vectors.
+While training the unsupervised network, its features were extracted after every 20 iterations to evaluate performance on the transfer learning task. The test accuracy increases up to around 100 iterations and then saturates.
+Comparing the visualization of the first convolutional layer filters (for AlexNet with and without supervision) shows that while unsupervised filters are less sharp, they maintain the edge and orientation information.
+The proposed unsupervised method outperforms all the unsupervised baselines and is competitive with respect to the supervised baseline. But it is still far behind the model using handcrafted features.
+For transfer learning on Pascal VOC, the proposed approach beats the unsupervised baselines and works at par with the supervised approach.
+The paper proposed a simple unsupervised framework for learning discriminative features without having to rely on proxy tasks like image generation and without having to make an assumption about the input domain.
+The key aspect of the proposed approach is that each image is assigned to a unique point in the d-dimensional manifold which means 2 images could be very close to each other on the manifold while being quite distinct in reality. It is interesting to see that such a simple strategy is able to give such good results.
+The paper presents a general message passing architecture called Message Passing Neural Networks (MPNNs) that unifies various existing models for performing supervised learning on molecules.
+Variants of the MPNN model achieve very good performance on the task of predicting the property of the molecules.
+The input to the model is an undirected graph G where node features are represented as xv (corresponding to node v) and edge features as evw (corresponding to the edge between nodes v and w).
+The idea is to learn a representation (or feature vector) for all the nodes (and possibly edges) in the graph and use that for the downstream supervised learning task.
+The model can be easily extended to the setting of directed graphs.
+The model works in 2 phases:
+All nodes send a message to their neighbouring nodes. The message is a function of the feature vectors corresponding to the sender node (or vertex), the receiver node and the edge connecting the two nodes. The feature vectors can be combined to form the message using the message function which can be implemented as a neural network.
+Once a node has received messages from all its neighbours, it updates its feature vector by aggregating all the messages. The function used to aggregate and update the feature vector is called the update function and can be implemented as a neural network.
+After updating the feature vectors, the graph could initiate another round of message passing. After a sufficient number of message passing rounds, the Readout phase is invoked.
+The feature vectors corresponding to different nodes in the graph are aggregated into a single feature vector (corresponding to the feature vector of the graph) using the readout function.
+The readout function can also be implemented using a neural network, with the condition that it is invariant to the permutation of the nodes within the graph (to ensure that the MPNN is invariant to graph isomorphism).
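+A minimal sketch of one message passing round and a readout, with message_fn and update_fn standing in for the learned networks:
+
+    import numpy as np
+
+    def mpnn_round(h, edges, message_fn, update_fn):
+        # h: {node: feature vector}; edges: list of (v, w, e_vw) tuples.
+        inbox = {v: [] for v in h}
+        for v, w, e in edges:  # undirected graph: messages flow both ways
+            inbox[w].append(message_fn(h[v], h[w], e))
+            inbox[v].append(message_fn(h[w], h[v], e))
+        return {v: update_fn(h[v], np.sum(inbox[v], axis=0)) if inbox[v] else h[v]
+                for v in h}
+
+    def readout(h):
+        # Permutation-invariant readout: here a simple sum over node states.
+        return np.sum(list(h.values()), axis=0)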
+Broadly speaking, the task is to predict the properties of given molecules (regression problem).
+The QM9 dataset consists of 130K molecules whose properties have been measured using Quantum Mechanical Simulations (DFT).
+Properties to be predicted include atomization energy, enthalpy, highest fundamental vibrational frequency etc.
+There are two benchmarks for error:
+ +DFT Error - Estimated average error of DFT approximation
+Chemical Accuracy - As established by the chemistry community
+Following variants of message function are explored:
+ +Matrix multiplication between Aevw and hv, where A is the adjacency matrix and hv is the feature vector corresponding to node v.
+Edge Network, which is the same as the matrix-multiplication case, with the difference that A is a learned matrix for each edge type.
+Pair Network where the feature vector corresponding to the source node, target node and edge is fed to a neural network.
+Since all messages are shared via edges, it could take a long time for a message to travel between the two ends of the graph. To speed up this process, virtual elements are introduced.
+In the first setting, “virtual edges” are inserted between nodes.
+In the second setting, a “master” node connects to all the other nodes.
+In a graph with n nodes and d-dimensional feature vectors, a single step of message passing has a worst-case time complexity of O(n^2 d^2).
+This complexity can be reduced by breaking the d-dimensional embedding into k different groups of d/k embeddings which can be updated in parallel. The complexity of the modified approach is O(n^2 d^2 / k).
+Best performing MPNN model uses edge network as the message function and set2set as the readout function.
+Using groups of embeddings helps improve generalization, though this effect could also be due to the ensemble-like nature of the modified architecture.
+The model performs worse without the virtual elements.
+Long range interaction between vertices is necessary.
+Scaling to larger molecule sizes is challenging because the model creates a fully connected graph by incorporating virtual elements.
+Additionally, we need to ensure that any modification to the architecture, made to enhance performance on counting questions, does not degrade performance on other classes of questions.
+The paper proposes to overcome these challenges by using the attention maps (and not the aggregated feature vectors) as input to a separate count module.
+The basic idea is quite intuitive: when we perform weighted averaging based on different attention maps, we end up averaging the features corresponding to the different instances of an object. This makes the feature vectors indistinguishable from the scenario where we had just one instance of the object in the image.
+ +Even multiple glimpses (multiple attention steps) can not resolve this problem as the weights given to one feature vector would not depend on the other feature vectors (that are attended to). Hard attention could be more useful than soft-attention but there is not much empirical evidence in support of this hypothesis.
+ +The proposed count module is a separate pipeline that can be integrated with most of the existing attention based VQA models without affecting the performance on non-count based questions.
+ +The inputs to the count module are the attention maps and the object proposals (coming from some pre-trained model like the RCNN model), and the output is a count feature vector which is used to answer the count-based question.
+ +The top level idea is the following - given the object proposals and the attention maps, create a graph where nodes are objects (object proposals) and edges capture how similar two object proposals are (how much do they overlap). The graph is transformed (by removing and scaling edges) so that the count of the object can be obtained easily.
+ +To explain their methodology, the paper simplifies the setting by making two assumptions:
+These simplifying assumptions are made only for the sake of exposition and do not limit the capabilities of the count module.
+ +Given the assumptions, the task of the count module is to handle the exact duplicates to prevent double-counting of objects.
+ +As the first step, the attention weights (a) are used to generate an attention matrix (A) by performing an outer product between a and aT. This corresponds to the step of creating a graph from the input.
+ +A corresponds to the adjacency matrix of that graph. The attention weight for the ith proposal corresponds to the ith node in the graph and the edge between the nodes i and j has the weight ai*aj.
+ +Also note that the graph is a weighted directed graph, and the subgraph of vertices satisfying the condition ai = 1 is a complete directed graph with self-loops. Given such a graph, the number of vertices V = sqrt(E), where E can be computed by summing over the adjacency matrix. This implies that if the proposals are distinct, the count can be obtained trivially by summing over the adjacency matrix.
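+A toy illustration of this counting identity, under the two simplifying assumptions (attention weights in {0, 1}, duplicates overlapping exactly):
+
+    import numpy as np
+
+    def count_from_attention(a):
+        A = np.outer(a, a)       # adjacency matrix of the attention graph
+        return np.sqrt(A.sum())  # V = sqrt(E) for a complete digraph with self-loops
+
+    # count_from_attention(np.array([1., 1., 1., 0.])) -> 3.0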
+ +The objective is now to eliminate the edges such that the underlying objects are the vertices of a complete subgraph. This requires removing two types of duplicate edges - intra-object edges and inter-object edges.
+ +Intra-object edges can be removed by computing a distance matrix, D, defined as 1 - IoU, where the IoU matrix corresponds to the Intersection-over-Union matrix. A modified adjacency matrix A’ is obtained by performing the element-wise product between f1(A) and f2(D), where f1 and f2 are piece-wise linear functions that are learnt via backpropagation.
+ +The inter-object edges are removed in the following manner:
+ +The resulting per-proposal scaling vector s can be converted into a matrix (by taking its outer product with itself) so as to scale both the incoming and the outgoing edges. The self edges (which were removed while computing A’) are added back (after scaling with s) to obtain a new transformed matrix C.
+ +The transformed matrix C is a complete graph with self-loops where the nodes correspond to all the relevant object instances and not to object proposals. The actual count can be obtained from C by performing a sum over all its values, as described earlier.
+ +The original count problem is a regression problem, but it is transformed into a classification problem to avoid scale issues. The network produces a k-hot n-dimensional vector o, where n is the number of object proposals that were fed into the module (and hence the upper limit on how large a number the module can count).
+ +In the ideal setting, k should be one, as the network would produce an integer value, but in practice the network produces a real number, so k can be up to 2. If c is an exact integer, the output is a one-hot vector with the value at the index corresponding to c set to 1. If c is a real number, the output is a linear interpolation between two one-hot vectors (corresponding to the two integers between which c lies).
+ +The count module supports computing the confidence of a prediction by defining two variables pa and pD, which compute the average distance of f6(a) and f7(D) from 0.5. The final output o’ is defined as f8(pa + pD) · o.
+ +All the different f functions are piece wise linear functions and are learnt via backpropagation.
+ +The authors created a new category of count-based questions by filtering the number-type questions to remove questions like “What is the time right now?”. These questions do have a numerical answer but do not fall under the purview of count-based questions and hence are not targeted by the count module.
+ +The authors augmented a state of the art VQA model with their count module and show substantial gains over the count-type questions for the VQA-v2 dataset. This augmentation does not drastically impact the performance on non-count questions.
+ +The overall idea is quite crisp and intuitive, and the paper is easy to follow. It would be even better if there were some more ablation studies. For example, why are the piece-wise linear functions assumed to have 16 linear components? Would a smaller or larger number be better?
diff --git a/_site/site/2018/05/21/Net2Net-Accelerating-Learning-via-Knowledge-Transfer.html b/_site/site/2018/05/21/Net2Net-Accelerating-Learning-via-Knowledge-Transfer.html new file mode 100644 index 00000000..3c404a37 --- /dev/null +++ b/_site/site/2018/05/21/Net2Net-Accelerating-Learning-via-Knowledge-Transfer.html @@ -0,0 +1,69 @@ +The paper presents a simple yet effective approach for transferring knowledge from a trained neural network (referred to as the teacher network) to a large, untrained neural network (referred to as the student network).
+The key idea is to use a function-preserving transformation that guarantees that for any given input, the output from the teacher network and the newly created student network would be the same.
+The approach works as follows - Let us say that the teacher network is represented by the transformation y = f(x, θ), where θ refers to the parameters of the network. The task is to choose a new set of parameters θ’ for the student network g(x, θ’) such that for all x, f(x, θ) = g(x, θ’).
+To start, we can assume that f and g are composed of standard linear layers. Layers i and i+1 are represented by the weight matrices Wi (of shape m x n) and Wi+1 (of shape n x p).
+We want to grow layer i to have q output units (where q > n) and layer i+1 to have q input units. The new weight matrices would be Ui (of shape m x q) and Ui+1 (of shape q x p).
+The first n columns of Wi (rows of Wi+1) are copied as-is into Ui (Ui+1).
+For filling the remaining q - n slots, columns (rows) are sampled randomly from Wi (Wi+1).
+Finally, each replicated row in Ui+1 is scaled by dividing by its replication factor, ensuring that the output of the function remains unchanged by the operation.
+Since convolutions can be seen as multiplication by a double block circulant matrix, the approach can be readily extended for convolutional networks.
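+A numpy sketch of the function-preserving widening for fully connected layers (the random replication and the rescaling follow the description above):
+
+    import numpy as np
+
+    def net2wider(W_i, W_ip1, q, rng=np.random.default_rng(0)):
+        # Widen layer i from n to q output units (q > n), preserving outputs.
+        m, n = W_i.shape
+        idx = np.concatenate([np.arange(n), rng.integers(0, n, q - n)])
+        counts = np.bincount(idx, minlength=n)        # replication factor per unit
+        U_i = W_i[:, idx]                             # copy, then replicate columns
+        U_ip1 = W_ip1[idx, :] / counts[idx][:, None]  # rescale replicated rows
+        return U_i, U_ip1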
+The benefits of using this approach are the following:
+ +The variant discussed above is called the Net2WiderNet variant. There is another variant called Net2DeeperNet that enables the network to grow in depth.
+In that case, a new matrix U, initialized as the identity matrix, is added to the network. Note that, unlike Net2WiderNet, this approach does not work with an arbitrary activation function between the layers (for example, it is valid for ReLU but not for sigmoid).
+The model can accelerate the training of neural networks, especially during development cycle when the designers try out different models.
+The approach could potentially be used in life-long learning systems where the model is trained over a stream of data and needs to grow over time.
+The paper explores knowledge distillation (KD) from the perspective of transferring knowledge between 2 networks of identical capacity.
+This is in contrast to much of the previous work in KD which has focused on transferring knowledge from a larger network to a smaller network.
+The paper reports that these Born Again Networks (BANs) outperform their teachers by significant margins in many cases.
+The paper augments this setting with an extra cross-entropy loss between the output of the teacher and the student networks. The student tried to predict the correct answer while matching the output distribution of the teacher.
+The resulting student network is referred to as BAN - Born Again Network.
+The same approach can be used multiple times (with diminishing returns) where the kth generation student is initialized by knowledge transfer from (k-1)th generation student.
+Hinton et al suggested that even when the output of the teacher network is incorrect, it contains useful information about the similarity between the output classes. This information is referred to as the “dark knowledge”.
+The current paper observed that the gradient of the correct output dimension during distillation and normal supervised training resembles the original gradient up to a weight factor. This sample specific weight is defined by the value of the teacher’s max output.
+This suggests distillation may be performing some kind of importance weighing. To explore this further, the paper considers 2 cases:
+ +Confidence Weighted By Teacher Max (CWTM) - where each example in the student’s loss function is weighted by the confidence that the teacher has on the prediction for that sample. The student incurs a higher loss if the teacher was more confident about the example.
+Dark Knowledge with Permuted Predictions (DKPP) - The non-argmax output of teacher’s predictive distribution are permuted thus destroying the information about which output classes are related.
+The key effect of these variations is that the covariance between the output classes is lost and classical knowledge distillation would not be sufficient to explain improvements (if any).
+Standard Deep Learning networks are not suitable for continual learning setting as the change in the data distribution leads to catastrophic forgetting.
+The paper proposes Memory-based Parameter Adaptation (MbPA), a technique that augments a standard neural network with an episodic memory (containing examples from the previous tasks).
+This episodic memory allows for rapid acquisition of new knowledge (corresponding to the current task) while preserving performance on the previous tasks.
+MbPA consists of 3 components:
+ +f and g are parametric components while M is a non-parametric component.
+M is a dynamically sized dictionary where the key represents the output of the embedding network and the value represents the desired output for a given input (input to the model).
+When a new training tuple (xj, yj) is fed as input to the model, a key-value pair (hj, vj) is added to the memory, where hj = f(xj).
+The memory has a fixed size and acts as a circular buffer. When it gets filled up, earlier examples are dropped.
+When accessing the memory using a key hkey, the k-nearest neighbours (in terms of distance from the given key) are retrieved.
+During testing, the memory is used to adapt the parameters of the output network g while the embedding network f remains the same.
+Given the input x, obtain the embedding corresponding to x and using that as the key, retrieve the k-nearest neighbours from the memory.
+Each retrieved neighbour is a tuple of the form (hk, vk, wk), where wk is proportional to the closeness between the input query and the key corresponding to the retrieved example.
+The collection of all the retrieved examples are referred to as the context C.
+The parameters of the output network g are adapted from θ to θx where θx = θ + δM(x, θ)
+δM(x, θ) is referred to as the contextual update of parameters of the output network.
+MbPA can be interpreted as decreasing the weighted average of negative log likelihood over the retrieved neighbours in the context C.
+The expression corresponding to δM(x, θ) can be obtained by performing gradient descent on the negative log posterior over the context C.
+The posterior expression can be written as a sum of two terms - one corresponding to a weighted likelihood of the data in the context C and the other corresponding to a regularisation term that prevents overfitting to the data.
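+A sketch of the contextual update, with embed and grad_nll standing in for the embedding network f and the gradient of the weighted negative log likelihood (both assumed helpers):
+
+    import numpy as np
+
+    def contextual_update(theta, x, memory, embed, grad_nll, k=5, lr=0.1):
+        # memory: list of (h, v) pairs; retrieve the k nearest neighbours of f(x).
+        h = embed(x)
+        dists = np.array([np.linalg.norm(h - hk) for hk, _ in memory])
+        ctx = np.argsort(dists)[:k]
+        w = 1.0 / (1e-3 + dists[ctx])
+        w = w / w.sum()  # closeness weights over the context C
+        g = sum(wi * grad_nll(theta, *memory[i]) for wi, i in zip(w, ctx))
+        return theta - lr * g  # theta_x = theta + delta_M(x, theta)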
+This idea can be thought of as a generalisation of attention. Attention can be viewed as fitting a constant function over the neighbourhood of memories while MbPA fits a more general function which is parameterised by the output network of the given model. Refer appendix E in the paper for further details.
+MbPA aims to solve the fundamental problem of enabling the model to deal with changes in data distribution.
+In that sense, it is evaluated on a wide range of settings: continual learning, incremental learning, unbalanced datasets and change in data distribution at test time.
+Continual Learning:
+ +In this setting, the model encounters a sequence of tasks and cannot revisit a previous task.
+Permuted MNIST dataset was used.
+The key takeaway is that once a task is catastrophically forgotten, only a few gradient updates on carefully selected data are sufficient to recover the performance.
+Incremental Learning:
+ +In this setting, the model is trained on a subset of classes and then introduced to novel, unseen classes. The model is tested to see if it can incorporate the new knowledge while retaining the knowledge about the previous classes.
+Imagenet dataset with a Resnet V1 model is used. It is first pretrained on 500 classes and then fine-tuned to see how quickly it can adapt to new classes.
+Unbalanced Dataset:
+ +Language Modelling:
+ +MbPA exhibits strong performance on all these tasks showing that the memory-based parameter adaption technique is effective across a range of tasks in supervised learning.
+The paper presents a very interesting approach for learning independent (inverse) data transformation from a set of transformed data points in an unsupervised manner.
+We start with a given data distribution P (say the MNIST dataset) where each x ∈ R^d.
+Consider N transformations M1, …, MN (functions that map an input x to a transformed input x’). Note that N need not be known beforehand.
+These transformations can be thought of as independent (from other transformations) causal mechanisms.
+Applying these transformation would give N new distributions Q1, …, QN.
+These individual distributions are combined to form a single transformed distribution Q which contains the union of samples from the individual distributions.
+At training time, two datasets are created. One dataset corresponds to untransformed objects (sampled from P), referred to as DP. The other dataset corresponds to samples from the transformed distribution Q and is referred to as DQ.
+Note that all the samples in DP and DQ are sampled independently and no supervising information is needed.
+A series of N’ parametric models, called as experts, are initialized and would be trained to learn the different mechanisms.
+For simplicity, assume that N = N’. If N > N’, some experts would learn more than one transformation or certain transformations would not be learnt. If N < N’, some experts would not learn anything or some experts would learn the same distribution. All of these cases can be diagnosed and corrected by changing the number of experts.
+The experts are trained with the goal of maximizing an objective function c: R^d → R, which takes high values on the support of P and low values outside.
+During training, an example xQ (from DQ) is fed to all the experts at the same time. Each expert produces a value cj = c(Ej(xQ))
+The winning expert is the one whose output is the max among all the outputs. Its parameters are updated to maximise its output while the other experts are not updated.
+This forces the best performing model to become even better and hence specialize.
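+A sketch of the competition step (the experts and the discriminator-derived objective c are stand-in callables):
+
+    import numpy as np
+
+    def competition_step(x_q, experts, c):
+        # Every expert transforms the sample; only the highest-scoring
+        # expert receives a gradient update on this sample.
+        outputs = [E(x_q) for E in experts]
+        scores = [c(o) for o in outputs]
+        winner = int(np.argmax(scores))
+        return winner, outputs[winner]  # update experts[winner] to maximise c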
+The objective c comes from adversarial training where a discriminator network discriminates between the untransformed input and the output of the experts.
+Each expert can be thought of as a GAN that conditions on the input xQ (and not on a noise vector). The output of the different experts is fed to the discriminator which provides both a selection mechanism and the gradients for training the experts.
+Experiments are performed on the MNIST dataset using the transformations like translation along 4 directions and along 4 diagonals, contrast shift and inversion.
+The discriminator is further trained against the output of all the losing experts, thereby further strengthening the winning expert.
+The experts are initialized randomly and then pretrained to approximate the identity function by training with identical input-output pairs.
+This ensures that the experts start from a similar level.
+In practice, it seems necessary for the success of the proposed approach.
+During the initial phase, there is a heavy competition between the experts and eventually different winners emerge for different transformations.
+The experiments are quite limited in terms of complexity of dataset and complexity of transformation but it provides evidence for a promising connection between deep learning and causality.
+Appendix mentions that in case there are too many experts, for most of the tasks, only one model specialises and the extra experts do not specialize at all. This is interesting as there is no explicit regularisation penalty which prevents the emergence of multiple experts per task.
+Recurrent Neural Networks have two key issues:
+ +Over parameterization which increases the time for training and inference.
+Ill conditioned recurrent weight matrix which makes training difficult due to vanishing or exploding gradients.
+The paper presents a flexible RNN model called KRU (Kronecker Recurrent Units) which overcomes the above problems by using a Kronecker factored recurrent matrix and soft unitary constraints on its factors.
+Low-rank decomposition.
+Training a neural network on the soft targets predicted by a big pre-trained network.
+Low-bit precision training.
+Hashing.
+Gating mechanism like in LSTMs.
+Gradient Clipping.
+Orthogonal Weight Initialization.
+Parameterizing recurrent weight matrix.
+Uses a Kronecker factored recurrent matrix which enables controlling the number of parameters and number of factor matrices.
+Vanishing and exploding gradients are taken care of by using a soft unitary constraint.
+Why not use strict unitary constraint:
+ +Restricts the search space and makes learning process unstable.
+ +Makes forgetting (irrelevant) information difficult.
+Relaxing the strict constraint has been shown to improve the convergence speed and generalization performance.
+KRU can be easily plugged into RNNs, LSTMs and other variants.
+The recurrent matrix W is parameterized as a Kronecker product of F matrices W0, …, WF-1, where each Wf is a complex matrix of shape Pf x Qf, and the product of all Pf and the product of all Qf are both equal to N.
+Why is W a complex matrix?
+ +In the real space, the set of all unitary (orthogonal) matrices has determinant 1 or -1.
+ +Given that determinant is a continuous function, the unitary set in the real space is disconnected.
+The unitary set in the complex space is connected as its determinants are points on the unit circle.
+A soft unitary constraint is introduced in the form of the regularization term ||Wf^H Wf - I||^2 (per Kronecker factor).
+If each of the Kronecker factors is unitary, the resulting matrix W would also be unitary.
+It is computationally inefficient to apply this constraint over the recurrent matrix W itself as the complexity of the regularizer is given as O(N3).
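+A sketch of the Kronecker-factored recurrent matrix and the per-factor soft unitary penalty (complex factors, as motivated above):
+
+    import numpy as np
+    from functools import reduce
+
+    def kron_recurrent(factors):
+        # W = W_0 (x) ... (x) W_{F-1}; factor shapes (p_f, q_f) with
+        # prod(p_f) = prod(q_f) = N control the parameter count.
+        return reduce(np.kron, factors)
+
+    def soft_unitary_penalty(factors):
+        # Sum over factors of ||W_f^H W_f - I||^2; cheap because each
+        # factor is small, unlike the O(N^3) penalty on the full W.
+        return sum(np.linalg.norm(W.conj().T @ W - np.eye(W.shape[1])) ** 2
+                   for W in factors)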
+The Kronecker recurrent model is compared against the existing recurrent models for multiple tasks including copy memory, adding memory, pixel-by-pixel MNIST, char level language models, polyphonic music modelling, and framewise phoneme classification.
+For most of the tasks, the KRU model produces results comparable to the best performing models despite using fewer parameters.
+Using soft unitary constraints in KRU provides a principled alternative to gradient clipping (a common heuristic to avoid exploding gradients).
+Further, recent theoretical results suggest that gradient descent converges to a global optimizer of linear recurrent networks, even though the learning problem is non-convex, provided that the spectral norm of the recurrent matrix is bounded by 1.
+The key takeaway from the paper is that the state should be high dimensional so that high capacity networks can be used for encoding and decoding the input and output, while the recurrent dynamics should be implemented via a low capacity model.
+The paper presents I2A (Imagination Augmented Agent) that combines the model-based and model-free approaches leading to data efficiency and robustness even with imperfect models.
+I2A agent uses the predictions from a learned environment model as an additional context in deep policy networks. This leads to improved data efficiency and robustness to imperfect models.
+I2A agent has two main modules - Imagination module and the Policy module.
+Imagination Module
+ +Policy Module
+ +Most existing GNN (Graph Neural Network) methods are inherently flat and are unable to process the information in a hierarchical manner.
+The paper proposes a differentiable graph pooling operation, DIFFPOOL, that can generate hierarchical graph representations and can be easily plugged into many GNN architectures.
+CNNs have spatial pooling operation that allows for deep CNN architectures to operate on coarse graph representations of input images.
+This notion cannot be applied as-is to graphs as they do not have a natural notion of spatial locality like images do.
+DIFFPOOL attempts to resolve this problem by learning a differentiable soft-assignment at each layer which is equivalent to pooling the cluster of nodes to obtain a sparse representation.
+Given a graph G(A, F), where A is the adjacency matrix and F is the feature matrix.
+Given a permutation invariant GNN that follows the message passing architecture. The output of this GNN can be expressed as Z = GNN(A, X) where X is the current feature matrix.
+Goal is to stack L GNN layers on top of each other such that the lth layer uses coarsened output from the (l-1)th layer.
+This coarsening operation uses a cluster assignment matrix S.
+The learned cluster assignment matrix at layer l is denoted as Sl.
+Given Sl, the embedding matrix for the (l+1)th layer is given as transpose(Sl)Zl and the adjacency matrix is given by transpose(Sl)AlSl.
+A new GNN, called GNNpool, is used to produce the assignment matrix S by taking a softmax over GNNpool(Al, Xl).
+As long as the GNN model is permutation invariant, the resulting DIFFPOOL model is also permutation invariant.
+The paper uses 2 auxiliary losses to push the model away from spurious local minima early in the training.
+Link prediction objective - at each layer, a link prediction loss, the norm of (A - S transpose(S)), is minimized, with the intuition that nearby nodes should be pooled together.
+Ideally, the cluster assignment for each node should be a one-hot vector so the entropy for cluster assignment per node is regularized.
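+A sketch of one DIFFPOOL layer, with gnn_embed and gnn_pool standing in for the two GNNs:
+
+    import numpy as np
+
+    def diffpool(A, X, gnn_embed, gnn_pool):
+        Z = gnn_embed(A, X)                      # node embeddings
+        logits = gnn_pool(A, X)                  # (n_nodes, n_clusters)
+        e = np.exp(logits - logits.max(axis=1, keepdims=True))
+        S = e / e.sum(axis=1, keepdims=True)     # row-wise softmax assignments
+        X_next = S.T @ Z                         # coarsened features
+        A_next = S.T @ A @ S                     # coarsened adjacency
+        link_loss = np.linalg.norm(A - S @ S.T)  # auxiliary link-prediction loss
+        return A_next, X_next, link_loss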
+DiffPool obtains the highest average performance across all the pooling approaches and improves upon the base GraphSage architecture by an average of around 7%.
+In terms of runtime complexity, the paper reports that DiffPool does not incur any significant additional running time. But given that now there are 2 GNN models per layer, the size of the model should increase.
+DiffPool can capture hierarchical community structure even when trained on just the graph classification loss.
+One advantage of DiffPool is that the nodes are pooled in a non-uniform way so densely connected group of nodes would collapse into one cluster while sparsely connected nodes can retain their identity.
+The paper proposes an approach for using symbolic knowledge in deep learning systems. These constraints are often expressed as boolean constraints on the output of the deep learning system, and directly incorporating them breaks the differentiability of the system.
+The model is given some input data to perform predictions, and symbolic knowledge is provided in the form of boolean constraints, like the exactly-one constraint for one-hot output encoding.
+Most approaches tend to encode the symbolic knowledge in the vector space embedding to keep the model pipeline differentiable. In this process, the precise meaning of symbolic knowledge is often lost.
+A differentiable “semantic loss” is derived which captures the meaning of the constraint while being independent of its syntax.
+A state x (state refers to the instantiation of boolean variables) satisfies a sentence a if a evaluates to true when using the variables as specified by x.
+A sentence a entails another sentence b if all states that satisfy a also satisfy b.
+The output vector of the neural network is denoted as p, where each value in p denotes the probability of an output.
+Three different output constraints are studied:
+ +Exactly-one constraint
+The semantic loss Ls(a, p) is a function of a propositional logic sentence a (the symbolic knowledge constraint) and p (the output of the neural network).
+a is defined over variables (x1, …, xn) and p is interpreted as a vector of probabilities corresponding to these variables xi’s.
+The semantic loss is directly proportional to the negative log likelihood of generating a state that satisfies the constraints when sampling values according to the distribution p.
+Probabilities of variables that are not part of the constraint, do not affect the semantic loss.
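+For the exactly-one constraint, this probability has a simple closed form; a sketch:
+
+    import numpy as np
+
+    def semantic_loss_exactly_one(p):
+        # -log P(exactly one variable is true) under independent
+        # Bernoulli(p_i) draws: sum_i p_i * prod_{j != i} (1 - p_j).
+        p = np.asarray(p, dtype=float)
+        sat = sum(p[i] * np.prod(np.delete(1.0 - p, i)) for i in range(len(p)))
+        return -np.log(sat)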
+Semantic Loss is used in the semi-supervised setting for Permuted MNIST, Fashion MNIST and CIFAR-10.
+The key takeaway is that using semantic loss improves the performance of the state-of-the-art models for Fashion MNIST and CIFAR-10.
+One downside is that the effectiveness of the semantic loss for this type of constraint strongly depends on the performance of the underlying model. Further, the semantic loss does not improve performance in the fully supervised scenario.
+Further experiments are performed to evaluate the performance of the semantic loss on complex constraints. Since these tasks aim to highlight the effect of using semantic loss, only simple models (MLPs) are evaluated.
+The semantic loss is closely related to the automated reasoning task called weighted model counting (WMC).
+Circuit compilation techniques can be used to compute WMC while allowing backpropagation.
+The paper provides a multi-agent learning environment and proposes a learning approach that facilitates the emergence of a basic compositional language.
+The language is quite rudimentary and is essentially a sequence of abstract discrete symbols. But it does comprise a defined vocabulary and syntax.
+Cooperative, partially observable Markov game (multi-agent extension of MDP).
+All agents have identical action and observation spaces, use the same policy and receive a shared reward.
+Physically simulated 2-D environment in continuous space and discrete time with N agents and M landmarks.
+The agents and the landmarks would occupy some location and would have some attributes (colour, shape).
+Within the environment, the agents can go to a location, look at a location or do nothing. Additionally, they can utter communication symbols c (from a shared vocabulary C). Agents themselves learn to assign a meaning to the symbols.
+Each agent has an internal goal (which could require interaction with other agents to complete) which the other agents cannot see.
+Goal for agent i consists of an action to perform, a landmark location where to perform the action and another agent who should be performing the action.
+Since the agent is continuously emitting symbols, a memory module is provided and simple additive memory updates are done.
+For interaction, the agents could use verbal utterances, non-verbal signals (gaze) or non-communicative strategies (pushing other agents).
+A differentiable model of all agent and environment state dynamics over time is created, and the gradient of the return is computed by backpropagating through it.
+Gumbel-Softmax distribution is used to obtain categorical word emission c.
+A multi-layer perceptron is used to model the policy which returns action, communication symbol and the memory update for each agent.
+Since the number of agents (and hence the number of communication streams etc) can vary across instantiations, an identical model is instantiated per agent and per communication stream.
+The output of individual processing modules are pooled into feature vectors corresponding to communication and physical observations. These pooled features and the goal vectors are fed to the final processing module from which actions and categorical symbols are sampled.
+In practice, using an additional task (each agent predicts the goal for another agent) encouraged more meaningful communication utterances.
+Authors recommend using a large vocabulary with a soft penalty that discourages use of too many words. This leads to use of a large vocabulary in the intermediate state which converges to a small vocabulary.
+Along the lines of rich-get-richer dynamics, the communication symbols c are modelled as being generated by a Dirichlet process. The resulting reward across all agents is the log-likelihood of all communication utterances having been generated by a Dirichlet process.
+Since the agents can only communicate in discrete symbols and do not have a global positioning reference, they need to unambiguously communicate landmark references to other agents.
+Non-verbal communication is not possible.
+When trained with just 2 agents, symbols are assigned for each landmark and action.
+As the number of agents is increased, additional symbols are used to refer to agents.
+If the agents of the same colour are asked to perform conflicting tasks, they perform the average of conflicting tasks. If distractor locations are added, the agents learn to ignore them.
+Agents are allowed to observe other agents’ position, gaze etc.
+Now the location can be pointed to using gaze.
+If gaze is disabled, the agent could indicate the goal landmark by moving to it.
+Basically even when the communication is disabled the agents can come up with strategies to complete the task.
+Environment for learning using modalities like vision, audio, semantics, physics and interaction with objects and other agents.
+Humans learn by interacting with their surroundings (environment).
+Similarly, training an agent in an interactive, multi-modal environment (virtual embodiment) could be useful for a learning agent.
+Open-source and Open-AI gym compatible
+Built on top of 45000 3D house layouts from SUNCG dataset.
+Provides both 3D visual and audio recording.
+Semantic image segmentation and language description of objects.
+Rendering Engine
+ +Implemented using Panda 3D game engine.
+Renders RGB+depth scenes based on textures, multi-source lightings and shadows.
+Acoustic Engine
+ +Implemented using EVERT
+Supports multiple microphones, sound sources, sound absorption based on material, atmospheric conditions etc.
+Semantics Engine
+ +Physics Engine
+ +Implemented using Bullet3 Engine
+Supports physical interaction, external forces like gravity and position and velocity information for multiple agents.
+Visual Question Answering
+Conversational Agents
+Training an agent to follow instructions
+Multi-agent communication
+The paper explores “if a well behaved RNN can be replaced by a feed-forward network of comparable size without loss in performance.”
+“Well behaved” is defined in terms of control-theoretic notion of stability. This roughly requires that the gradients do not explode over time.
+The paper shows that under the stability assumption, feedforward networks can approximate RNNs for both training and inference. The results are empirically validated as well.
+Consider a general, non-linear dynamical system given by a differentiable state-transition map Φw. The hidden state evolves as ht = Φw(ht-1, xt).
+Assumptions:
+ +Stable models are the ones where Φ is contractive, i.e., norm(Φw(h, x) - Φw(h’, x)) is at most Λ * norm(h - h’) for some Λ < 1.
+For example, in an RNN, stability would require that norm(W) is less than 1/Lp, where Lp is the Lipschitz constant of the point-wise non-linearity used.
+The feedforward approximation uses a finite context (of length k) and is a truncated model.
+A non-parametric function f maps the output of the recurrent model to prediction. If f is desired to be a parametric model, its parameters can be pushed to the recurrent model.
+For a Λ-contractive system, it can be proved that for a large enough k (and additional Lipschitz assumptions), the difference in prediction between the recurrent and truncated models is negligible.
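+A sketch of the truncated (feedforward) predictor: recompute the state from only the last k inputs, starting from a fixed initial state (phi and f are assumed callables for the transition map and the prediction function):
+
+    def truncated_prediction(phi, f, xs, k, h0):
+        # For a Lambda-contractive phi, this differs from the full
+        # recurrence over all of xs by a term that shrinks like Lambda^k.
+        h = h0
+        for x in xs[-k:]:
+            h = phi(h, x)
+        return f(h)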
+If the recurrent model and truncated feed-forward network are initialized at the same point and trained over the same input for N-step, then for an optimal k, the weights of the two models would be very close in the Euclidean space. It can be shown that this small difference does not lead to large gradient differences during subsequent update steps.
+This can be roughly interpreted as - if the gradient descent can train a stable recurrent network, it can also train a feedforward model and vice-versa.
+The stability condition is important as, without that, truncated models would be bad (even for large values of k). Further, it is difficult to show that gradient descent converges to a stationary point.
+Much of the work in representation learning uses Euclidean vector spaces to embed datapoints (like words, nodes, entities etc).
+This approach is not effective when data has a (latent) hierarchical structure.
+The paper proposes to compute the embeddings in the hyperbolic space so as to preserve both the similarity and structure information.
+Hyperbolic spaces are spaces with a constant negative curvature while Euclidean spaces have zero curvature.
+The hyperbolic disc area and circle length increase exponentially with the radius r, while in Euclidean space they increase only quadratically and linearly respectively.
+This makes the hyperbolic space more suitable for embedding tree-like structures where the number of nodes increases as we move away from the root.
+Hyperbolic spaces can be thought of as the continuous version of trees and trees can be thought of as the discrete version of hyperbolic spaces.
+Poincare model is one of the several possible models of the hyperbolic space and is considered here as it is more amenable to gradient-based optimisation.
+The distance between 2 points changes smoothly and is symmetric. Thus the hierarchical organisation only depends on the distance from the origin, which makes the model applicable in settings where the hierarchical structure needs to be inferred from the data.
+Eventually the norm of a point represents its hierarchy and distance between the points represents similarity.
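+For reference, a numpy sketch of the Poincare distance:
+
+    import numpy as np
+
+    def poincare_distance(u, v):
+        # Blows up as points approach the boundary of the unit ball,
+        # giving exponentially more "room" deeper in the hierarchy.
+        duv = np.dot(u - v, u - v)
+        denom = (1 - np.dot(u, u)) * (1 - np.dot(v, v))
+        return np.arccosh(1 + 2 * duv / denom)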
+Embedding taxonomy for wordnet task
+ +Setup
+ +The input data is a collection of a pair of words (u, v) which are related to each other.
+For each word pair, 10 negative samples of the form (u, v’) are sampled and the training procedure uses a soft ranking loss that aims to bring the related objects closer together.
+Network Embedding
+ +Baselines
+ +Datasets
+ +Lexical Entailment
+Hyperlex - Gold standard to evaluate how well semantic models capture lexical entailment, on a scale of [0, 10].
+The key takeaway is that for all the datasets/setups, hyperbolic embeddings give a performance benefit when the embedding dimension is small.
+Hyperbolic embeddings are not suitable for all datasets, e.g., if the dataset is not tree-like or has cycles.
+Hyperbolic embeddings are difficult to optimize as each operation needs to be modified to be usable in the hyperbolic space.
+BabyAI is a research platform to investigate and support the feasibility of including humans in the loop for grounded language learning.
+The setup is a series of levels (of increasing difficulty) to train the agent to acquire a synthetic language (Baby Language) which is a proper subset of English language.
+BabyAI platform provides support for curriculum learning and interactive learning as part of its human-in-the-loop training setup.
+Curriculum learning is incorporated by having a curriculum of levels of increasing difficulty.
+Interactive learning is supported by including a heuristic expert which can provide new demonstrations on the fly to the learning agent.
+The heuristic expert can be thought of as the human-in-the-loop which can guide the agent through the learning process.
+One downside of human-in-the-loop is the poor sample complexity of the learning agent. The heuristic agent can be used to estimate the sample efficiency.
+BabyAI research platform for grounded language learning with a simulated human-in-the-loop.
+Baseline results for performance and sample efficiency for the different tasks.
+MiniGrid - A partially observable 2D grid-world environment.
+Entities - Agent, ball, box, door, keys
+Actions - pick, drop or move objects, unlock doors etc.
+Synthetic Language (a proper subset of English) - Used to give instructions to the agent
+Support for verifying if the task (and the subtasks) are completed or not
+A level is an instruction-following task.
+Formally, a level is a distribution of missions - a combination of initial state of the environment and an instruction (in Baby Language)
+Motivated by curriculum learning, the authors create a series of tasks (with increasing difficulty).
+A subset of skills (competencies) is required for solving each task. The platform takes into account this constraint when creating a level.
+The platform supports a Heuristic expert that simulates the role of a human teacher and knows how to solve each task.
+For any level, it can suggest actions or generate demonstrations (given the state of the environment).
+An imitation learning baseline is trained for each level.
+Data requirement for each level and the benefits of curriculum learning and imitation learning are investigated (in terms of sample efficiency).
+GRU to encode the sentence, CNN to encode the input observation
+FiLM layer to combine the two representations
+LSTM to encode the per-timestep FiLM encoding (timesteps in the environment)
+Two model variants are considered:
+ +Large Model - Bidirectional GRU + attention + large hidden state
+Small Model - Unidirectional GRU + No attention + small hidden state
+Heuristic expert used to generate trajectory and the models are trained by imitation learning (to be used as baselines)
+The key takeaway is that the current deep learning approaches are extremely sample inefficient when learning a compositional language.
+Data efficiency of RL methods is much worse than that of imitation learning methods showing that the current imitation learning and reinforcement learning methods scale and generalize poorly.
+Curriculum-based pretraining and interactive learning was found to be useful in only some cases.
+The paper demonstrates that Memory Augmented Neural Networks (MANN) are suitable for one-shot learning by introducing a new method for accessing an external memory.
+This method focuses on memory content while earlier methods additionally used memory location based focusing mechanisms.
+Here, MANN refers to neural networks that have an external memory. This includes Neural Turing Machines (NTMs) and excludes LSTMs.
+In meta-learning, a learner is learning at two levels.
+The learner is shown a sequence of tasks D1, D2, …, DT.
+When it is training on one of the datasets (say DT), it learns to solve the current dataset.
+At the same time, the learner tries to incorporate knowledge about how task structure changes across different datasets (second level of learning).
+Following are the desirable characteristics for a scalable, combined architecture:
+ +Memory representation should be both stable and element-wise accessible.
+Number of model parameters should not be tied to the size of the memory.
+In standard learning, the goal is to reduce error on some dataset D. In meta-learning, the goal is to reduce the error across a distribution of datasets p(D).
+Each dataset is presented to the model in the form (x1, null), (x2, y1), …, (xt+1, yt), where yt is the correct label (or value) corresponding to the input xt.
+Further, the data labels are shuffled from dataset to dataset.
+The model must learn to hold the data samples in memory till the appropriate candidate labels are presented in the next step.
+The idea is that a model that meta learns would learn to map data representation to correct labels regardless of the actual context of data representation or the label.
+The paper uses NTM as the MANN with one modification.
+In the original formulation, the memories were addressed by both content and location. Location-based addressing is not optimal for the current setup, where the encoding of information is independent of the sequence.
+A new access module - LRUA - Least Recent Used Access - is used to write to memory.
+LRUA is purely content-based and writes to either least used memory location (to preserve recent information) or most recently used memory location (to overwrite recent information with more relevant information). This is decided on the basis of interpolation between previous read weights and weights scaled according to the usage weight.
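+A sketch of the LRUA write-weight computation (sigmoid-gated interpolation, as described above; alpha is a learned scalar):
+
+    import numpy as np
+
+    def lrua_write_weights(prev_read_w, usage_w, alpha):
+        # Interpolate between the previous read weights (overwrite recent,
+        # relevant info) and a one-hot at the least-used slot (preserve it).
+        gate = 1.0 / (1.0 + np.exp(-alpha))  # sigmoid gate
+        w_lu = np.zeros_like(usage_w)
+        w_lu[np.argmin(usage_w)] = 1.0       # least recently used location
+        return gate * prev_read_w + (1.0 - gate) * w_lu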
+Omniglot (classification)
+Sampled functions from Gaussian Processes
+For the omniglot dataset, the model was trained with various combinations of randomly chosen classes with randomly chosen labels.
+As baselines, the following models were considered:
+ +Since each episode (dataset created by the combination of classes) contains unique classes (with their own unique labels) it is important to clear the memory across different episodes.
+For the regression task, the data was generated from a GP prior with a fixed set of hyper-parameters which resulted in different functions.
+For both tasks, the MANN architecture outperforms the LSTM baseline and vanilla NTMs.
+The paper introduces a learned gradient descent optimizer that has low memory and computational overhead and that generalizes well to new tasks.
+Uses a hierarchical RNN architecture augmented by features like adapted input and output scaling, momentum etc.
+A meta-training set of small optimization tasks with diverse loss landscapes is developed. The learned optimizer generalizes to much more complex tasks and setups.
+A hierarchical RNN is designed to act as a learned optimizer. This RNN is the meta-learner and its parameters are shared across different tasks.
+The learned optimizer takes as input the gradient (and related metadata) for each parameter and outputs the update to the parameters.
+At the lowest level of the hierarchy, a small “Parameter RNN” ingests the gradient (and related metadata) for a single parameter.
+One level up, an intermediate “Tensor RNN” incorporates information from a subset of Parameter RNNs (e.g., one Tensor RNN per layer of a feedforward network).
+At the highest level is the global RNN, which receives input from all the Tensor RNNs and can keep track of weight updates across the task.
+The output of each RNN is averaged and fed as input to the next higher-level RNN, and the output of each higher-level RNN is fed as a bias to the RNNs below it.
+In practice, the hidden states are fixed at 10, 30 and 20 respectively.
+Attention and Nesterov’s momentum
+ +Attention mechanism is incorporated by attending to new regions of the loss surface (which are an offset from previous parameter location).
+To incorporate momentum on multiple timescales, the exponential moving average of the gradient at several timescales is also provided as input.
+The average gradients are rescaled (as in RMSProp and Adam)
+Relative log gradient magnitudes are also provided as input so that the optimizer can access how the gradient magnitude changes with time.
+The paper describes a combinatorial approach to embed trees into hyperbolic spaces without performing optimization.
+The resulting mechanism is analyzed to obtain dimensionality-precision tradeoffs.
+To embed any metric spaces in the hyperbolic spaces, a hyperbolic generalization of the multidimensional scaling (h-MDS) is proposed.
+Hyperbolic Spaces
+ +Have the “tree-like” property, i.e., the shortest path between a pair of points is almost the same as the path through the origin.
+Generally, the Poincare ball model is used given advantages like being conformal to Euclidean space.
+Fidelity Measures
+ +Mean Average Precision - MAP
+ +Distortion
+ +Embed the given graph G = (V, E) into a tree T.
+Embed the tree T into the poincare ball Hd of dimensionality d.
+Consider two points a and b (from the tree) where b is the parent of a.
+Assume that a is embedded as f(a) and b is embedded as f(b), and that the children of a need to be embedded.
+Reflect f(a) and f(b) across a geodesic such that f(a) is mapped to 0 (origin) while f(b) is mapped to some new point z.
+Children of a are placed at points yi which are equally spaced around a circle of radius (e^r - 1) / (e^r + 1) and maximally separated from z, where r is the scaling factor.
+Then all the points are reflected back across the geodesic so that all children are at a distance r from f(a).
+To embed the tree itself, place the root node at the origin, place its children around it in a circle, then place their children and so on.
+In this construct, precision scales logarithmically with the degree of the tree but linearly with the maximum path length.
+In the d-dimensional space, the points are embedded into hyperspheres (instead of circles).
+The number of child nodes that can be placed for a particular angle grows with the dimension.
+Increasing dimension helps with bushy trees (with high node degree).
+Given the pairwise distance from a set of points in the hyperbolic space, how to recover the points?
+The corresponding problem in the Euclidean space is solved using MDS.
+A variant of MDS called as h-MDS is proposed.
+MDS makes a centering assumption that the points have 0 mean. In h-MDS, a new mean (called the pseudo-Euclidean mean) is introduced to enable recovery via matrix factorization.
+Instead of the Poincare model, the hyperboloid model is used (though the points can be mapped back and forth).
+Given the pairwise distances, a new matrix Y is constructed by applying cosh on the pairwise distances.
+Running PCA on -Y recovers X up to rotation.
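+A rough sketch of this recovery step (an eigendecomposition is used here in place of a full PCA pipeline; this is illustrative, not the paper's exact procedure):
+
+    import numpy as np
+
+    def h_mds(D, d):
+        # D: pairwise hyperbolic distances; recover a rank-d embedding
+        # (up to rotation) from the top eigenvectors of -cosh(D).
+        Y = np.cosh(D)
+        vals, vecs = np.linalg.eigh(-Y)
+        top = np.argsort(vals)[::-1][:d]
+        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))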
+PGA (Principal Geodesic Analysis) is the counterpart of PCA in hyperbolic spaces.
+First the Karcher mean of the given points is computed.
+All points xi are reflected so that their mean is 0 in the Poincare disk model.
+Combining that with Euclidean reflection formula and hyperbolic metrics leads to a non-convex loss function which can be optimized using gradient descent algorithm.
+Datasets
+ +Results
+Hindsight Experience Replay (HER) is a sample-efficient technique for learning from sparse rewards.
+Assume a footballer misses the goal narrowly. Even though the player does not get any “reward” (in terms of a goal), the player realizes that had the goal post been shifted a bit, the shot would have resulted in a goal (reward).
+The same intuition is applied to the RL agent - let us say that the true goal state was g while the agent ends up in the state s.
+While the action sequence is not useful for reaching the goal state g, it is indeed useful for reaching the state s. Hence the trajectory can be replayed with the goal set to s (and not g).
+Multi-goal policy trained using Universal Value Function Approximation (UVFA).
+Every episode starts by sampling a start state and a goal state. Each goal has a different reward function.
+Policy uses both the current state and the current goal state and leads to a state transition sequence s1, s2,…, sn.
+Each of these transitions si -> si+1 is stored in a buffer with both the original goal and a subset of other goals.
+For the goal selection, following strategies are tried:
+ +Future - the goal state is a state observed k steps after the state transition (in the same episode).
+Final - the goal state is the final state of the current episode.
+Episode - k random states are selected from the current episode.
+Random - k states are selected randomly.
+Any off-policy algorithm can be used. Specifically, DDPG is used.
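+A simplified sketch of the relabeling step with the “final” strategy (it assumes transitions expose state/action/next_state/goal fields and a sparse reward_fn(next_state, goal); both are illustrative, not from the paper):
+```python
+def her_relabel(episode, reward_fn):
+    """Store each transition twice: once with the original goal and once with
+    the goal relabeled to the state actually achieved at the end of the episode."""
+    achieved = episode[-1].next_state  # the "final" strategy
+    out = []
+    for t in episode:
+        out.append((t.state, t.action, t.next_state, t.goal,
+                    reward_fn(t.next_state, t.goal)))
+        out.append((t.state, t.action, t.next_state, achieved,
+                    reward_fn(t.next_state, achieved)))
+    return out
+```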
+A robotic arm is simulated using MuJoCo for push, slide, and pick-and-place tasks.
+DDPG with and without HER evaluated on the 3 tasks.
+DDPG with the HER variant significantly outperforms the baseline in all the cases.
+For top-k classification tasks, cross entropy is widely used as the learning objective even though it is the optimal metric only in the limit of infinite data.
+The paper introduces a family of smoothed loss functions that are specially designed for top-k optimization.
+Here s denotes the output of the classifier model to be learnt, y is the ground-truth label, s[k] denotes the kth largest element of s, and s\p denotes the vector s without its pth element.
+This lk loss has two limitations:
+ +It is continuous but not differentiable in s.
+Its weak derivatives have at most 2 non-zero elements.
+The loss can be reformulated by adding and subtracting the k-1 largest scores of s\y and sy and by introducing a temperature parameter τ.
+For any τ > 0, Lkτ is infinitely differentiable and has non-sparse gradients.
+Under mild conditions, Lkτ approaches lk (in a pointwise sense) as τ approaches 0 from above.
+It is an upper bound on the actual loss (up to a constant factor).
+It is a generalization of the cross-entropy loss for different values of k, and τ and higher margins.
+C(n, k) terms need to be evaluated for computing the loss for one sample (n is the number of classes).
+Loss Lkτ can be expressed in terms of elementary symmetric polynomials σi(e) (sum of all products of i distinct elements of vector e). Thus the challenge is to compute σk efficiently.
+Compute σk(e) where e is an n-dimensional vector, k ≪ n, and e[i] ≠ 0 for all i.
+σi(e) can be computed from the coefficients of the polynomial (X+e1)(X+e2)…(X+en), obtained via a divide-and-conquer approach with polynomial multiplication.
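+A small sketch of this divide-and-conquer computation (without the paper's GPU parallelization and log-space tricks):
+```python
+import numpy as np
+
+def elementary_symmetric(e):
+    """Return [1, sigma_1(e), ..., sigma_n(e)] as the coefficients of the
+    polynomial (X + e_1)(X + e_2)...(X + e_n)."""
+    def product(lo, hi):
+        if hi - lo == 1:
+            return np.array([1.0, e[lo]])        # the polynomial X + e_lo
+        mid = (lo + hi) // 2
+        return np.polymul(product(lo, mid), product(mid, hi))
+    return product(0, len(e))
+```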
+With some more optimizations (eg log(n) levels of recursion, with each level parallelized on a GPU), the resulting algorithm scales well with n on a GPU.
+Operations are performed in the log-space using the log-sum-exp trick to achieve numerical stability in single floating point precision.
+The backward pass uses optimizations like computing derivative of σj with respect to ei in a recursive manner.
+Appendix of the paper describes these techniques in detail.
+Experiments are performed on CIFAR-100 (with noise) and Imagenet.
+For CIFAR-100 with noise, the labels are randomized with probability p (within the same top-level class).
+The proposed loss function is much more robust to both noise and reductions in the amount of training data, as compared to the cross-entropy loss, for both top-k and top-1 performance.
+The paper proposes a pretraining technique that can be used with the GNN architecture for learning graph representation as induced by powerful graph kernels.
+Graph Kernel methods can learn powerful representations of the input graphs but the learned representation is implicit as the kernel function actually computes the dot product between the representations.
+GNNs are flexible and powerful in terms of the representations they can learn, but they can easily overfit if a large amount of training data is not available, as is commonly the case with graphs.
+Kernel methods can be used to learn an unsupervised graph representation that can be finetuned using the GNN architectures for the supervised tasks.
+Given a dataset of graphs g1, g2, …, gn, use a relevant kernel function to compute k(gi, gj) for all pairs of graphs.
+A siamese network is used to encode the pair of graphs into representations f(gi) and f(gj) such that dot(f(gi), f(gj)) equals k(gi, gj).
+The function f is trained to learn the compressed representation of kernel’s feature space.
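+A rough sketch of this pretraining objective (assuming a graph encoder f and precomputed kernel values; all names are illustrative):
+```python
+import torch
+
+def kernel_mimic_loss(f, g_i, g_j, k_ij):
+    """Train f so that the dot product of the two graph embeddings
+    matches the precomputed kernel value k(g_i, g_j)."""
+    pred = torch.dot(f(g_i), f(g_j))
+    return (pred - k_ij) ** 2
+```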
+Pretraining uses the WL (Weisfeiler-Lehman) kernel.
+The pretrained model performs better than the baselines for 2 datasets but lags behind the WL method (which was used for pretraining) on the NCI1 dataset.
+A new (and more realistic) evaluation protocol for lifelong learning where each data point is observed just once and disjoint sets of tasks are used for training and validation.
+A new metric that focuses on the efficiency of the models - in terms of sample complexity and computational (and memory) costs.
+A modification of Gradient Episodic Memory (GEM) which reduces the computational overhead of GEM without compromising the results.
+Empirical validation that using task descriptors help lifelong learning models and improve their few-shot learning capabilities.
+Two groups of datasets - one for training and evaluation (DEV) and the other for cross-validation (DCV).
+Data can be sampled multiple times from the cross-validation dataset but only once from the training dataset.
+Each group of dataset (say DEV or DCV) is a list of task-specific datasets Dk (k is the task index).
+Each sample in Dk is of the form (x, t, y) where x is the data, t is the task descriptor and y is the output.
+Dk contains Bk minibatches of data.
+a(k, i, j) = accuracy on test task j after training on the ith minibatch of training task k.
+Ak = mean over j = 1 to k of a(k, Bk, j), ie train the model on all the data for task k and then test it on all the tasks seen so far.
+f(j, k) = forgetting on task j after training on all minibatches up to task k.
+f(j, k) = max over l = 1 to k-1 of (a(l, Bl, j) - a(k, Bk, j)).
+Forgetting: Fk = mean over j = 1 to k-1 of f(j, k).
+Zb = average b-shot performance, where b is the minibatch number.
+Zb = mean over k = 1 to T of a(k, b, k).
+LCAβ = mean over b = 0 to β of Zb.
+One special case is LCA0, which is the zero-shot (forward transfer) performance, ie the performance on an unseen task.
+In experiments, β is kept small as we want the model to learn from few examples.
+GEM has been shown to be very effective in single epoch setting but introduces a very high computational overhead.
+Average GEM (AGEM) reduces this overhead by sampling (and using) only some examples from the episodic memory instead of using all the examples.
+While GEM provides better guarantees in terms of worst-case forgetting, AGEM provides better guarantees in terms of average accuracy.
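+A minimal sketch of the AGEM projection step (assuming g is the flattened gradient on the current task's batch and g_ref the gradient on a batch sampled from the episodic memory):
+```python
+import torch
+
+def agem_project(g, g_ref):
+    dot = torch.dot(g, g_ref)
+    if dot >= 0:
+        return g  # no interference with the memory, keep the gradient as-is
+    # remove the component that would increase the average memory loss
+    return g - (dot / torch.dot(g_ref, g_ref)) * g_ref
+```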
+Compositional Task Descriptors are used to speed training on the subsequent tasks.
+A matrix specifying the attribute values of the objects (to be recognized in the task) is used.
+A joint-embedding space between image features and attribute embeddings is learned.
+Integer task descriptors for MNIST and CIFAR and class attributes as descriptors for CUB and AWA
+Baselines include GEM, iCaRL, Elastic Weight Consolidation, Progressive Neural Networks etc.
+AGEM outperforms the other models on all the datasets except MNIST, where Progressive Neural Networks lead. One reason could be that MNIST has a large number of training examples per task. However, Progressive Neural Networks lead to poor utilization of capacity.
+While AGEM and GEM have similar performance, GEM has a much higher computational and memory overhead.
+Use of task descriptors improves the accuracy for all the models.
+It seems that AGEM offers a good tradeoff between average accuracy performance and efficiency - in terms of sample efficiency, memory requirements and computational costs.
+The paper proposes a simple and robust approach for hierarchically training an agent in the sparse reward setup.
+The broad idea is to train low-level primitives that are sufficiently diverse (so that they can be composed for solving higher level tasks) and to train a high level primitive that learns to combine these primitives for any given downstream task.
+Low-level policies should be:
+ +For the low-level policy, the per-timestep reward is directly proportional to the change in the external state. The same reward is used for all the agents and environments (except regulated with environment-specific controls and survival rewards).
+Good movement policies are expected to be at least roughly periodic and phase input (or time index) is used to achieve periodicity.
+Phase-conditioned policy π = f(sp, φ), where φ ∈ {0, 1, …, k-1} is the phase index.
+At each timestep t, the model receives the observation sp and the phase index φ = t % k. The phase index is represented by a vector bφ.
+For phase conditioned policies, the agent state and actions are encouraged to be cyclic with the help of a cyclic loss.
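+A small sketch of how the phase input could be formed (assuming bφ comes from a learned embedding table, which is an assumption of this sketch):
+```python
+import numpy as np
+
+def phase_conditioned_input(s_p, t, k, phase_embeddings):
+    phi = t % k                       # phase index at timestep t
+    b_phi = phase_embeddings[phi]     # vector representation b_phi of the phase
+    return np.concatenate([s_p, b_phi])  # fed to the low-level policy
+```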
+Environments: Ant and Humanoid from Mujoco.
+Low-level control:
+ +High-level control:
+ +Cross Maze Environment with fixed goals
+ +3 goals along 3 paths
+Proposed method converges faster and to a smaller final distance to the goal showing that it is both efficient and consistent (with smaller variance across random seeds).
+Random Goal Maze
+ +The goal is randomly drawn from a set of goals.
+“Cross” (shaped) maze and “skull” (shaped) mazes are considered.
+Even with velocity rewards and pretraining on low-level objectives (which can be thought of as exploration bonuses), the baseline fails to get close to the goal locations, while the proposed model reaches the goal most of the time.
+The main results are reported using PPO, though repeating the experiments with A2C and DQN shows that the idea is fairly robust.
+The paper reports that, in their experiments, finetuning the lower-level primitives did not help much, though this might not be the case for other environments.
+The paper proposes an approach for learning neural networks (modules) that can be combined in different ways to solve different tasks (combinatorial generalization).
+The proposed model is called BOUNCEGRAD.
+Focuses on supervised learning.
+Task distribution p(T).
+Each task is a joint distribution pT(x, y) over (x, y) data pairs.
+Given data from m meta-training tasks, and a meta-test task, find a hypothesis h which performs well on the unseen data drawn from the meta-test task.
+Given a compositional scheme C, a set of modules F1, …, Fk (represented as a whole by F) and the set of their respective parameters θ1, …, θk (represented as a whole by θ), (C, F, θ) represents the set of possible functional input-output mappings. These mappings form the hypothesis space.
+A structured hypothesis model is specified by what modules to use and their parametric forms (but not the values).
+Choosing a single module for the task at hand.
+Fixed compositional structure but different modules selected every time.
+Weight ensemble (maybe using attention mechanism)
+General function composition tree
+Offline Meta Learning Phase:
+ +Take training and validation dataset for the first k tasks and generate a parameterization for each module θ1, …, θk.
+The hypothesis (or composition) to use comes from the online meta-test learning phase.
+In this stage, find the best θ given a structure.
+Online Meta-test Learning Phase
+ +Given a hypothesis space and θ, the output is a compositional form (or hypothesis) that specifies how to compose the modules.
+In this stage, find the best structure, given a hypothesis space and θ.
+During Meta-test learning phase, simulated annealing is used to find the optimal structure, with temperature T decreased over time.
+During the meta-learning phase, the actual objective function is replaced by a surrogate, smooth objective function (during the search step) to avoid local minima.
+Once a structure has been picked, any gradient descent based approach can be used to optimize the modules.
+Basically, the state of the optimization process comprises the parameters and the temperature. Together, they are used to induce a distribution over the structures. Given a structure, θ is optimized and T is annealed over time.
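+A high-level sketch of this search loop (propose and loss are placeholders for a neighborhood proposal over structures and the meta-test training error):
+```python
+import math
+import random
+
+def anneal_structure(structure, loss, propose, T0=1.0, decay=0.99, steps=1000):
+    T, current_loss = T0, loss(structure)
+    for _ in range(steps):
+        candidate = propose(structure)
+        delta = loss(candidate) - current_loss
+        # accept improvements always; accept worse structures with prob e^(-delta/T)
+        if delta < 0 or random.random() < math.exp(-delta / T):
+            structure, current_loss = candidate, current_loss + delta
+        T *= decay  # anneal the temperature over time
+    return structure
+```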
+The learning procedure can be improved by performing parameter tuning during the online (meta-test learning) phase as well. The resulting approach is referred to as MOMA - MOdular MAml.
+Pooled - Single network using combined data of all the tasks.
+MAML - Single network using MAML
+BOUNCEGRAD - Modular Network without MAML adaptation in online learning.
+MOMA - BOUNCEGRAD with MAML adaptation in online learning.
+Sine-function prediction problem
+In general, MOMA outperforms other models.
+With a small amount of online training data, BOUNCEGRAD outperforms other models as it has a better structural prior.
+11 different objects (with different shapes) on 4 surfaces with different friction properties.
+2 meta-learning scenarios are considered. In the first case, the object-surface combination in the test case was present in some meta-training tasks and in the other case, it was not present.
+For previously seen combinations, MOMA performs the best followed by BOUNCEGRAD and MAML.
+For unseen combinations, all the 3 are equally good.
+Compositional scheme is the attention mechanism.
+An interesting result is that the modules seem to specialize (and activate more often) based on the shape of the object.
+Composition structure - generating kinematic subtrees for each body part (2 legs, 2 arms, 2 torsos).
+Again 2 setups are used - one where all activities in the training and the meta-test task are shared while the other setup where the activities are not shared.
+For known activities, MOMA and BOUNCEGRAD perform the best, while for unknown activities, MOMA performs the best.
+While the approach is interesting, a more suitable set of tasks (from the point of view of composition) would be more convincing.
+It would be useful to see the computational tradeoff between MAML, BOUNCEGRAD, and MOMA.
+The paper proposes an approach to learn useful skills without a reward function by maximizing an information theoretic objective by using a maximum entropy policy.
+Skills are defined as latent-conditioned policies that alter the state of the environment in a consistent way.
+Skills should dictate the states that the agent visits. Different skills should visit different states to be distinguishable.
+States (not actions) should be used to distinguish between skills as not all actions change the state (for the outside observer).
+Skills are encouraged to be diverse and “exploratory” by learning skills that act randomly (have high entropy).
+(S, A) - state and action
+z ~ p(z) - latent variable to condition the policy.
+Skill - policy conditioned on a fixed z.
+The objective is to maximize the mutual information between the skill and the state (MI(S; Z)), ie the skill should control which states are visited, or equivalently, the skill should be inferable from the states visited.
+Simultaneously minimize the mutual information between skills and actions given the state to ensure that the state (and not the action) is used to distinguish the skills.
+Maximize the entropy of the mixture of policies (p(z) and all the skills).
+Policy π(a | s, z)
+The task reward is replaced by the pseudo-reward log qφ(z | s) - log p(z).
+During unsupervised training, z is sampled at the start of the episode and then not changed during the episode.
+Learning agent gets rewards for visiting the states that are easy to discriminate while the discriminator updated to correctly predict z from the states visited.
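+A condensed sketch of the pseudo-reward computation (assuming a discriminator network that outputs logits over skills for a single state, and a uniform prior p(z)):
+```python
+import torch
+import torch.nn.functional as F
+
+def diayn_reward(discriminator, state, z, num_skills):
+    log_q = F.log_softmax(discriminator(state), dim=-1)      # log q_phi(. | s)
+    log_p_z = -torch.log(torch.tensor(float(num_skills)))    # log p(z), uniform prior
+    return log_q[z] - log_p_z
+```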
+The agent learns a diverse set of primitive behaviors for all tasks ranging from 2 DoF to 111 DoF.
+For the inverted pendulum and mountain car tasks, the skills become increasingly diverse throughout training.
+Use of uniform prior, in place of a learned prior, for p(z) allows for discovery of more diverse skills.
+The proposed approach can be used as a pretraining technique where the best-performing primitives (from unsupervised training) can be finetuned with the task-specific rewards.
+The discovered skills can be used for hierarchical RL by learning a meta-policy (which chooses the skill to execute for k steps).
+Modifying the discriminator in the proposed formulation can be used to bias DIAYN towards discovering a particular type of policies. This provides a mechanism for incorporating “supervision” in the learning setup.
+The “discovered” primitives can also be used for imitation learning.
+Training RNNs to model long term dependencies is difficult but in some cases, the information about dependencies between elements (of the sequence) may be present in the form of symbolic knowledge.
+For example, when encoding sentences, coreference, and hypernymy relations can be extracted between tokens.
+These elements (tokens) can be connected with each other by different kinds of edges, resulting in a graph data structure.
+One approach could be to model this knowledge (encoded in the graph) using a graph neural network (GNN).
+The authors prefer to encode the information into 2 DAGs (via topological sorting) as training a GNN could add some extra overhead.
+This results in the Memory as Acyclic Graph Encoding RNN (MAGE-RNN) architecture. Its GRU version is referred to as MAGE-GRU.
+Given an input sequence of tokens [x1, x2, …, xT] and information about which tokens relate to other tokens, a graph G is constructed with different (possibly typed) edges.
+Given the graph G, two DFS orderings are computed - forward DFS and backward DFS.
+MAGE-RNN uses separate networks for accessing the forward and backward DFS orders.
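+A minimal sketch of obtaining the two orders (a plain DFS-based topological sort over an adjacency-list DAG; this is my reading of the setup, not the paper's code):
+```python
+def topo_order(adj, num_nodes):
+    order, seen = [], set()
+    def dfs(u):
+        seen.add(u)
+        for v in adj[u]:
+            if v not in seen:
+                dfs(v)
+        order.append(u)  # post-order; reversed below for a topological order
+    for u in range(num_nodes):
+        if u not in seen:
+            dfs(u)
+    return order[::-1]
+
+# forward order on G; backward order on G with all of its edges reversed
+```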
+A separate hidden state is maintained for each entity type to separate memory content from addressing.
+For any DFS order (forward or backward), the representation at time t is given as the concatenation of representation of different edge types at that time.
+The hidden states (for different edge types at time t) are updated in the topological order using the current state of all incoming edges at xt.
+The representation of the DFS order is given as the sequence of all the previous representations.
+In some cases, elements across multiple sequences may be related to each other. In that case, the graph is decomposed into a collection of DAGs, and MAGE-GRU is applied to the DAGs by taking one random permutation of the sequences and decomposing it into the forward and the backward graphs.
+The model is evaluated on the task of text comprehension with coreference on bAbi dataset (story based QA), LAMBADA dataset (broad context language modeling) and CNN dataset (cloze-style QA).
+MAGE-GRU was used as a replacement for GRU units in bi-directional GRUs and GA-Reader architecture.
+DAG-RNN and shared version of MAGE-GRU (with shared edge types) are the other baselines.
+For all the cases, the model with MAGE-GRU works the best.
+TuckER is a simple, yet powerful linear model that uses Tucker decomposition for the task of link prediction in knowledge graphs.
+Let E be the set of all the entities and R be the set of all the relations in a given knowledge graph (KG).
+The KG can be represented as a list of triples of the form (source entity, relation, object entity) or (es, r, eo).
+The list of triples can be represented as a third-order tensor (of binary values) where each element corresponds to a triple and each element’s value indicates whether that triple is present in the KG or not.
+The link prediction task can be formulated as - given a set of all triples, learn a scoring function that assigns a score to each triple. The score indicates whether the triple is actually present in the KG or not.
+Tucker decomposition factorizes a tensor into a set of factor matrices and a smaller core tensor.
+In the specific case of three-mode tensors (an alternate representation of a KG), the given original tensor X (of shape IxJxK) can be factorized into a core tensor W (of shape PxQxR) and 3 factor matrices - A (of shape IxP), B (of shape JxQ) and C (of shape KxR) - such that X is approximately W x1 A x2 B x3 C, where xn denotes the tensor product along the nth mode.
+Generally, P, Q, R are smaller than I, J, K (respectively), and W can be seen as a compressed version of X.
+Two embedding matrices are used, for embedding the entities and the relations respectively.
+The entity embedding matrix E is shared for both the subject and the object, ie E = A = B.
+The scoring function is given as W x1 es x2 wr x3 eo, where es, wr and eo are the embedding vectors corresponding to the subject entity, the relation, and the object entity respectively. Note that both the core tensor and the factor matrices are learned.
+The model is trained with the standard negative log-likelihood loss, given as (for one triple): -(y * log(p) + (1-y) * log(1-p)).
+To speed up training and increase accuracy, 1-N scoring is used. A given (es, r) is simultaneously scored for all the entities using the local-closed world assumption (knowledge graph is only locally complete).
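+A small sketch of the scoring function (shapes are illustrative: a core tensor W of shape (de, dr, de), embedding vectors of matching sizes, and an entity embedding matrix E):
+```python
+import numpy as np
+
+def tucker_score(W, e_s, w_r, e_o):
+    # W x1 e_s x2 w_r x3 e_o collapses the core tensor to a scalar score
+    return np.einsum('pqr,p,q,r->', W, e_s, w_r, e_o)
+
+def tucker_score_1_to_N(W, e_s, w_r, E):
+    # 1-N scoring: score (e_s, r) against all entities in E at once
+    return np.einsum('pqr,p,q,nr->n', W, e_s, w_r, E)
+```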
+Handling asymmetric relations is straightforward by learning a relation embedding alongside a relation-agnostic core tensor which enables knowledge sharing across relations.
+One important consideration would be the expressive power of TuckER models, especially in relation to other models like ComplEx and SimplE.
+It can be shown that TuckER is fully expressive, ie given any ground truth over E and R, there exists a TuckER model which can perfectly represent the data - using one-hot entity and relation embeddings.
+For full expressiveness, the required dimensionality of entities (relations) is nE (nR), where nE (nR) is the number of entities (relations). In comparison, the required dimensionality for ComplEx is nE * nR (for both entities and relations) and for SimplE, it is min(nE * nR, number of facts + 1) (for both entities and relations).
+Many existing models like RESCAL, DistMult, ComplEx, SimplE etc can be seen as special cases of TuckER.
+FB15k, FB15k-237, WN18, WN18RR
+The max number of entities is around 41K and max number of relations is around 1.3K
+Mean Reciprocal Rank (MRR) - the average of the inverse of the rank assigned to the true triple, over all ne generated triples.
+hits@k (k = 1, 3, 10) - percentage of times the true triple is ranked in the top k of the ne generated triples.
+Higher is better for both the metrics.
+TuckER outperforms all the baseline models on all but one task.
+Dropout is an important factor, with higher dropout rates (0.3, 0.4, 0.5) needed for datasets with fewer training examples per relation (hence more prone to overfitting).
+TuckER improves performance more significantly when the number of relations is large.
+Even with lower embedding dimensions, TuckER’s performance does not deteriorate as much as other models.
+The paper presents a framework that uses diverse suboptimal world models that can be used to break complex policies into simpler and modular sub-policies.
+Given a task, both the sub-policies and the controller are simultaneously learned in a bottom-up manner.
+The framework is called Model Primitive Hierarchical Reinforcement Learning (MPHRL).
+Instead of learning a single transition model of the environment (aka world model) that can model the transitions very well, it is sufficient to learn several (say k) suboptimal models (aka model primitives).
+Each model primitive will be good in only a small part of the state space (aka region of specialization).
+These model primitives can then be used to train a gating mechanism for selecting sub-policies to solve a given task.
+Since these model primitives are sub-optimal, they are not directly used with model-based RL but are used to obtain useful functional decompositions and sub-policies are trained with model-free approaches.
+A gating controller is trained to choose the sub-policy whose model primitive makes the best prediction.
+This requires modeling p(Mk | st, at, st+1), where p(Mk) denotes the probability of selecting the kth model primitive. This is hard to compute as the system does not have access to st+1 and at at time t, before it has chosen the sub-policy.
+Properly marginalizing over st+1 and at would require expensive MC sampling. Hence an approximation is used, and the gating controller is modeled as a categorical distribution producing p(Mk | st). It is trained via a conditional cross-entropy loss where the ground-truth distribution is obtained from transitions sampled in a rollout.
+The paper notes that this technique is biased but reports that it still works for the downstream tasks.
+The gating controller composes the sub-policies as a mixture of Gaussians.
+For learning, the PPO algorithm is used, with each module’s gradient weighted by the probability assigned by the gating controller.
+Domains:
+ +Mujoco ant navigating different mazes.
+Stacker arm picking up and placing different boxes.
+Implementation Details:
+ +Gaussian subpolicies
+PPO as the baseline
+Model primitives are hand-crafted using the true next state provided by the environment simulator.
+Single Task
+Only the maze task is considered, with the start position (of the ant) and the goal position fixed.
+Observation includes distance from the goal.
+Forcing the agent to decompose the problem, when a more direct solution may be available, causes the sample complexity to increase on one task.
+Lifelong Learning
+ +Maze
+ +10 random Mujoco ant mazes used as the task distribution.
+MPHRL takes almost twice the number of steps (as compared to the PPO baseline) to solve the first task, but this cost gets amortized over the distribution, and the model takes half the number of steps as compared to the baseline (summed over the 10 tasks).
+Pick and Place
+ +8 Pick and Place tasks are created with max 3 goal locations.
+Observation includes the position of the goal.
+Ablations
+ +Overlapping model primitives can degrade the performance (to some extent). Similarly, the performance suffers when redundant primitives are introduced indicating that the gating mechanism is not very robust.
+Sub-policies could quickly adapt to the previous tasks (on which they were trained initially) despite being finetuned on subsequent tasks.
+The order of tasks (in the 10-Maze task) does not degrade the performance.
+Transferring the gating controller leads to negative transfer.
+Notes
+ +The paper provides useful empirical advice for adapting pretrained language models for a given target task.
+Pre-trained models considered
+ +ELMo
+BERT
+Tasks considered
+ +Named Entity Recognition (NER) - CoNLL 2003 dataset
+Sentiment Analysis (SA) - Stanford Sentiment Treebank (SST-2) dataset
+Natural Language Inference (NLI) - MultiNLI and Sentences Involving Compositional Knowledge (SICK-E) dataset
+Paraphrase Detection (PD) - Microsoft Research Paraphrase Corpus (MRPC)
+Semantic Textual Similarity (STS) - Semantic Textual Similarity Benchmark (STS-B) and SICK-R
+The last 3 tasks (NLI, PD, STS) are defined for sentence pairs.
+Adaptation Strategies
+ +Feature Extraction
+ +The pretrained model is only used for extracting features and its weights are kept fixed.
+For both ELMo and BERT, the contextual representation of the words from all the layers are extracted.
+A weighted combination of these layers is used as an input to the task-specific model.
+Task-specific models
+ +NER - BiLSTM with CRF layer
+SA - bi-attentive classification network
+NLI, PD, STS - Enhanced Sequential Inference Model (ESIM)
+Fine-tuning
+ +The pretrained model is finetuned on the target task.
+Task-specific models for ELMO
+ +NER - CRF on top of LSTM states
+SA - Max-pool over the language model states followed by a softmax layer
+NLI, PD, STS - cross sentence bi-attention between the language model states followed by pooling and softmax layer.
+Task-specific models for BERT
+ +NER - Extract representation of the first-word piece of each token followed by the softmax layer
+SA, NLI, PD, STS - standard BERT training
+Main observations
+Feature extraction and fine-tuning have comparable performance in most cases, unless the two tasks are highly similar (fine-tuning is better) or highly dissimilar (feature extraction is better).
+For ELMo, feature extraction consistently outperforms fine-tuning for the sentence pair tasks (NLI, PD, STS). The reverse trend is observed for BERT with fine-tuning being better on sentence pair tasks.
+Adding extra parameters is helpful for feature extraction but not fine-tuning.
+ELMo fine-tuning requires careful tuning and other tricks like triangular learning rates, gradual unfreezing and discriminative fine-tuning.
+For the tasks considered, there is no correlation observed between the distance of the source and target domains and adaptation performance.
+Training a diagnostic classifier (on the intermediate representations) suggests that fine-tuning improves the performance of the classifier at all the intermediate layers (which is sort of expected).
+In terms of mutual information estimates, fine-tuned representations have a much higher mutual information as compared to the feature extraction based representations.
+Knowledge for single-sentence tasks seems to be mostly concentrated in the last layers, while for pair classification tasks, the knowledge seems to be gradually built up in the intermediate layers, all the way up to the last layer.
+Graph Neural Network (GNN) is a family of powerful machine learning (ML) models for graphs that can combine node information with the structural information.
+One downside of GNNs is that their predictions are hard to interpret.
+The paper proposes GNN Explainer model for solving the problem of interpretability.
+Local edge fidelity - identify the subgraph structure (ideally the smallest) that significantly affected the predictions of the GNN, ie identify the important edges in the graph (for a given prediction).
+Local node fidelity - identify the important node features and correlations in the features of the neighboring nodes.
+Single instance and multi-instance explanations - Support both single instance prediction tasks and multi-instance prediction tasks.
+Model Agnostic - Support a large family of models (ideally all)
+Task Agnostic - Support a large family of tasks (ideally all)
+I first describe the single instance prediction case and use that as the base to describe the multiple instance prediction cases. All the discussion in this section assumes a single instance prediction task.
+Input: Trained GNN, a single instance whose prediction is to be explained.
+Task: Identify the small subgraph and the small subset of features that explain the prediction.
+Idea: Maximize the mutual information (MI) between the GNN and the explanation by learning a graph mask which can be used for selecting the relevant subgraph (from the GNN’s computational graph) and features (from all layers of the GNN).
+The computational graph of a GNN (corresponding to a node) refers to the approximately L-hop neighborhood of the node in the graph, ie the subgraph formed by the nodes and edges whose representations affected the representation of the given node.
+For a node v, the information used to predict its label y is completely described by its computation graph Gc(v) and the associated feature set Xc(v). The feature set includes the features of all the nodes in the computation graph.
+When constructing the explanation, only Gc(v) and Xc(v) are used.
+The task can be reformulated as identifying a subgraph GS (subset of Gc(v)) with associated features XS which are important when predicting the label y for node v.
+“Importance” is measured in terms of MI
+MI(Y, (GS, XS)) = H(Y) - H(Y | G = GS, X = XS) where H is the entropy and Y is a random variable representing the prediction.
+ +A further constraint, |GS| < k, is imposed to obtain concise explanations.
+Since H(Y) is fixed (recall that the network has already been trained and is now being used in the inference mode), maximizing MI is equivalent to minimizing the conditional entropy H(Y | G = GS, X = XS)
+This is equivalent to selecting the subgraph that minimizes the uncertainty in the prediction of y when the computational graph is Gc(v)
+Given the exponentially large number of possible subgraphs, we can not directly optimize the given equation.
+A “relaxed”-adjacency matrix (whose values are real numbers in the range 0 to 1) is introduced where each element of this fractional adjacency matrix is smaller than the corresponding element of the original adjacency matrix. Gradient descent can be performed on this adjacency matrix.
+The “relaxed” GS can be interpreted as a variational approximation of the subgraph distribution of Gc(v), and the objective can be written as min E_GS[H(Y | G = GS, X = XS)].
+Now the paper makes a big approximation that the GNN is convex, so as to leverage Jensen’s inequality and push the expectation inside the entropy term to get an upper bound, which is then minimized: min H(Y | G = E[GS], X = XS).
+The paper reports that the convexity approximation (along with discreteness constraint) works in practice.
+Next, a mean-field approximation is used to decompose P(GS) as a multivariate Bernoulli distribution, ie the product of AS(i, j) for all (i, j) belonging to Gc(v). AS can be optimized directly, and its values represent the expectation of the Bernoulli distribution on whether the edge (i, j) exists.
+Given the constraints on AS, it is easier to learn a mask matrix M and optimize it such that AS = M * Ac, where * denotes element-wise multiplication. Additionally, the sigmoid operator can be applied to M.
+Once M is learned, only the top k values are retained.
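+A condensed sketch of this mask optimization (gnn, A_c, X_c, and y are placeholders for the trained model returning class log-probabilities for node v, the computational graph's adjacency, its features, and the predicted label; the regularization weight is illustrative):
+```python
+import torch
+
+M = torch.randn_like(A_c, requires_grad=True)     # "relaxed" edge mask
+opt = torch.optim.Adam([M], lr=0.01)
+for _ in range(200):
+    A_s = torch.sigmoid(M) * A_c                  # masked, fractional adjacency
+    loss = -gnn(A_s, X_c)[y]                      # minimize H(Y | G = GS) for label y
+    loss = loss + 0.005 * torch.sigmoid(M).sum()  # encourage small subgraphs
+    opt.zero_grad()
+    loss.backward()
+    opt.step()
+```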
+Similar to the previous approach, another feature mask is learned (either one for entire GNN or one per node of the GNN) and is used as a feature selector.
+The mask could either be learned such that same set of node features (in terms of dimensions) are selected or a different set of features are selected per node. The paper uses the former as it is more straightforward.
+Just like before, a “relaxed” mask MT is trained to select features as MT * XS.
+One tricky case is where a feature is important but its value is set to 0. In that case, the value will be masked even though it should not be.
+The workaround is to use Monte Carlo (MC) estimates of marginals of the missing features. This gives a way to assign importance scores to each feature dimension and a form of reparameterization trick is used to perform end-to-end learning.
+Masks are encouraged to be discrete by regularizing their element-wise entropy.
+The resulting computation graph is valid in the sense that it allows message passing towards the central node v.
+Given a set of nodes (having the label say y), the task is to obtain a global explanation of the predictions.
+For the given class, a prototypical reference node is chosen by computing the mean of embeddings of all the nodes in the class and then selecting the node which is closest to the mean.
+Now, compute the important computational graph corresponding to this node and align the computational subgraphs of all the other nodes (in the given class) to reference.
+Let A* be the adjacency matrix and X* the feature matrix for the explanation corresponding to the reference node. Let Av and Xv be the adjacency matrix and feature matrix of the to-be-aligned computational graph.
+A relaxed alignment matrix P is optimized to align the nodes and features in the two graphs, ie we minimize |PᵀAvP - A*| + |PᵀXv - X*|.
+Choosing concise explanations helps in efficient graph matching.
+For GNNs that compute attention over the entire graph, edges with low attention weight can be pruned to increase efficiency.
+Datasets
+ +Node classification: BA-Shapes, BA-Community, Tree-Cycles, Tree-Grid
+Graph classification: MUTAG, Reddit-Binary
+Baselines
+GRAD - compute the gradient of the model’s loss with respect to the adjacency matrix and the node features, and pick the edges with the highest absolute gradient.
+GAT - Graph Attention Network
+The proposed model seems to outperform the baselines both qualitatively and quantitatively. But the results should be taken with a grain of salt as only 2 baselines are considered.
+Standard unsupervised learning aims to learn transferable features. The paper proposes to learn a transferable learning rule (in an unsupervised manner) that can generalize across tasks and architectures.
+Consider training the model with supervised learning - φt+1 = SupervisedUpdate(φt, xt, yt, θ).
+Here t denotes the step, (x, y) denotes the data points, θ denotes the hyperparameters of the optimizer.
+Extending this formulation for meta-learning, one could say that t is the step of the inner loop, θ are the parameters of the meta learning model.
+Further, the paper proposes to use φt+1 = UnsupervisedUpdate(φt, xt, θ), ie yt is not used (nor even assumed to be available, as this is unsupervised learning).
+The meta update rule is used to learn the weights of a meta-model by performing SGD on the sum of MetaObjective over the distribution of tasks (over the course of inner loop training).
+Base model: MLP with parameters φt
+To ensure that it generalizes across architectures, the update rule is designed to be neuron-local, ie updates are a function of pre- and post-synaptic neurons, though, in practice, this constraint is relaxed to decorrelate neurons by using cross-neuron information.
+Each neuron i in every layer l (in the base model) has an update network (MLP) which takes as input the feedforward activations, feedback weights, and error signals, ie h_b^l(i) = MLP(x_b^l(i), z_b^l(i), v^(l+1), δ^l(i), θ).
+ +All the update networks share the meta parameters θ
+The model is run in a standard feed-forward manner, and the update network (corresponding to each unit) is used to generate the error signal δ_b^l(i) = lin(h_b^l(i)).
+This loss is backpropagated using the set of learned backward weights v^l instead of the forward weights w^l.
+The weight update Δwl is also generated using a per-neuron update network.
+The MetaObjective is based on fitting a linear regression model to labeled examples with a small number of data points.
+Given the emphasis on learning generalizable features, the weights (of linear regression) are estimated on one batch and evaluated on another batch.
+The MetaObjective is to reduce the cosine distance between yb and vᵀxbL (see the sketch after this list).
+ +yb - Actual labels on the evaluation batch
+xbL - Features of the evaluation batch (using the base model)
+v - parameters of the linear regression model (learned on train batch)
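+A rough sketch of this objective, assuming one-hot labels and a closed-form ridge-regression fit (the ridge term is my addition for numerical stability, not from the paper):
+```python
+import numpy as np
+
+def meta_objective(x_train, y_train, x_eval, y_eval, ridge=1e-3):
+    d = x_train.shape[1]
+    # fit the linear model v on the training batch (one-hot y_train)
+    v = np.linalg.solve(x_train.T @ x_train + ridge * np.eye(d),
+                        x_train.T @ y_train)
+    y_hat = x_eval @ v
+    # cosine distance between predictions and labels on the evaluation batch
+    cos = (y_hat * y_eval).sum() / (np.linalg.norm(y_hat) * np.linalg.norm(y_eval))
+    return 1.0 - cos
+```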
+Meta gradients are approximated using truncated backpropagation through time.
+Increasing variation in the training dataset helps the meta optimization process. Data is augmented with shifts, rotations, and noise. Predicting these coefficients is an auxiliary (regression) task for training the meta-objective.
+Training the system requires a lot of resources - 8 days with 512 workers.
+With standard unsupervised learning, the performance (on transfer task) starts declining after some time even though the performance (on the unsupervised task) is improving. This suggests that the objective function for the two tasks starts to mismatch.
+UnsupervisedUpdate leads to a better generalization as compared to both VAE and supervised learning (followed by transfer).
+UnsupervisedUpdate also leads to a positive transfer across domains (vision to language) when trained for a shorter duration of time (to ensure that the meta-objective does not overfit).
+UnsupervisedUpdate also generalizes to larger model architectures and different activation functions.
+Continual Learning paradigm focuses on learning from a non-stationary stream of data with additional desiderata - transferring knowledge from previously seen task to unseen tasks and being resilient to catastrophic forgetting - all with a fixed memory and computational budget.
+This is in contrast to the IID (independent and identically distributed) assumption in statistical learning.
+One common example of the non-iid data is setups involving sequential decision making - eg Reinforcement learning.
+Many existing benchmarks use MNIST as the underlying dataset (eg Permuted MNIST, Split MNIST, etc). These benchmarks lack complexity and make it hard to observe positive and negative backward transfer.
+Most works focus only on the catastrophic forgetting challenge and ignore the other issues (like computation and memory footprint, the capacity of the network, etc).
+The paper proposes a new benchmark based on Starcraft II video game to understand the different approaches for lifelong learning.
+The sequence of tasks is designed to be a curriculum - the learning agent starts with learning simple skills and later moves to more complex tasks. These complex tasks require remembering and composing skills learned in the earlier levels.
+To evaluate for catastrophic forgetting, the tasks are designed such that not all the skills are needed for solving each task. Hence the learning agent needs to remember skills even though they are not needed at the current level.
+Each level comes with a fixed computational budget of episodes and each episode has a fixed time limit. Once the budget is consumed the agent has to proceed to the next level. Hence agents with better sample efficiency would benefit.
+The benchmark supports both RL and supervised learning versions. In the supervised version, expert agents (pretrained on each level) are also provided.
+Baselines are provided for distillation (using experts): sequential training (fine tuning), Dropout and SER. None of the baseline methods achieve positive or negative backward transfer.
+When modeled as a pure RL task, the benchmark is extremely difficult to solve.
+The paper suggests using a metric to record the amount of learning/data required to recover performance on the previous task.
+The paper presents some general ideas and mechanisms for multiple model-based RL. Even though the task and model architecture may not be very relevant now, I find the general idea and the mechanisms to be quite useful. As such, I am focusing only on high-level ideas and not the implementation details themselves.
+The main idea behind Multiple Model-based RL (MMRL) is to decompose complex tasks into multiple domains in space and time so that the environment dynamics within each domain is predictable.
+MMRL proposes an RL architecture composed of multiple modules, each with its own state prediction model and RL controller.
+The prediction error from each of the state prediction models defines the “responsibility signal” for each module (see the sketch after the list below).
+This responsibility signal is used to:
+ +Weigh the state prediction output ie the predicted state is the weighted sum of individual state predictions (weighted by the responsibility signal).
+Weigh the parameter update of the environment models as well as the RL controllers.
+Weigh the action output - ie the predicted action is a weighted sum of the individual actions.
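+As referenced above, a minimal sketch of a soft responsibility signal (a softmax over negative prediction errors; the scale sigma is illustrative):
+```python
+import numpy as np
+
+def responsibilities(prediction_errors, sigma=1.0):
+    scores = np.exp(-np.asarray(prediction_errors) / (2 * sigma ** 2))
+    return scores / scores.sum()  # one weight per module, summing to 1
+```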
+The framework is amenable for incorporating prior knowledge about which module should be selected.
+In the modular decomposition of a task, the modules should not change too frequently and some kind of spatial and temporal continuity is also desired.
+Temporal continuity can be accounted for by using the previous responsibility signal as input during the current timestep.
+Spatial continuity can be ensured by considering a spatial prior like the Gaussian spatial prior.
+Though model-free methods could be used for learning the RL controllers, model-based methods could be more relevant given that the modules are learning state-prediction models as well.
+Exploration can be ensured by using a stochastic version of greedy action selection.
+One failure mode for such modular architectures is when a single module tries to perform well across all the tasks. The modules themselves should be relatively simplistic (eg linear models) which can learn quickly and generalize well.
+A non-stationary hunting task in a grid world and a non-linear, non-stationary control task of swinging up a pendulum provide the proof of concept for the proposed methods.
+The paper introduces a simple data augmentation protocol that provides a good compositional inductive bias for sequential models.
+Synthetic examples are created by taking real sequences and replacing fragments with other fragments that appear in similar environments. This operation is referred to as GECA (Good Enough Compositional Augmentation).
+The underlying idea is that if two fragments of training examples occur in some environment, then any environment where the first fragment appears is also a valid environment for the second fragment.
+Discover substitutable fragments (ie pairs of fragments that co-occur with a common fragment) and use them to generate new sequences by swapping fragments.
+The current work uses very simple criteria to decide if fragments are substitutable - fragments should occur in at least one lexical environment that is exactly the same. A lexical environment is the k-word window around each span of the fragment.
+Though the idea can be motivated by work in generative syntax and distributional semantics, it would not hold like a physical law when applied to the real data.
+The authors view this tradeoff as a balance between the shortage of training data vs relative frequency of mistake in the proposed data augmentation approach.
+The approach is evaluated on the SCAN dataset when the model is trained on the short sequence of English commands. Though the dataset augmentation helps the baseline models, it is not surprising given the nature of the SCAN dataset.
+More challenging tasks (for evaluating the proposed approach) are semantic parsing (where the query is represented in the form of λ-calculus or SQL) and low-resource language modeling. While the improvement (in terms of metrics) is sometimes limited, the gains are consistent across different datasets.
+Given that the proposed approach is relatively simple and straightforward, it appears to be quite promising.
+Relational Reinforcement Learning (RRL) paradigm uses relational state (and action) space and policy representation to leverage the generalization capability of relational learning for reinforcement learning.
+The paper shows the effectiveness of RRL - in terms of generalization, sample efficiency, and interpretability - using box-world and StarCraft II minigames.
+The main idea is to use neural network models that operate on structured representations and perform relational reasoning via iterated, message-passing style methods.
+Use of non-local computations using a shared function (in terms of pairwise interactions between entities) provides a better inductive bias.
+Multi-head dot product attention mechanism is used to model the pairwise interactions (with one or more attention blocks).
+Iterative computations can be used to capture higher-order interactions between entities.
+Entity extraction is based on the assumption that entities are things located at a particular point in space.
+A CNN is used to parse the pixel space observation into k feature maps of size nxn. The (x, y) coordinates are concatenated to each k-dimensional pixel feature-vector to indicate the pixel’s position in the map.
+The resulting n² x k matrix acts as the entity matrix.
+Actor-critic architecture (using distributed agent IMPALA) is used.
+12 x 12-pixel room with keys and boxes placed randomly.
+Agent can move in 4 directions.
+The task is to collect gems by unlocking boxes (which may contain keys to unlock other boxes).
+Each level has a unique sequence in which boxes need to be opened as opening the wrong box could make the level unsolvable.
+The difficulty of a level can be controlled using: (i) the number of boxes in the path to the goal, (ii) the number of distractor branches, and (iii) the length of the distractor branches.
+RRL agents solve over 98% of the levels while the RL agent solves less than 95% of the levels.
+Visualising the attention scores indicate that:
+ +keys attend to locks they can unlock.
+all objects attend to agent’s location.
+agent and gem attend to each other (and themselves).
+Generalization capacity is tested in two ways:
+ +Performance on levels that require opening a larger sequence of boxes than it is trained on.
+Performance on levels that require key-lock combinations not seen during training.
+In both the scenarios, the RRL agent significantly outperforms the RL agent.
+The RRL agent achieves better or equal results than the RL agent in all but one game.
+For testing generalization, the agent that was trained for controlling two marines was transferred to a task which requires it to control 5 marines. These results are not conclusive given the high variability.
+The paper looks at the problem of learning structured exploration policies for training RL agents.
+Consider a stochastic, parameterized policy πθ(a|s) where θ represents the policy-parameters.
+To encourage exploration, noise can be added to the policy at each time step t. But the noise added in such a manner does not have any notion of temporal coherence.
+Another issue is that if the policy is represented by a simple distribution (say parameterized unimodal Gaussian), it can not model complex time-correlated stochastic processes.
+The paper proposes to condition the policy on per-episode random variables (z) which are sampled from a learned latent distribution.
+Consider a distribution over the tasks p(T). At the start of any episode of the ith task, a latent variable zi is sampled from the distribution N(μi, σi), where μi and σi are the learned parameters of the distribution and are referred to as the variational parameters.
+Once sampled, the same zi is used to condition the policy for as long as the current episode lasts, and the action is sampled from the distribution πθ(a|s, zi).
+The intuition is that the latent variable zi would encode the notion of a task or goal that does not change arbitrarily during the episode.
+The paper focuses on the setting where the structured exploration policies are to be learned while leveraging the learning from prior tasks.
+A meta-learning approach, called model agnostic exploration with structured noise (MAESN), is proposed to learn a good initialization of the policy-parameters and to learn a latent space (for sampling z from) that can inject structured stochasticity into the policy.
+General meta-RL approaches have two limitations when it comes to “learning to explore”:
+ +Idea behind MAESN is to meta-train policy-parameters so that they learn to use the task-specific latent variables for exploration and can quickly adapt to a new task.
+An important detail is that the parameters are optimized to maximize the expected rewards after one step of gradient update to ensure that the policy uses the latent variables for exploration.
+For every iteration of meta-training, an “inner” gradient update is performed on the variational parameters and the post-inner-update parameters are used to perform the meta-update.
+The authors report that performing the “inner” gradient update on the policy-parameters does not help the overall learning objective and that the step size for each parameter had to be meta-learned.
+The variational parameters have the usual KL divergence loss, which encourages them to be close to the prior distribution (a unit Gaussian in this case).
+After training, the variational parameters for each task are quite close to the prior probably because the training objective optimizes for the expected reward after one step of gradient descent on the variational parameters.
+Another implementation detail is that reward shaping is used to ensure that the policy gets useful signal during meta-training. To be fair to the baselines, reward shaping is used while training baselines as well. Moreover, the policies trained with reward shaping generalizes to sparse reward setup as well (during meta-test time).
+Three tasks distributions: Robotic Manipulation, Wheeled Locomotion, and Legged Locomotion. Each task distribution has 100 meta-training tasks.
+In the Manipulation task distribution, the learner has to push different blocks from different positions to different goal positions. In the Locomotion task distributions, the different tasks correspond to the different goal positions.
+The experiments show that the proposed approach can adapt to new tasks quickly and learns a coherent exploration strategy.
+In some cases, learning from scratch also provides strong asymptotic performance, although learning from scratch takes much longer.
diff --git a/_site/site/2019/06/13/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations.html b/_site/site/2019/06/13/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations.html new file mode 100644 index 00000000..a2a21f00 --- /dev/null +++ b/_site/site/2019/06/13/Extrapolating-Beyond-Suboptimal-Demonstrations-via-Inverse-Reinforcement-Learning-from-Observations.html @@ -0,0 +1,72 @@ +The paper proposes a new inverse RL (IRL) algorithm, called Trajectory-ranked Reward EXtrapolation (T-REX), that learns a reward function from a collection of ranked trajectories.
+Standard IRL approaches aim to learn a reward function that “justifies” the demonstration policy and hence those approaches cannot outperform the demonstration policy.
+In contrast, T-REX aims to learn a reward function that “explains” the ranking over demonstrations and can learn a policy that outperforms the demonstration policy.
+The input is a sequence of trajectories T1, … Tm which are ranked in the order of preference. That is, given any pair of trajectories, we know which of the two trajectories is better.
+The setup is to learn from observations where the learning agent does not have access to the true reward function or the action taken by the demonstration policy.
+Reward Inference
+ +A parameterized reward function rθ is trained with the ranking information using a binary classification loss function which aims to predict which of the two given trajectory would be ranked higher.
+Given a trajectory, the reward function predicts the reward for each state. The sum of rewards (corresponding to the two trajectories) is used used to predict the preferred trajectory.
+T-REX uses partial trajectories instead of full trajectories as a data augmentation strategy.
+Policy Optimization
+ +Environments: Mujoco (Half Cheetah, Ant, Hooper), Atari
+Demonstrations generated using PPO (checkpointed at different stages of training).
+Ensemble of networks used to learn the reward functions.
+The proposed approach outperforms the baselines Behaviour Cloning from Observations and Generative Adversarial Imitation Learning.
+In terms of reward extrapolation, T-REX can predict the reward for trajectories which are better than the demonstration trajectories.
+Some ablation studies considered the effect of adding noise (random swapping the preference between trajectories) and found that the model is somewhat robust to noise up to an extent.
+The paper proposes a very cool idea at the intersection of deep learning and physics.
+The idea is to train a neural network architecture that builds on the concept of Hamiltonian Mechanics (from Physics) to learn physical conservation laws in an unsupervised manner.
+It is a branch of physics that can describe systems which follow some conservation laws and invariants.
+Consider a set of N pair of coordinates [(q1, p1), …, (qN, pN)] where q = [q1, …, qN] dnotes the position of the set of objects while p = [p1, …, pN] denotes the momentum of the set of variables.
+Together these N pairs completely describe the system.
+A scalar function H(q, p), called as the Hamiltonian is defined such that the partial derivative of H with respect to p is equal to derivative of q with respect to time t and the negative of partial derivative of H with respect to q is equal to derivative of p with respect to time t.
+This can be expressed in the form of the equation as follows:
+The Hamiltonian H can be parameterized using a neural network and can learn conserved quantities from the data in an unsupervised manner.
+The loss function looks as follows:
+For setups where the energy must be conserved exactly, (eg ideal mass-spring and ideal pendulum), the HNN learn to preserve an energy-like scalar.
+For setups where the energy need not be conserved exactly, the HNNs still learn to preserve the energy thus highlighting a limitation of HNNs.
+In case of two body problems, the HNN model is shown to be much more robust when making predictions over longer time horizons as compared to the baselines.
+In the final experiment, the model is trained on pixel observations and not state observations. In this case, two auxiliary losses are added: auto-encoder reconstruction loss and a loss on the latent space representations. Similar to the previous experiments, the HNN model makes robust predictions over much longer time horizons.
+The paper proposes a dataset to diagnose the abstract reasoning capabilities of learning systems.
+The paper shows that a variant of the relational networks, explicitly designed for abstract reasoning, outperforms models like ResNets.
+Visual reasoning tasks, that are inspired by the human IQ test, are used to evaluate the models in terms of generalization.
+Let’s say that we want to test if the model understands the abstract notion of “increasing”. We could train the model on data that captures the notion of “increasing”, in terms of say increasing size (or quantities) of objects and then test it on a dataset where the notion is expressed in terms of increasing intensity of color.
+The dataset is then used to evaluate if the models can find any solution to such abstract reasoning tasks and how well they generalize when the abstract content is specifically controlled.
+Each task consists of an incomplete 3x3 matrix of images (in the style of Raven’s Progressive Matrices, or RPMs) where the missing image needs to be filled in, typically by choosing from a set of candidate images.
+As such, it is possible to justify multiple answers to be correct though, in practice, the right answer is the one with the simplest explanation.
+RPM-like matrices are generated procedurally by building an abstract structure for the matrices.
+The abstract structure S consists of 3 components: (i) Relation types (R), (ii) Object types (O) and (iii) Attribute types (A), ie *S = {(r, o, a) | r in R, o in O and a in A}*.
+This can be read as: “Structure S is instantiated on attribute a of object o and exhibits the relation r”. For example, S is instantiated on “color” of object “shape” and exhibits the relation “increasing”.
+In general, the structure could be made of more than one such tuple; the more tuples it contains, the harder the task.
+Given the structure, sample values v for each attribute a while conforming with the relation r. For example, if the attribute is “color” and the relation is “increasing”, the intensity of color must increase.
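+A small sketch of sampling such an abstract structure (the vocabularies below follow the relation/object/attribute sets described in the paper, but treat them as illustrative):
+```python
+import random
+
+RELATIONS = ["progression", "XOR", "OR", "AND", "consistent union"]
+OBJECTS = ["shape", "line"]
+ATTRIBUTES = ["size", "type", "colour", "position", "number"]
+
+def sample_structure(num_triples=1):
+    # Each triple (r, o, a) instantiates relation r on attribute a of object o;
+    # structures with more triples yield harder tasks
+    return [(random.choice(RELATIONS), random.choice(OBJECTS), random.choice(ATTRIBUTES))
+            for _ in range(num_triples)]
+```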
+The paper tests for the following generalization scenarios:
+Neutral: The structure of the training and test data can contain any tuple.
+Interpolation: The training data contains even-indexed members of the attribute values while the test data contains odd-indexed members of the attribute values.
+Extrapolation: The training data contains first-half of the attribute values while the test data contains the second-half of the attribute values.
+Heldout attribute: Training data contains no tuples with (o = shape, a = color) or (o = line, a = type).
+Heldout triples: Out of 29 possible triples, 7 are held out from training and only used during testing.
+Heldout pair-of-triples: Out of 400 possible sets of pair of triples, 40 were held out and used only during testing.
+Heldout attribute pair: Out of 20 (unordered) variable attribute pairs, 4 were held out and used only during testing.
+Input: 8 context panels (from the 3x3 matrix) where the last panel needs to be filled.
+CNN-MLP - 4 layer CNN with batchnorm and ReLU.
+ResNet - ResNet-50 (as it performed better than ResNet-101 and ResNet-152).
+LSTM
+Wild Relation Network (WReN) - A CNN model encodes the 8 panels and the candidate answers and feeds them as input to a relational network.
+Context-blind ResNet - ResNet network without the context (or the 8 input panels).
+WReN model outperforms the other models on the Neutral setup.
+Models have a harder time reasoning about differences in size than about differences in quantity.
+WReN is the best performing model in all the setups and the rest of the discussion applies only to that model.
+Generalisation is easiest in the case of interpolation and worst in the case of extrapolation, hinting at the limited generalization capability of the models.
+The model is also trained to predict the relevant relation, object and attribute types using the meta-targets that encode this information.
+The auxiliary training helps in all the cases. Further, the model’s accuracy on the main task is higher in the cases where it solves the auxiliary tasks well.
+For abstract visual reasoning tasks, the choice of model can make a large difference, as illustrated by the gap between ResNets and Relational Networks.
+Using auxiliary loss that encourages the model to “explain” its reasoning (in this case by predicting the attributes, relations, etc) helps to improve the performance on the main task as well.
+Given that the challenge is motivated by tasks used to measure human IQ, it would have been interesting to get an estimate of human performance on at least a subset of this dataset.
+Consider problems where the input to the model is a set. In such problems (referred to as the set-input problems), the model should be invariant to the permutation of the data points.
+In “set pooling” methods (1, 2), each data point (in the input set) is encoded using a feed-forward network and the resulting set of encoded representations are pooled using the “sum” operator.
+This approach can be shown to be both permutation-invariant and a universal function approximator.
+The paper proposes an attention-based network module, called the Set Transformer, which can model the interactions between the elements of an input set while being permutation invariant.
+An attention function \(Attn(Q, K, V) = (QK^{T})V\) is used to map queries Q to outputs using key-value pairs K, V.
+In case of multi-head attention, the key, query, and value are projected into h different vectors and attention is applied on all these vectors. The output is a linear transformation of the concatenation of all the vectors.
+3 modules are introduced: MAB, SAB and ISAB.
+Multihead Attention Block (MAB) is a module very similar to the encoder in the Transformer, without the positional encoding and dropout.
+Set Attention Block (SAB) is a module that takes as input a set and performs self-attention between the elements of the set to produce another set of the same size ie SAB(X) = MAB(X, X).
+The time complexity of the SAB operation is \(O(n^{2})\) where n is the number of elements in the set. It can be reduced to \(O(mn)\) by using Induced Set Attention Blocks (ISAB) with m induced point vectors (denoted as I).
+\(ISAB_{m}(X) = MAB(X, MAB(I, X))\).
+ISAB can be seen as performing a low-rank projection of inputs.
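+A compact PyTorch-style sketch of these blocks (layer sizes and the use of nn.MultiheadAttention are assumptions; the paper’s implementation differs in details):
+```python
+import torch
+import torch.nn as nn
+
+class MAB(nn.Module):
+    # Multihead Attention Block: attend from X to Y, then apply a feedforward layer
+    def __init__(self, dim, num_heads=4):
+        super().__init__()
+        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
+        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
+        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
+
+    def forward(self, X, Y):
+        H = self.ln1(X + self.attn(X, Y, Y)[0])
+        return self.ln2(H + self.ff(H))
+
+class SAB(nn.Module):
+    # Set Attention Block: SAB(X) = MAB(X, X), O(n^2) in the set size
+    def __init__(self, dim):
+        super().__init__()
+        self.mab = MAB(dim)
+
+    def forward(self, X):
+        return self.mab(X, X)
+
+class ISAB(nn.Module):
+    # Induced Set Attention Block: ISAB_m(X) = MAB(X, MAB(I, X)), O(mn)
+    def __init__(self, dim, m):
+        super().__init__()
+        self.I = nn.Parameter(torch.randn(1, m, dim))
+        self.mab1, self.mab2 = MAB(dim), MAB(dim)
+
+    def forward(self, X):
+        H = self.mab1(self.I.expand(X.size(0), -1, -1), X)  # (batch, m, dim)
+        return self.mab2(X, H)                              # (batch, n, dim)
+```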
+These modules can be used to model the interactions between data points in any given set.
+Aggregation is performed by Pooling by Multihead Attention (PMA): multi-head attention is applied over a set of k learnable seed vectors.
+The interaction between the k outputs (from PMA) can be modeled by applying another SAB.
+Thus the entire network is a stack of SABs and ISABs. Both the modules are permutation invariant and so is any network obtained by stacking them.
+Datasets include:
+ +Generally, increasing m (the number of inducing datapoints) improves performance, up to some extent. This is somewhat expected.
+The paper considers various ablations of the proposed approach (like disabling attention in the encoder or pooling layer) and shows that the attention mechanism is needed during both stages.
+The work has two main benefits over prior work:
+ +Reducing the \(O(n^{2})\) complexity to \(O(mn)\).
+Using the self-attention mechanism both for encoding the inputs and for aggregating the encoded representations.
+The paper introduces a new, procedurally generated environment called CoinRun that is designed to benchmark the generalization capabilities of RL algorithms.
+The paper reports that deep convolutional architectures and techniques like L2 regularization, batch norm, etc (which were proposed in the context of generalization in supervised learning) are also useful for RL.
+CoinRun is made of multiple levels.
+In each level, the agent spawns on the far left side and needs to collect a single coin that lies on the far right side.
+There are many obstacles in between and colliding with an obstacle leads to the agent’s death.
+Each episode extends for a maximum of 1000 steps.
+CoinRun is designed such that given sufficient training time and levels, a near-optimal policy can be learned for all the levels.
+Generalization can be measured by training an agent on a given set of training tasks and evaluating it on an unseen set of test tasks.
+9 agents are trained to play CoinRun, on different training sets (each with a different number of levels).
+The first 8 agents are trained on sets of size 100 to 16000 levels while the last agent is trained on an unbounded set of levels.
+Training a model on an unbounded set of levels provides a good proxy for the train-to-test generalization performance.
+Two convolutional architectures (of different sizes) are compared:
+ +Nature-CNN: The CNN architecture used in the Deep Q Network. This is the smaller network among the two models.
+IMPALA-CNN: The CNN architecture used in the IMPALA architecture.
+The IMPALA-CNN agent always outperforms the Nature-CNN agent, indicating that the larger architecture generalizes better. But increasing the network size beyond a limit gives diminishing returns.
+While both L2 regularization and Dropout help to improve generalization, L2 regularization is more impactful.
+A domain randomization/data augmentation approach is tested where rectangular regions of different sizes are masked and assigned a random color. This approach seems to improve performance.
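+A sketch of this kind of masking augmentation (the number and size of boxes are assumptions):
+```python
+import numpy as np
+
+def random_cutout_color(obs, max_frac=0.5, max_boxes=3):
+    # Mask a few random rectangles of the observation with random colors
+    img = obs.copy()
+    h, w, _ = img.shape
+    for _ in range(np.random.randint(1, max_boxes + 1)):
+        bh = np.random.randint(1, int(h * max_frac))
+        bw = np.random.randint(1, int(w * max_frac))
+        y, x = np.random.randint(0, h - bh), np.random.randint(0, w - bw)
+        img[y:y + bh, x:x + bw] = np.random.randint(0, 256, size=3)
+    return img
+```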
+Batch Normalization helps to improve performance as well.
+Environment stochasticity is introduced by using sticky actions while policy stochasticity is introduced by controlling the entropy bonus. Both these forms of stochasticity boost performance.
+While combining different regularization methods helps, the gains are only marginally better than using just one regularization approach. This suggests that these different approaches induce similar generalization properties.
+Two additional environments are also considered to verify the high degree of overfitting observed in the CoinRun environment:
+ +CoinRun-Platforms:
+ +Unlike CoinRun, each episode can have multiple coins and the time limit is increased to 1000 steps.
+Levels are larger as well, so the agent might need to backtrack.
+RandomMazes:
+ +Partially observed environment with square mazes of dimensions 3x3 to 25x25.
+Time limit of 500 steps.
+Overfitting is observed for both these environments as well.
+The paper presents a benchmark and experimental protocol (environments, metrics, baselines, training/testing setup) to evaluate RL algorithms for generalization.
+Several RL algorithms are evaluated and the key takeaway is that the “vanilla” RL algorithms can generalize better than the RL algorithms that are specifically designed to generalize, given enough diversity in the distribution of the training environments.
+The focus is on evaluating generalization to environmental changes that affect the system dynamics (and not the goal or rewards).
+Two generalization regimes are considered:
+ +Interpolation - parameters of the test environment are similar to the parameters of the training environment.
+Extrapolation - parameters of the test environment are different from the parameters of the training environment.
+Following algorithms are considered as part of the benchmark:
+ +“Vanilla” RL algorithms - A2C, PPO
+RL algorithms that are designed to generalize:
+ +EPOpt - Learn a (robust) policy that maximizes the expected reward over the most difficult distribution of environments (ones with the worst expected reward).
+RL2 - Learn an (adaptive) policy that can adapt to the current environment/task by considering the trajectory and not just the state transition sequence.
+These specially designed RL algorithms can be optimized using either A2C or PPO leading to combinations like EPOpt-A2C or EPOpt-PPO etc.
+The models are composed either entirely of feedforward networks or of feedforward + recurrent networks.
+Environments
+ +CartPole, MountainCar, Acrobot, and Pendulum from OpenAI Gym.
+HalfCheetah and Hopper from OpenAI Roboschool.
+Three versions of each environment are considered:
+ +Deterministic: Environment parameters are fixed. This case corresponds to the standard environment setup in classical RL.
+Random: Environment parameters are sampled randomly. This case corresponds to sampling from a distribution of environments.
+Extreme: Environment parameters are sampled from their extreme values. This case corresponds to the edge-case environments which would not be encountered during training generally.
+Performance Metrics
+ +Average total reward per episode.
+Success percentage: Percentage of episodes where a certain goal (or reward) is obtained.
+Evaluation Metrics/Setups
+ +Default: success percentage when training and evaluating on the deterministic version of the environment.
+Interpolation: success percentage when training and evaluating on the random version of the environment.
+Extrapolation: the geometric mean of the success percentages of the following three setups (a small sketch follows this list):
+ +Train on deterministic and evaluate on the random version.
+Train on deterministic and evaluate on extreme version.
+Train on random and evaluate on the extreme version.
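+As referenced above, a minimal sketch of the extrapolation score (argument names are illustrative):
+```python
+import numpy as np
+
+def extrapolation_score(det_to_random, det_to_extreme, random_to_extreme):
+    # Geometric mean of the three train/evaluate success percentages listed above
+    return float(np.cbrt(det_to_random * det_to_extreme * random_to_extreme))
+```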
+Observations
+ +Extrapolation is harder than interpolation.
+Increasing the diversity in the training environments improves the interpolation generalization of vanilla RL methods.
+EPOpt improves generalization only for continuous control environments and only with PPO.
+RL2 is difficult to train on the environments considered and did not provide a clear advantage in terms of generalization.
+EPOpt-PPO outperforms PPO on only 3 environments, and EPOpt-A2C does not outperform A2C.
+The paper proposes a new algorithm called Probabilistic Ensembles with Trajectory Sampling (PETS) that combines uncertainty-aware deep learning models (ensembles of deep learning models that encode uncertainty) with sampling-based uncertainty propagation.
+PETS improves over other probabilistic MBRL approaches by isolating epistemic uncertainty (due to limited training data) and aleatoric uncertainty (inherent in the system).
+Aleatoric uncertainty can be accounted for by learning a parameterized distribution (probabilistic neural network) trained with negative log-likelihood.
+Epistemic uncertainty can be accounted for by either having an infinite amount of data or by using ensembles.
+The paper uses a neural network to predict the mean and standard deviation of a Gaussian distribution which defines the predictive model. This setup is referred to as the “probabilistic” model and denoted by P.
+The alternate setup of the deterministic model is where a neural network is used to make a point prediction (and is denoted by D).
+Ensemble of probabilistic models is denoted as PE while that of deterministic models is denoted as DE.
+Model Predictive Control (MPC) is used for planning.
+Given a start state and an action sequence, the probabilistic dynamics model induces a distribution over the trajectories.
+The first action, among the sequence of optimized actions, is executed.
+Instead of random shooting, Cross Entropy Method (CEM) is used.
+Let us say there are B bootstrap models in the ensemble. Given the current state, P particles are created and each particle is propagated using one of the bootstrap models. Two variants are considered:
+ +TS1 - At each timestep, each particle samples a bootstrap. In this case, particle separation cannot be attributed to the compounding effects of the bootstraps.
+TS$\infty$ - The bootstrap model (per particle) is sampled just once and is not changed after that (sketched below). This setup separates aleatoric and epistemic uncertainty: aleatoric state variance is the average variance of particles of the same bootstrap, while epistemic state variance is the variance of the average of particles with the same bootstrap index.
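+A minimal sketch of TS$\infty$ particle propagation (the sample_next interface on each bootstrap model, which samples from its predicted Gaussian, is an assumption):
+```python
+import numpy as np
+
+def ts_inf_propagate(models, s0, actions, num_particles):
+    # TS-infinity: each particle keeps the same bootstrap model for the whole rollout
+    assignment = np.random.randint(len(models), size=num_particles)
+    particles = [np.array(s0, copy=True) for _ in range(num_particles)]
+    trajectory = [particles]
+    for a in actions:
+        particles = [models[assignment[i]].sample_next(particles[i], a)
+                     for i in range(num_particles)]
+        trajectory.append(particles)
+    return trajectory
+```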
+The proposed approach reaches the asymptotic performance of state-of-the-art model-free algorithms in much fewer samples.
+The general performance trend is probabilistic ensemble > probabilistic model > deterministic ensemble > deterministic model.
+Initial experiments for learning a policy by propagating gradients through the ensemble of models did not work, and this has been left as future work.
+The paper presents the task of abductive NLI (denoted alpha-NLI) where the model needs to perform abductive reasoning.
+Abductive reasoning is the inference to the most plausible explanation. Even though it is considered to be an important component for understanding narratives, the work in this domain is sparse.
+A new dataset called Abductive Reasoning in narrative Text (ART), consisting of 20K narrative contexts and 200K explanations, is also provided. The dataset models the task as multiple-choice questions to make the evaluation process easy.
+Given a pair of observations O1 and O2 and two hypotheses h1 and h2, the task is to select the most plausible hypothesis.
+In general, \(P(h \mid O_1, O_2)\) is proportional to \(P(h \mid O_1)P(O_2 \mid h, O_1)\).
+Different independence assumptions can be imposed on the structure of the problem: eg, one assumption could be that the hypothesis is independent of the observations, while the “fully connected” assumption jointly models both the observations and the hypothesis.
+Along with crowdsourcing several plausible hypotheses for each observation instance pair, an adversarial filtering algorithm (AF) is used to remove weak pairs of hypotheses.
+Observation pairs are created using the ROCStories dataset which is a collection of short, manually crafted stories of 5 sentences.
+The average length of both the context and the hypothesis is between 8 and 9 words.
+To collect plausible hypotheses, the crowd workers were asked to fill in a plausible “in-between” sentence in natural language.
+Given the plausible hypothesis, the crowd workers were asked to create an implausible hypothesis by editing fewer than 6 words.
+Adversarial filtering approach from Zellers et al. is used with BERT as the adversary. A temperature parameter is introduced to control the maximum number of instances that can be changed in each adversarial filtering iteration.
+Human performance: 91.4%
+Baselines like an SVM classifier, a bag-of-words classifier (using GloVe) and max-pooling over BiLSTM representations: approx 50%.
+Entailment NLI baseline: 59%. This highlights the additional complexity of abductive NLI as compared to entailment NLI.
+BERT: 68.9%
+GPT: 63.1%
+Numerical and spatial knowledge-based data points are particularly hard.
+The model is more likely to fail when the narrative created by the incorrect hypothesis is plausible.
+The memory layer is composed of 3 components:
+ +Query Network
+ +Key selection module
+ +Value lookup table
+ +All the parameters are trainable, though, in practice, only the selected k memory slots are updated.
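+A simplified sketch of these three components (this ignores the product-key factorization that makes key selection efficient at scale; sizes are illustrative):
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class KeyValueMemory(nn.Module):
+    def __init__(self, dim, num_slots, k=32):
+        super().__init__()
+        self.query_net = nn.Linear(dim, dim)                    # query network
+        self.keys = nn.Parameter(torch.randn(num_slots, dim))   # key selection
+        self.values = nn.Embedding(num_slots, dim)              # value lookup table
+        self.k = k
+
+    def forward(self, x):
+        q = self.query_net(x)               # (batch, dim)
+        scores = q @ self.keys.t()          # (batch, num_slots)
+        top_scores, top_idx = scores.topk(self.k, dim=-1)
+        w = F.softmax(top_scores, dim=-1)
+        # Only the k selected value slots receive gradients
+        return (w.unsqueeze(-1) * self.values(top_idx)).sum(dim=1)
+```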
+Using a multihead attention mechanism helps to improve the performance further.
+One or more feedforward layers in the transformer are replaced by the memory layers.
+The model is evaluated on large scale language modeling tasks with 140 GB of data from common crawl corpora (28 billion words).
+Evaluation metrics
+ +Perplexity on the test set.
+Fraction of accessed values.
+KL divergence between the (normalized) weights of key access and uniform distribution.
+The last two metrics are used together to determine how well the keys are utilized.
+Given the large size of the training dataset, adding more layers to the transformer model helps.
+The effect of adding a memory layer is stronger than the effect of adding new layers to the transformer. For example, a 12 layer transformer + memory layer outperforms a 24 layer transformer while being almost twice as fast.
+The best position to place the memory is at an intermediate layer and placing the memory layer right after the input or just before the softmax layer does not work well in practice.
+The paper proposes the PHYRE (PHYsical REasoning) benchmark - consisting of classic mechanical puzzles in 2D physical environments - as a means to evaluate the physical reasoning ability of machine learning models.
+2D world that obeys Newtonian mechanics.
+Gravitational force + Friction.
+Non-deformable objects that can be static (ie fixed) or dynamic (ie can move and are affected by collisions etc).
+The learning agent starts in some initial world state (ie configuration of objects).
+Goal is described in the form of (subject, relation, object) where the agent’s task is to satisfy the relation between the subject and the object.
+Currently, only the “touch” relation is supported.
+The learning agent has to take a single action - placing one or more new dynamic objects in the world.
+A simulator is run on the new configuration (for a fixed amount of time) to check if the goal condition is satisfied.
+At the end of the simulation, a binary reward and intermediate observations (collected as the simulator executes) are provided to the learning agent.
+These observations are 256x256 grids where each grid cell can take 1 of 7 values (denoting different types of objects).
+Since only one relation is supported currently, the color is sufficient to encode the goal.
+Two benchmark tiers are provided where each tier comprises a combination of:
+ +a predefined set of all the actions that the agent is allowed to perform.
+set of tasks that can be solved by at least one action from the allowed action set.
+PHYRE-B - The agent is allowed to place a single ball (of any radius) at any valid location.
+PHYRE-2B - The agent is allowed to place 2 balls at any valid pair of locations.
+Each of the two tiers has 25 task templates where each template comprises variants of a single task (same goal but different initial conditions).
+Two evaluation setups are considered:
+ +within-template where the agent is trained on some tasks in a template and evaluated on a set of held-out tasks from the same template.
+cross-template where the agent is evaluated on tasks from a different template.
+In the training phase, the model has access to the simulator (but not to the correct solution). So the model could learn an action-prediction model or forward dynamics model or both.
+In the testing phase, the model can query the simulator only a few times. Each query provides it with the binary reward and the intermediate observations.
+The emphasis is on solving more tasks (in few queries) during the test phase.
+This requirement is captured using a metric called AUCCESS.
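+A sketch of how such a metric could be computed; the log-based weighting below (which emphasizes solving tasks in few attempts) is an assumption about the exact form:
+```python
+import numpy as np
+
+def auccess(attempts_to_solve, max_attempts=100):
+    # attempts_to_solve: per-task number of attempts needed (np.inf if unsolved)
+    ks = np.arange(1, max_attempts + 1)
+    weights = np.log(ks + 1) - np.log(ks)
+    success_at_k = np.array([(attempts_to_solve <= k).mean() for k in ks])
+    return (weights * success_at_k).sum() / weights.sum()
+```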
+In general, the tasks in PHYRE-2B are harder than tasks in PHYRE-B.
+Random Agent - Randomly samples actions
+Non-parametric agent (MEM) - generates R actions at random and uses the simulator to check how many tasks can be solved using these R random actions. During testing, the R actions are tried in decreasing order of the number of training tasks they solve.
+Non-parametric agent with online learning (MEM-O) - Variant of MEM where an online adaptation step is performed during test time (to update the rank of the actions).
+Deep Q Networks with an action encoder, observation encoder and fusion model (combine action and observation representation).
+DQN with online learning (DQN-O): Variant of DQN with online updates (during the test phase).
+Contextual bandits.
+Policy learning approaches like PPO and A2C.
+Both Contextual bandits and policy-based approaches show poor training stability.
+The best agent, DQN-O, reaches an AUCCESS of 56.2% on PHYRE-B and 39.26% on PHYRE-2B. In general, agents with online adaptation perform better.
+The tasks are designed such that 100000 attempts are sufficient to solve 100% of tasks in PHYRE-B and 95% of tasks in PHYRE-2B.
+Even though only two tiers are provided right now, the benchmark is readily extensible and new tasks can be added in the future.
+The paper proposes MAML++ - a modification of the MAML algorithm that stabilizes its training, improves generalization performance and reduces the computational overhead.
+Training the outer loop requires unfolding the inner loop multiple times.
+In absence of skip connections, the gradient is multiplied by the same parameter multiple times.
+The large effective depth and the absence of skip connections can lead to exploding and vanishing gradients.
+The paper proposes to stabilize gradient propagation by minimizing the target set loss computed by the base network after every inner-loop step on the support set (a multi-step loss).
+It is important to anneal the contribution of earlier steps and increase the contribution of later steps over time.
+While the first-order MAML is faster, the resulting model may not have as good a generalization error as the second-order MAML.
+The paper proposes derivative order annealing where first-order gradients are used for the first 50 epochs and second-order gradients are used thereafter.
+This derivative order annealing appears to be more stable than models that use second-order derivatives only.
+In MAML, the statistics of the current batch are used for normalization instead of accumulating the running statistics.
+The paper proposes to collect the statistics per step which can increase the convergence speed, stability, and generalization performance.
+In MAML, the batch normalization biases are not updated in the inner-loop which can adversely impact the performance.
+The paper proposes to learn a set of biases (per step) within the inner loop update.
+MAML uses a single learning rate across all the steps and all the parameters. This means a single learning rate hyperparameter must work well for all the layers and steps.
+An alternate solution would be to learn a separate learning rate per parameter but this can be impractical as it doubles the number of parameters to be learned.
+The paper proposes to learn a learning rate and direction for each layer in the network, for each step it takes in the inner loop.
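+A sketch of such an inner loop with learned per-layer, per-step learning rates (names are illustrative):
+```python
+import torch
+
+def inner_loop_lslr(params, loss_fn, lrs, num_steps):
+    # lrs[step][name] is a learned learning rate for that layer and inner-loop step
+    for step in range(num_steps):
+        loss = loss_fn(params)
+        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
+        params = {name: p - lrs[step][name] * g
+                  for (name, p), g in zip(params.items(), grads)}
+    return params
+```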
+The paper also proposed to anneal the learning rate of the outer loop (using cosine annealing) as it helps to achieve better generalization.
+Using these modifications helps to outperform the MAML model on both Omniglot and MiniImagenet datasets.
+The biggest benefit comes by learning the per-layer, per-step learning rates and by using the per-step batch normalization.
+The paper considers the task of training an RL system by sampling data from multiple simulators (over parallel devices).
+The setup is that of distributed RL setting with n agents or actor-learners (composed of a single learner and several actors). These agents are trying to maximize a common value function.
+One (existing) approach is to perform on-policy updates with a shared policy. The policy could be updated in synchronous (does not scale well) or asynchronous manner (can be unstable due to stale gradients).
+Off-policy approaches allow for better computational efficiency but can be unstable during training.
+The paper proposes the Gossip-based Actor-Learner Architecture (GALA), which uses asynchronous communication (gossip) between the n agents to improve the training of deep RL models.
+These agents are expected to converge to the same policy.
+During training, the different agents are not required to share the same policy and it is sufficient that the agent’s policies remain $\epsilon$-close to each other. This relaxation allows the policies to be trained asynchronously.
+The GALA approach is combined with A2C agents, resulting in GALA-A2C agents. They have better computational efficiency and scalability (as compared to A2C) and perform similarly to A3C and Impala.
+Training alternates between one local policy-gradient (and TD update) and asynchronous gossip between agents.
+During the gossip step, the agents send their parameters to some of the other agents (referred to as the peers) and update their parameters based on the parameters received from the other agents (for which the given agent is a peer).
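+A minimal sketch of the gossip step (uniform mixing weights over the received parameters are an assumption):
+```python
+def gossip_step(local_params, peer_params_list):
+    # Mix local parameters with those received from in-peers
+    n = len(peer_params_list) + 1
+    return [(p + sum(peer[i] for peer in peer_params_list)) / n
+            for i, p in enumerate(local_params)]
+```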
+GALA agents are implemented using non-blocking communication so that they can operate asynchronously.
+The paper includes the proof that the policies learned by the different agents are within $\epsilon$ distance of each other (ie all the policies lie within an $\epsilon$-distance ball) thus ensuring that the policies do not diverge much from each other.
+Six games from the Atari 2600 games suite are used for the experiments.
+Baselines: A2C, A3C, Impala
+GALA agents are configured in a directed ring graph topology.
+With A2C, as the number of simulators increases, the number of convergent runs (runs with a threshold reward) decreases.
+Using gossip algorithms increases or maintains the number of convergent runs. It also improves the performance, sample efficiency and compute efficiency of A2C across all the six games.
+When compared to Impala and A3C, GALA-A2C generally outperforms (or performs as well as) those baselines.
+Given that the learned policies remain within an $\epsilon$ ball, the agent’s gradients are less correlated as compared to the A2C agents.
+The paper introduces Contrastively-trained Structured World Models (C-SWMs).
+These models use a contrastive approach for learning representations in environments with compositional structure.
+The training data is in the form of an experience buffer \(B = \{(s_t, a_t, s_{t+1})\}_{t=1}^T\) of state transition tuples.
+The goal is to learn:
+ +an encoder \(E\) that maps the observed states $s_t$ (pixel state observations) to latent state $z_t$.
+a transition model \(T\) that predicts the dynamics in the hidden state.
+The model defines the energy of a tuple \((s_t, a_t, s_{t+1})\) as \(H = d(z_t + T(z_t, a_t), z_{t+1})\).
+The model has an inductive bias for modeling the effect of action as translation in the abstract state space.
+An extra hinge-loss term is added: \(\max(0, \gamma - d(\tilde{z}_{t}, z_{t+1}))\) where \(\tilde{z}_{t} = E(\tilde{s}_{t})\) is a corrupted latent state corresponding to a randomly sampled state \(\tilde{s}_{t}\).
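+A sketch of the resulting contrastive objective (using squared Euclidean distance as the energy d, consistent with the translation-based transition model; names are illustrative):
+```python
+import torch
+import torch.nn.functional as F
+
+def cswm_loss(encoder, transition, s_t, a_t, s_next, s_corrupt, gamma=1.0):
+    z_t, z_next, z_corrupt = encoder(s_t), encoder(s_next), encoder(s_corrupt)
+    d = lambda a, b: ((a - b) ** 2).sum(dim=-1)        # energy as squared distance
+    positive = d(z_t + transition(z_t, a_t), z_next)   # H for the true transition
+    negative = F.relu(gamma - d(z_corrupt, z_next))    # hinge on the corrupted state
+    return (positive + negative).mean()
+```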
+The goal is to learn object-oriented representations where each state embedding is structured as a set of objects.
+Assuming the number of object slots to be \(K\), the latent space and the action space can be factored into \(K\) independent latent spaces (\(Z_1 \times ... \times Z_K\)) and action spaces (\(A_1 \times ... \times A_K\)) respectively.
+There are K CNN-based object extractors and an MLP-based object encoder.
+The actions are represented as one-hot vectors.
+A fully connected graph is induced over K objects (representations) and the transition function is modeled as a Graph Neural Network (GNN) over this graph.
+The transition function produces the change in the latent state representation of each object.
+The factorization can be taken into account in the loss function by summing over the loss corresponding to each object.
+Grid World Environments - 2D shapes, 3D blocks
+Atari games - Pong and Space Invaders
+3-body physics simulation
+Random policy is used to collect the training data.
+Evaluation is performed in the latent space (no reconstruction in the pixel space) using ranking metrics. The observations (to compare against) are randomly sampled from the buffer.
+Baselines - auto-encoder based World Models and Physics as Inverse Graphics model.
+In the grid-world environments, C-SWM models the latent dynamics almost perfectly.
+Removing either the state factorization or the GNN transition model hurts the performance.
+C-SWM performs well on Atari as well but the results tend to have high variance.
+The optimal values of $K$ should be obtained by hyperparameter tuning.
+For the 3-body physics tasks, both the baselines and proposed models work quite well.
+Interestingly, the paper has a section on limitations:
+ +The object extractor module can not disambiguate between multiple instances of the same object (in a scene).
+The current formulation of C-SWM can only be used with deterministic environments.
+The paper presents the MuZero algorithm that performs planning with a learned model.
+The algorithm achieves state of the art results on the Atari suite (where model-free approaches generally perform the best) and on planning-oriented games like Chess and Go (where planning approaches generally perform the best).
+Model-based approaches generally focus on reconstructing the true environment state or the sequence of full observations.
+MuZero focuses on predicting only those aspects that are most relevant for planning - policy, value functions, and rewards.
+The model consists of three components: (representation) encoder, dynamics function, and the prediction network.
+The learning agent has two kinds of interactions - real interactions (ie the actions that are actually executed in the real environment) and hypothetical or imaginary actions (ie the actions that are executed in the learned model or the dynamics function).
+At any timestep \(t\), the past observations \(o_1, \ldots, o_t\) are encoded into the state \(s_t\) using the encoder.
+Now the model takes hypothetical actions for the next K timesteps by unrolling the model for K steps.
+For each timestep \(k = 1, \ldots, K\), the dynamics model predicts the immediate reward \(r_k\) and a new hidden state \(h_k\) using the previous hidden state \(h_{k-1}\) and action \(a_k\).
+At the same time, the policy \(p_k\) and the value function \(v_k\) are computed using the prediction network.
+The initial hidden state \(h_0\) is initialized using the state \(s_t\).
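+A sketch of this K-step unroll (function names are illustrative):
+```python
+def muzero_unroll(encoder, dynamics, prediction, observations, actions, K=5):
+    h = encoder(observations)                  # h_0 initialized from the state s_t
+    outputs = []
+    for k in range(K):
+        policy_k, value_k = prediction(h)      # prediction network: p_k, v_k
+        reward_k, h = dynamics(h, actions[k])  # dynamics: r_k, h_k from h_{k-1}, a_k
+        outputs.append((policy_k, value_k, reward_k))
+    return outputs
+```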
+Any MDP Planning algorithm can be used to search for optimal policy and value function given the state transitions and the rewards induced by the dynamics function.
+Specifically, the MCTS (Monte Carlo Tree Search) algorithm is used and the action \(a_{t+1}\) (ie the action that is executed in the actual environment) is selected from the policy outputted by MCTS.
+At each timestep t, the MCTS algorithm is executed to choose the next action (which will be executed in the real environment).
+The resulting next observation \(o_{t+1}\) and reward \(r_{t+1}\) are stored and the trajectory is written to the replay buffer (at the end of the episode).
+For every hypothetical step k, match the predicted policy, value, and reward to the actual target values.
+The target policy is generated by the MCTS algorithm.
+The target value function and reward are generated by actually playing the game (or the MDP).
+MuZero leverages the search-based policy iteration from AlphaZero.
+It extends AlphaZero to setups with a single agent (where self-play is not possible) and setups with a non-zero reward at the intermediate time steps.
+The encoder and the prediction functions are similar to the ones used by AlphaZero.
+K is set to 5.
+Environments: 57 games in Atari along with Chess, Go and Shogi
+MuZero achieves the same level of performance as AlphaZero for Chess and Shogi. In Go, MuZero slightly outperforms AlphaZero despite doing fewer computations per node in the search tree.
+In Atari, MuZero achieves a new state-of-the-art compared to both model-based and model-free approaches.
+The paper considers a variant called MuZero Reanalyze that reanalyzes old trajectories by re-running the MCTS algorithm with the updated network parameter. The motivation is to have a better sample complexity.
+MuZero performs well even when using a single simulation of MCTS (during inference).
+During training, using more simulations of MCTS helps to achieve better performance, though even just 6 simulations per move is sufficient to learn a good model for Ms. Pacman.
+Procedural text comprehension tasks focus on modeling the effect of actions and predicting what happens next.
+But they do not consider why some actions need to happen before other actions.
+The paper proposes a new model called XPAD (eXPlainable Action Dependency) that considers the purpose of actions while predicting their effect.
+The model favors effects that:
+ +explain more of the actions in the text.
+are more plausible given the context.
+An existing procedural text benchmark dataset (Propara) is expanded by adding the task of explaining actions by predicting their dependencies.
+Input
+ +Procedural text: a chronologically ordered sequence of T sentences.
+List of N participant entities, whose state changes at some step.
+Output
+ +State change matrix $\pi(T \times N)$ with four possible states - move, create, destroy, none.
+This matrix tracks how each entity’s state changes after each step.
+Dependency Explanation Graph
+ +Identify what steps are necessary to execute a given step (say $s_i$) and represent this dependency in the form of a dependency explanation graph $G = \langle S, E \rangle$.
+In this graph, each node is a step and the direction of edge describes the order of dependency.
+Propara dataset is expanded to extract the dependency graph using both heuristic and automated methods.
+The automated method is based on the coherence assumption that if step $s_j$ changes the state of entity $e_k$, then $s_j$ is a precondition for the first subsequent step that changes the state of $e_k$.
+The model is based on the ProStruct system and uses an encoder-decoder based architecture.
+Encoder
+ +Input: Sentence $s_t$ and entity $e_j$.
+The sentence is encoded using GloVe vectors and a BiLSTM model, and the entity is encoded as an indicator variable.
+The combined representation is denoted as $c_{tj}$.
+This representation is passed through an MLP to generate k logits that encode the probability of each entity j undergoing a state change at step t.
+Decoder
+ +Beam search is performed to decode the encoder representation into the state change matrix and dependency graph using a score function that ensures global consistency.
+Score function has two components:
+ +State change score - depends on the likelihood that the selected state changes at step t given the text and the state change history from steps $s_1$ to $s_{t-1}$.
+Dependency graph score
+ +This is based on the connectivity and likelihood of the resulting dependency explanation graph.
+This score is used to bias the graph search towards:
+ +predictions that have an identifiable purpose ie checking if a particular state change prediction leads to a connection in the dependency explanation graph.
+graphs that are more likely according to the background knowledge to distinguish likely dependency links from the unlikely ones.
+During training, XPAD has access to the correct path (in the search space) and learns to minimize the joint loss corresponding to predicting the state change and the dependency explanation graph.
+During testing, XPAD performs beam search to predict the most likely state change and dependency explanation graph.
+Tasks:
+ +State change prediction
+Dependency explanation prediction
+Baselines:
+ +XPAD significantly outperforms all the baseline models on the dependency explanation task.
+Improvements on the state change prediction task are less significant.
+Removing dependency graph scores from XPAD leads to a drop in the F1 score.
+The paper provides an elaborate discussion on the different types of errors that the XPAD system makes.
+The paper proposes parameter-reduction techniques to lower the memory consumption (and improve training speed) of BERT.
+It also proposes to use a self-supervised loss (based on inter-sentence coherence) and argues that this loss is better than the NSP loss used by BERT.
+ALBERT architecture is similar to that of BERT with three major differences.
+Factorized Embedding Parameterization
+ +In BERT and followup works, the embedding size was tied to the size of the context vector.
+Since the context vector is expected to encode the entire context, it needs to have a large dimensionality.
+One consequence of this choice is that even the embedding layer (which encodes the representation for each token) has a large size. This increases the overall memory footprint of the model.
+The paper proposes to factorize the embedding parameters into two smaller matrices.
+The embedding layer learns a low dimensional representation of the tokens and this representation is projected into a high dimensional space.
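+A sketch of this factorization (sizes are illustrative): with an embedding size E much smaller than the hidden size H, the parameter count drops from V x H to V x E + E x H for a vocabulary of size V.
+```python
+import torch.nn as nn
+
+class FactorizedEmbedding(nn.Module):
+    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=4096):
+        super().__init__()
+        self.embed = nn.Embedding(vocab_size, embed_dim)  # low-dimensional embedding
+        self.project = nn.Linear(embed_dim, hidden_dim)   # projection to hidden space
+
+    def forward(self, token_ids):
+        return self.project(self.embed(token_ids))
+```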
+Cross-layer parameter sharing
+ +Inter-sentence coherence loss
+ +BERT uses two losses - Masked Language Modeling loss (MLM) and Next Sentence Prediction (NSP).
+In the NSP task, the model is provided a pair of sentences and it has to predict if the two sentences appear consecutively in the same document or not. Negative samples are created by sampling sentences from different documents.
+The paper argues that NSP is not effective as a loss function as it merges topic prediction and coherence prediction into one task (as the two sentences come from different documents). The topic prediction is an easier task as compared to coherence prediction.
+Hence the paper proposes to use the Sentence Order Prediction task where the model has to predict which of the two sentences comes first in a document. The negative samples are created by simply swapping the order in the positive samples. Hence both the sentences come from the same document and topic prediction alone can not be used to solve the task.
+Different variants (in terms of size) of ALBERT and BERT models are compared (eg ALBERT, ALBERT-x, BERT-x, etc).
+In general, ALBERT models have many-times fewer parameters as compared to the BERT models.
+Datasets - BookCorpus, English Wikipedia.
+ALBERT-xxlarge significantly outperforms the BERT-large model even though it has around 70% of the parameters of BERT-large.
+BERT-xlarge performs worse than BERT-base hinting that it is difficult to train such large models.
+ALBERT models also have better data throughput as compared to BERT models.
+For the ALBERT models, an embedding size of 128 performs the best.
+As the hidden dimension is increased, the model obtains better performance, but with diminishing returns.
+Very wide ALBERT models (say with a context size of 1024) do not benefit much from depth.
+Using additional training data boosts the performance for most of the downstream tasks.
+The paper empirically shows that using dropout could hurt the performance of the ALBERT models. This observation may not hold for BERT as it does not share parameters across layers and hence may need regularization via dropout.
+ALBERT also improves the state of the art performance on GLUE, SQuAD and RACE benchmarks, for both single-model and ensemble setup.
+The paper studies five different techniques for state abstraction in MDPs (Markov Decision Processes) and evaluates their usefulness for planning and learning.
+The general idea behind abstraction is to map the actual (or observed) state to an abstract state that should be more amenable for learning.
+It can be thought of as a mapping from one representation to another representation while preserving some useful properties.
+Consider an MDP \(M = \langle S, A, P, R, \gamma \rangle\) where \(S\) is the finite set of states, \(A\) is the finite set of actions, \(P\) is the transition function, \(R\) is the bounded reward function and \(\gamma\) is the discount factor.
+The abstract version of the MDP is \(\widetilde{M} = \langle \widetilde{S}, A, \widetilde{P}, \widetilde{R}, \gamma \rangle\) where \(\widetilde{S}\) is the finite set of abstract states, \(\widetilde{P}\) is the transition function in the abstract state space and \(\widetilde{R}\) is the bounded reward function in the abstract state space.
+Abstraction function \(\phi\) is a function that maps a given state \(s\) to its abstract counterpart \(\widetilde{s}\).
+The inverse image \(\phi^{-1}(\widetilde{s})\) is the set of ground states that map to the \(\widetilde{s}\) under the abstraction function \(\phi\).
+A weighing function \(w(s)\) is used to measure how much a state \(s\) contributes to the abstract state \(\phi(s)\).
+Given two abstraction functions \(\phi_{1}\) and \(\phi_{2}\), \(\phi_{1}\) is said to be finer than \(\phi_{2}\) iff for any states \(s_{1}, s_{2}\) if \(\phi_{1}(s_{1}) = \phi_{1}(s_{2})\) then \(\phi_{2}(s_{1}) = \phi_{2}(s_{2})\).
+This finer relation is reflexive, antisymmetric and transitive, and defines a partial ordering.
+While many abstractions are possible, not all abstractions are equally important.
+Model-irrelevance abstraction \(\phi_{model}\):
+ +If two states $s_{1}$ and $s_{2}$ have the same abstracted state, then their one-step model is preserved.
+ +Consider any action \(a\) and any abstract state \(\widetilde{s}\): if \(\phi_{model}(s_{1}) = \phi_{model}(s_{2})\) then \(R(s_1, a) = R(s_2, a)\) and \(\sum_{s' \in \phi_{model}^{-1}(\widetilde{s})}P_{s_1, s'}^{a} = \sum_{s' \in \phi_{model}^{-1}(\widetilde{s})}P_{s_2, s'}^{a}\).
+\(Q^{\pi}\)-irrelevance abstraction:
+ +It preserves the state-action value function for all the states.
+ +\(\phi_{Q^{\pi}}(s_1) = \phi_{Q^{\pi}}(s_2)\) implies \(Q^{\pi}(s_1, a) = Q^{\pi}(s_2, a)\).
+\(Q^{*}\)-irrelevance abstraction:
+ +\(a^{*}\)-irrelevance abstraction:
+ +\(\phi_{\pi^{*}}\)-irrelevance abstraction:
+In terms of fineness, \(\phi_0 \geq \phi_{model} \geq \phi_{Q^{\pi}} \geq \phi_{Q^*} \geq \phi_{a^*} \geq \phi_{\pi^*}\). Here \(\phi_0\) is the identity mapping, ie \(\phi_0(s) = s\).
+If a property applies to any abstraction, it also applies to all the finer abstractions.
+As we go from finer to coarser abstractions, the information loss increases (ie fewer components can be recovered) while the state-space reduces (ie the efficiency of solving the problem increases). This leads to a tradeoff when selecting abstractions.
+For example, with abstractions \(\phi_{model}, \phi_{Q^{\pi}}, \phi_{Q^*}, \phi_{a^*}\), the optimal abstract policy \(\widetilde{\pi}^{*}\) is optimal in the ground MDP.
+Similarly, if each state-action pair is visited infinitely often and the step-size decays properly, Q-learning with \(\phi_{model}, \phi_{Q^{\pi}}, \phi_{Q^*}\) converges to the optimal state-action value functions in the MDP. More conditions are needed for convergence in the case of the remaining two abstractions.
+For \(\phi_{model}, \phi_{Q^{\pi}}, \phi_{Q^*}, \phi_{a^*}\), the model built with the experience converges to the true abstract model with infinite experience if the weighing function \(w(s)\) is fixed.
+The paper proposes a technique (called Parameter Superposition or PSP) for training and storing multiple models within a single set (or instance) of parameters.
+The different models exist in “superposition” and can be retrieved dynamically given task-specific context information.
+Consider a task with input \(x \in R^N\) and parameters \(W \in R^{M \times N}\) where the output (targets or features) is given as \(y = Wx\).
+Now consider \(K\) such tasks with parameters \(W_1, W_2, \cdots W_K\).
+If each \(W_k\) requires only a small subspace in \(R^N\), then a linear transformation \(C_k^{-1}\) can be used such that each \(W_kC_k^{-1}\) occupies a mutually orthogonal subspace in \(R^N\).
+The set of parameters \(W_1, \cdots W_K\) can be represented by a single matrix \(W \in R^{M \times N}\) obtained by summing the \(W_kC_k^{-1}\) terms.
+The parameters corresponding to the \(k^{th}\) task can be retrieved (with some noise) using the context \(C_k\) as \(\widetilde{W}_k = WC_k\).
+Even though the retrieval is noisy, the effect of noise is limited for the context vectors used in the paper.
+Finally, \(\widetilde{y} = \widetilde{W}_{k}x = (WC_{k})x = W(C_{k}x)\).
+Instead of learning \(K\) separate models, only \(K\) context vectors (along with 1 superimposed model) need to be learned.
+The key assumption is that \(N\) (in \(x \in R^N\)) is large enough such that each \(W_k\) requires only a small subspace of \(R^N\).
+Since images and speech signals tend to occupy a low dimensional manifold, this requirement can be satisfied by over-parameterizing x.
+Rotational Superposition (pspRotation)
+ +Sample rotations uniformly from the orthogonal group \(O(M)\).
+ +The downside is that if \(M \sim N\), it requires storing as many parameters as learning \(K\) individual models (since each \(C\) is of size \(M \times M\)).
+Complex Superposition (pspComplex)
+ +The design of rotational superposition can be improved by choosing \(C_k\) to be a diagonal matrix ie \(C_k = diag(c_k)\) where \(c_k\) is a vector of size \(M\).
+Choosing \(c_k\) to be a vector of complex numbers (of the form \(c_{k}^{j} = e^{i\phi_{j}(k)}\) where \(\phi_{j}(k)\), the phase, is sampled uniformly from \([-\pi, \pi]\)) leads to \(C_k\) being a diagonal orthogonal matrix.
+Powers of a single context
+ +Binary Superposition (pspBinary)
+ +The parameter superposition principle can be applied to all the linear layers of a network.
+For the convolutional layers, it makes more sense to apply superposition to the convolutional kernel and not to the input image (as the dimensionality of convolutional parameters is smaller than that of inputs).
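+A sketch of binary superposition applied to a linear layer (a minimal illustration; the actual models also handle convolutions and other context choices):
+```python
+import torch
+import torch.nn as nn
+
+class PSPBinaryLinear(nn.Module):
+    # One superimposed weight matrix shared by all tasks; each task gets a fixed
+    # random +/-1 context vector (pspBinary)
+    def __init__(self, in_dim, out_dim, num_tasks):
+        super().__init__()
+        self.linear = nn.Linear(in_dim, out_dim)
+        contexts = torch.randint(0, 2, (num_tasks, in_dim)).float() * 2 - 1
+        self.register_buffer("contexts", contexts)
+
+    def forward(self, x, task_id):
+        # y = W(c_k * x): the context maps the input into a task-specific subspace
+        return self.linear(x * self.contexts[task_id])
+```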
+For all the experiments, the baseline is a standard supervised learning setup, unless mentioned otherwise.
+The metric is the performance on the previous tasks when the model has been trained on the newer tasks.
+Input Interference
+ +The input distribution changes over time.
+Permuted MNIST dataset is used where each permutation of the pixels corresponds to a new task.
+A new task is sampled every 1000 mini-batches.
+As the network size increases, Parameter Superposition (psp) outperforms the baseline significantly.
+pspRotation > pspComplex > pspBinary in terms of both performance and the number of additional parameters required for each new task.
+Given that pspBinary is the easiest to implement while being comparable to more sophisticated baselines like Elastic Weight Consolidation (EWC) and Synaptic Intelligence, the paper presents most of the results with the pspBinary model.
+Continuous Domain Shift
+ +Rotating-MNIST and Rotating-FashionMNIST tasks are proposed to simulate continuous domain shift.
+In these tasks, the input images are rotated in-plane by a small angle such that the rotation is complete after 1000 steps.
+A new context is assigned every 100 steps, since the per-step changes in the angle are very small.
+The 10 context vectors used in the first 1000 steps are reused for the subsequent steps.
+Randomly changing the context vector
+ +The paper considers an ablation where the context vector is randomly changed at every step (of the 1000 step cycle). This requires the superposition model to store 1000 models.
+This approach is better than the supervised learning baseline but not as good as the proposed psp* models.
+Output Interference
+ +This is the setup where the model transitions from one classification task to another.
+Incremental CIFAR dataset is used with Resnet18 as the base model.
+Baseline is a standard supervised learning model where a new classification head is used for each task (since the classes have a different meaning in each dataset). The model component before the classification layer is shared across the tasks.
+Even though the labels are different across the datasets, the pspBinary model, trained with a single output layer, outperforms the multi-headed baseline.
+Training models with large minibatches (using distributed synchronous SGD) can lead to optimization issues.
+The paper presents techniques for training models with large batch size while matching the accuracy of small minibatch setups.
+The paper focuses on the ImageNet dataset, but many of the proposed ideas are applicable broadly.
+When the minibatch size increases by a factor of k, the learning rate should also be increased by a factor of k (while keeping all other hyperparameters like weight decay fixed).
+Note that this is an empirical rule and is not expected to hold under all conditions.
+One such condition is when the model is changing rapidly during the first few epochs. In this case, a warmup phase is introduced to stabilize the model.
+The paper verifies that the scaling rule is applicable to batch sizes as large as 8K.
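+A sketch of the linear scaling rule combined with gradual warmup (base_lr=0.1 and k=32 mirror the 0.1 to 3.2 warmup described below; treat the values as illustrative):
+```python
+def learning_rate(step, steps_per_epoch, base_lr=0.1, k=32, warmup_epochs=5):
+    # Linear scaling rule: target lr = k * base_lr for a k-times larger minibatch
+    target_lr = k * base_lr
+    warmup_steps = warmup_epochs * steps_per_epoch
+    if step < warmup_steps:
+        # Linearly ramp from base_lr to target_lr over the warmup phase
+        return base_lr + (target_lr - base_lr) * step / warmup_steps
+    return target_lr
+```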
+Batch normalization uses batch statistics to normalize the data. Hence, the loss corresponding to each data point (in the batch) is not independent. Thus, changing the batch size could change the underlying function being optimized.
+In the distributed SGD setup, the per-GPU (or per-worker) batch size should be kept constant, and only one worker should compute the batch norm statistics.
+When using weight decay, scaling the cross-entropy loss is not the same as scaling the learning rate.
+When using momentum, changing the learning rate could require “momentum correction.”
+Ensure that the per-worker loss is normalized by the size of the total minibatch and not just by the size of minibatch that each worker sees.
+For each epoch, use a single random shuffling of the training data (before dividing it between the workers).
+The paper describes various techniques to speed up the training pipeline by reducing the communication overhead between nodes. (Each node can have one or more GPUs).
+First, a node sums the gradient from all the GPUs it has.
+The gradients are shared and summed across all the nodes.
+Each node broadcasts the resulting gradient to all the GPUs it has.
+Gradient aggregation is performed in parallel with backpropagation: while aggregating the gradients for one layer, the system starts computing the gradients of the next layer.
+Using these approaches, a Resnet50 model can be trained on the ImageNet dataset in an hour (using 256 workers).
+When an appropriate warmup strategy is used, the training and the validation curves (for the large batch size setup) match the corresponding curves for the small batch size setup.
+The best performing warmup strategy is the one where training starts at a learning rate of 0.1 and linearly increases to 3.2 over five epochs.
+The paper shows that the results are not specific to the Resnet50 model (experiments with Resnet101 model) or the use case (experiments with object detection and instance segmentation using Mask R-CNN).
+Along with providing the empirical validation of the proposed ideas, the paper describes all the hyperparameters. It also includes the training and validation curves with the different configurations which enable others to replicate and build on this work.
+The paper investigates two possible reasons behind the usefulness of the MAML algorithm:
+ +Rapid Learning - Does MAML learn features that are amenable for rapid learning?
+Feature Reuse - Does the MAML initialization provide high-quality features that are useful for unseen tasks?
+This leads to a follow-up question: how much task-specific inner loop adaptation is needed?
+In a standard few-shot learning setup, the different datasets have different classes. Hence, the top-most layer (or the head) of the learning model should be different for different tasks.
+The subsequent discussion only applies to the body of the network (ie, network minus the head).
+Freezing Layer Representations
+ +In this setup, a subset (or all) of the parameters are frozen (after MAML training) and are not adapted at test time.
+Even when the entire network is frozen, the performance drops only marginally.
+This indicates that the representation learned by the meta-initialization is good enough to be useful on the test tasks (without requiring any adaptation step).
+Note that the head of the network is still adapted during testing.
+Representational Similarity
+ +In this setup, the paper measures how much the latent representations (learned by the network) of a fully trained model change during the inner loop update.
+Canonical Correlation Analysis (CCA) and Central Kernel Alignment (CKA) metrics are used to measure the similarity between the representations.
+The main finding is that the representations in the body of the network are very similar before and after the inner loop updates while the representations in the head of the network are very different.
+The above two observations indicate that feature reuse is the primary driving factor for the success of MAML.
+When does feature reuse happen
+ +The paper considers the model at different stages of training and compares the similarity in the representation (before and after the inner loop update).
+Even early in training, the CCA similarity between the representations (before and after the inner loop update) is quite high. Similarly, freezing the layers (for the test time update), early in training, does not degrade the test time performance much. This hints that the feature reuse happens early in the learning process.
+The empirical evidence suggests that the success of MAML lies in the feature reuse.
+The authors build on this observation and propose a simplification of the MAML algorithm: ANIL or Almost No Inner Loop Algorithm
+In this algorithm, the inner loop updates are applied only to the head of the network.
+Despite being much more straightforward, the performance of ANIL is close to the performance of MAML for both few-shot image classification and RL tasks.
+Removing most of the inner loop parameters speeds up the computation by a factor of 1.7 (during training) and 4.1 (during inference).
+Given that it is possible to remove most of the parameters from the inner loop update (without affecting the performance), the next step is to check if the inner loop update can be removed entirely.
+This leads to the NIL (No Inner Loop) algorithm, which does not involve any inner loop adaptation steps.
+A few-shot learning model is trained - either with MAML or ANIL.
+During testing, the head is removed.
+For each task, the K training examples are fed to the body to obtain class representations.
+For a given test data point, the representation of the data point is compared with the different class representations to obtain the target class.
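+A sketch of this NIL-style prediction step (cosine similarity between query features and class representations; treat the details as illustrative):
+```python
+import torch
+import torch.nn.functional as F
+
+def nil_predict(body, support_x, support_y, query_x, num_classes):
+    # No inner loop: build class representations from support features and
+    # classify queries by similarity
+    support_feats = body(support_x)                        # (num_support, dim)
+    class_reps = torch.stack([support_feats[support_y == c].mean(dim=0)
+                              for c in range(num_classes)])
+    sims = F.cosine_similarity(body(query_x).unsqueeze(1),
+                               class_reps.unsqueeze(0), dim=-1)
+    return sims.argmax(dim=-1)                             # (num_query,)
+```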
+The NIL algorithm performs similar to the MAML and the ANIL algorithms for the few-shot image classification task.
+Note that it is still important to use MAML/ANIL during training, even though the learned head is not used during evaluation.
+The paper studies observational overfitting: The phenomenon where an agent overfits to different observation spaces even though the underlying MDP remains fixed.
+Unlike other works, the “background information” (in the pixel space) is correlated with the progress of the agent (and is not just noise).
+Base MDP $M = (S, A, R, T)$ where $S$ is the state space, $A$ is the action space, $R$ is the reward function, and $T$ is the transition dynamics.
+$M$ is parameterized using $\theta$. In practice, this means introducing an observation function $\phi_{\theta}$, i.e., $M_{\theta} = (M, \phi_{\theta})$.
+A distribution over $\theta$ defines a distribution over the MDPs.
+The learning agent has access to the pixel space observations and not the state space observations.
+Generalization gap is defined as $J_{\theta}(\pi) - J_{\theta^{train}}(\pi)$ where $\pi$ is the learning agent, $\theta$ is the distribution over all the observation functions, $\theta^{train}$ is the distribution over the observation functions corresponding to the training environments. $J_{\theta}(\pi)$ is the average reward that the agent obtains over environments sampled from $M_{\theta}$.
+$\phi_{\theta}$ considers two features - generalizable (invariant across $\theta$) and non-generalizable (depends on $\theta$), i.e., $\phi_{\theta}(s) = concat(f(s), g_{\theta}(s))$ where $f$ is the invariant function and $g_{\theta}$ is the non-generalizable function.
+The problem is set up such that “explicit regularization” can easily solve it. The focus is on understanding the effect of “implicit regularization”.
+LQR is used as a proxy for deep RL architectures given its advantages like enabling exact gradient descent.
+The functions are parameterized as follows:
+ +$f(s) = W_c s$
+$g_{\theta}(s) = W_{\theta} s$
+Observation at time $t$, $o_t$, is given as $o_t = [W_c; W_{\theta}] s_t$.
+Action at time $t$ is given as $a_t = K o_{t}$ where $K$ is the policy matrix.
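+A small numpy illustration of this observation model (all dimensions and the random matrices are illustrative):
+```python
+import numpy as np
+
+d_state, d_signal, d_noise = 10, 20, 100     # illustrative dimensions
+W_c = np.random.randn(d_signal, d_state)     # shared across environments
+W_theta = np.random.randn(d_noise, d_state)  # re-sampled per environment theta
+
+s_t = np.random.randn(d_state)
+o_t = np.concatenate([W_c @ s_t, W_theta @ s_t])   # concat(f(s), g_theta(s))
+
+K = np.random.randn(d_state, d_signal + d_noise)   # policy matrix (learned)
+a_t = K @ o_t                                      # a_t = K o_t
+```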
+Dimensionality:
+ +In case of training on just one environment, multiple solutions exist, and overfitting happens.
+Increasing $d_{noise}$ increases the generalization gap.
+Overparameterizing the network decreases the generalization gap and also reduces the norm of the policy.
+The base MDP is the Gym Environment.
+$M_{\theta}$ is generated as before.
+Increasing both width and depth for basic MLPs improves generalization.
+Generalization also depends on the choice of activation function, residual layers, etc.
+In the Gym environment, the actual state is projected to a larger vector and reshaped into an 84x84 tensor (image).
+The image from $f$ is concatenated with the image from $g$. This setup is referred to as the Gym-Deconv.
+The relative order of performance between NatureCNN, IMPALA, and IMPALA-Large (on both CoinRun and Gym-Deconv) is the same as the order of the number of parameters they contain.
+In an ablation, the policy is given access to only $g_{\theta}(s)$, which makes it impossible for the model to generalize. In this test of memorization capacity, implicit regularization seems to reduce the memorization effect.
+The pixel space observation in CoinRun is downsized from 64x64 to 32x32 and flattened into a vector.
+In CoinRun, the dynamics change per level, and the noisy “irrelevant” features change location across the 1D input, making this setup more challenging than the previous ones.
+Overparameterization improves generalization in this scenario as well.
+The paper proposes to build a universal neural machine translation system that can translate between any pair of languages.
+As a concrete instance, the paper prototypes a system that handles 103 languages (25 Billion translation pairs).
+Hypothesis: The learning signal from one language should benefit the quality of other languages.
+This positive transfer is evident for low resource languages but tends to hurt the performance for high resource languages.
+In practice, adding new languages reduces the effective per-task capacity of the model.
+Maximize the number of languages within one model.
+Maximize the positive transfer to low resource languages.
+Minimize the negative interference to high resource languages.
+Perform well in realistic, multi-domain settings.
+In-house corpus generated by crawling and extracting parallel sentences from the web.
+102 languages, with 25 billion sentence pairs.
+Compared with the existing datasets, this dataset is much larger, spans more domains, has a good variation in the amount of data available for different language pairs, and is noisier. These factors bring additional challenges to the universal NMT setup.
+Dedicated Bilingual models (variants of Transformers).
+Most bilingual experiments used Transformer Big and a shared source-target sentence-piece model (SPM).
+For medium and low resource languages, the Transformer Base was also considered.
+A batch size of 1M tokens is used. Increasing the batch size improves model quality and speeds up convergence.
+The paper compares the following two setups with the baseline:
+ +Combine all the datasets and train over them as if it is a single dataset.
+Combine all the datasets but upsample low resource languages so that all the languages are equally likely to appear in the combined dataset.
+A target “index” is prepended to every input sentence to indicate which language it should be translated into.
+Shared encoder and decoder are used across all the language pairs.
+The two setups use a batch size of 4M tokens.
+When all the languages are equally sampled, the performance on the low resource languages increases, at the cost of performance on high resource languages.
+Training over all the data at once reverses this trend.
+Temperature based sampling strategy is used to control the ratio of samples from different language pairs.
+A balanced sampling strategy improves the performance for the high resource languages (though not as good as the multilingual baselines) while retaining the high transfer performance on the low resource languages.
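+A sketch of the temperature-based sampling probabilities (the $p_i \propto (n_i / \sum_j n_j)^{1/T}$ form is the commonly used scheme; the value $T = 5$ is illustrative):
+```python
+import numpy as np
+
+def sampling_probs(pair_sizes, T=5.0):
+    # p_i proportional to (n_i / sum_j n_j)^(1/T): T = 1 recovers sampling
+    # proportional to data size; larger T approaches uniform sampling.
+    p = np.asarray(pair_sizes, dtype=np.float64)
+    p = (p / p.sum()) ** (1.0 / T)
+    return p / p.sum()
+
+# e.g., one high-resource and two low-resource language pairs
+print(sampling_probs([1e9, 1e6, 1e5]))
+```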
+Another reason behind the lagging performance (as compared to bilingual baselines) is the capacity of the multilingual models.
+Some open problems to consider:
+ +Task Scheduling - How to decide the order in which different language pairs should be trained.
+Optimization for multitask learning - How to design optimizer, loss functions, etc. that can exploit task similarity.
+Understanding Transfer:
+ +For the low resource languages, translating from multiple languages to English leads to better performance than translating from English to multiple languages.
+This can be explained as follows: In the first case (many-to-one), the setup is that of a multi-domain model (each source language is a domain). In the second case (one-to-many), the setup is that of multitasking.
+NMT models seem to be more amenable to transfer across multiple domains than transfer across tasks (since the decoder distribution does not change much).
+In terms of zero-shot performance, the performance for most language pairs increases as the number of languages changes from 10 to 102.
+Sentence Piece Model (SPM) is used.
+Temperature sampling is used to sample vocabulary from different languages.
+Using a smaller vocabulary (and hence smaller sub-word tokens) performs better for low resource languages, probably due to improved generalization.
+Low and medium resource languages tend to perform better with higher temperatures.
+The paper proposes a framework for joint modeling of labels and data by interpreting a discriminative classifier $p(y|x)$ as an energy-based model $p(x, y)$.
+Joint modeling provides benefits like improved calibration (i.e., the predictive confidence aligns with the misclassification rate), robustness, and out-of-distribution detection.
+Consider a standard classifier $f_{\theta}(x)$ which produces a k-dimensional vector of logits.
+$p_{\theta}(y | x) = softmax(f_{\theta}(x)[y])$
+Using concepts from energy-based models, we can write $p_{\theta}(x, y) = \frac{exp(-E_{\theta}(x, y))}{Z_{\theta}}$ where $E_{\theta}(x, y) = -f_{\theta}(x)[y]$
+$p_{\theta}(x) = \sum_{y}{ \frac{exp(-E_{\theta}(x, y))}{Z_{\theta}}}$
+$E_{\theta}(x) = -LogSumExp_y(f_{\theta}(x)[y])$
+Note that in the standard discriminative setup, shifting the logits $f_{\theta}(x)$ (by a constant) does not affect $p_{\theta}(y | x)$, but it affects $p_{\theta}(x)$.
+Computing $p_{\theta}(y | x)$ using $p_{\theta}(x, y)$ and $p_{\theta}(x)$ gives back the same softmax parameterization as before.
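+These quantities are straightforward to compute from the logits; a sketch:
+```python
+import torch
+
+def jem_quantities(f, x):
+    # The same K logits define both the classifier p(y|x) and the
+    # unnormalized density over x (up to the intractable log Z_theta).
+    logits = f(x)                                  # (batch, K)
+    log_p_y_given_x = logits.log_softmax(dim=-1)   # standard softmax classifier
+    energy_x = -torch.logsumexp(logits, dim=-1)    # E(x); log p(x) = -E(x) - log Z
+    return log_p_y_given_x, energy_x
+```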
+This reinterpreted classifier is referred to as a Joint Energy-based Model (JEM).
+The log-likelihood of the data can be factorized as $log p_{\theta}(x, y) = log p_{\theta}(x) + log p_{\theta}(y | x)$.
+The second factor can be trained using the standard CE loss. In contrast, the first factor can be trained using a sampler based on Stochastic Gradient Langevin Dynamics.
+Datasets: CIFAR10, CIFAR100, SVHN.
+Metrics: Inception Score, Frechet Inception Distance
+JEM outperforms generative, discriminative, and hybrid models on both generative and discriminative tasks.
+A calibrated classifier is the one where the predictive confidence aligns with the misclassification rate.
+Dataset: CIFAR100
+JEM improves calibration while retaining high accuracy.
+One way to detect OOD samples is to learn a density model that assigns a higher likelihood to in-distribution examples and lower likelihood to out of distribution examples.
+JEM consistently assigns a higher likelihood to in-distribution examples.
+The paper also proposes an alternate metric called approximate mass to detect OOD examples.
+The intuition is that a point could have high likelihood but be impossible to sample because its surroundings have a very low density.
+On the other hand, the in-distribution data points would lie in a region of high probability mass.
+Hence the norm of the gradient of log density could provide a useful signal to detect OOD examples.
+Use of replay buffer (and rehearsal) is a common technique for mitigating catastrophic forgetting.
+The paper builds on this idea but focuses on the sample selection aspect, i.e., which data points to store in the replay buffer.
+It formulates sample selection as a constraint minimization problem and shows that the proposed formulation is equivalent to maximizing the diversity of the samples with respect to the parameter gradients.
+Supervised learning tasks
+Online stream of data (i.e., one or few datapoints accessed at a time).
+When considering the $t^{th}$ task, the objective is: minimize the loss on the current task without increasing the loss on any of the previous tasks.
+The above constraint can be rephrased as $dot(g_t, g_i) \gt 0 \forall i \in [0, t-1]$ where $g_t$ is the gradient for the $t^{th}$ task.
+This is equivalent to saying that the current task gradient should not interfere negatively with the previous task gradient.
+In practice, the gradient constraint is enforced only over the examples in the minibatch (and not the full dataset).
+The paper interprets the constraint satisfaction problem as approximating an optimal feasible region (in the gradient space) where current task performance can be improved without hurting the performance on the previous tasks.
+The approximate region (of the shape of a polyhedral convex cone) is determined using only the examples from the replay buffer. Hence, the optimal region (defined for the entire dataset) would be contained within the approximate region.
+The size of the approximate region can be measured in terms of the solid angle defined by the intersection between the approximate region and a unit sphere.
+The paper argues that the approximate region can be made smaller by reducing the angle between each pair of gradients.
+The set of points satisfying the constraint can be computed using Integer Quadratic Programming (IQP).
+Given that the problem setup is online learning, using IQP for every new data point is not feasible.
+An inexact, greedy alternative is suggested where a score is maintained for each example in the buffer.
+When a new datapoint comes in, its score is computed and used to decide whether an existing datapoint in the buffer should be replaced.
+The score is the maximal cosine similarity of the current example with a random sample in the buffer.
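+A sketch of this scoring rule, assuming flattened per-example gradients are available (the number of sampled buffer entries is illustrative):
+```python
+import torch
+import torch.nn.functional as F
+
+def candidate_score(g_new, buffer_grads, n_samples=10):
+    # g_new: (d,) flattened gradient of the incoming example;
+    # buffer_grads: (m, d) gradients of buffered examples.
+    idx = torch.randint(len(buffer_grads), (n_samples,))
+    sims = F.cosine_similarity(g_new.unsqueeze(0), buffer_grads[idx], dim=-1)
+    return sims.max().item()   # lower score = more diverse gradient direction
+```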
+Benchmarks
+ +Disjoint MNIST
+Permuted MNIST
+Disjoint CIFAR10
+Shared head setup
+Baselines for sample selection
+ +Randomly select examples to keep in the buffer.
+Perform clustering - either in the feature space or in the gradient space.
+Use IQP to select the examples. This approach is not used for CIFAR10, as it is computationally costly.
+It would be interesting if the paper had considered baselines like selecting samples which had the largest loss.
+The proposed greedy approach outperforms the other methods.
+In an ablation experiment, the paper shows that the proposed approach works better than reservoir sampling (when the underlying data distribution is imbalanced).
+Another experiment compares the proposed approach with Gradient Episodic Memory and iCaRL. For Permuted and Disjoint MNIST, the different methods perform quite similarly, though the proposed approach performs better on Disjoint CIFAR10.
+Masked Language Modeling (MLM) is a common technique for pre-training language-based models. The idea is to “corrupt” some tokens in the input text (around 15%) by replacing them with the [MASK] token and then training the network to reconstruct (or predict) the corrupted tokens.
+Since the network learns from only about 15% of the tokens, the computational cost of training using MLM can be quite high.
+The paper proposes to use a “replaced token detection” task where some tokens in the input text are replaced by other plausible tokens.
+For each token in the modified text, the network has to predict if the token has been replaced or not.
+The alternative token is generated using a small generator network.
+Unlike the previous MLM setup, the proposed task is defined for all the input tokens, thus utilizing the training data more efficiently.
+The proposed approach is called ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
+Two neural networks - Generator (G) and Discriminator (D) are trained.
+Each network has a Transformer-based text encoder that maps a sequence of words into a sequence of vectors.
+Given an input sequence x (of length N), k indices are chosen for replacing the tokens.
+For each index, the generator produces a distribution over tokens. A token is sampled to replace in the original sequence. The resulting sequence is referred to as the corrupted sequence.
+Given the corrupted sequence, the Discriminator predicts which token comes from the data distribution and which comes from the generator.
+The generator is trained using the MLM setup, and the Discriminator is trained using the discriminative loss.
+After pre-training, only the Discriminator is finetuned on the downstream tasks.
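+A rough sketch of the joint training step (module names and the boolean `mask_positions` layout are assumptions; the loss weight of 50 follows the paper):
+```python
+import torch
+import torch.nn.functional as F
+
+def electra_step(generator, discriminator, tokens, mask_positions, mask_id):
+    # tokens: (batch, seq) token ids; mask_positions: boolean (batch, seq).
+    masked = tokens.clone()
+    masked[mask_positions] = mask_id
+
+    # Generator: standard MLM loss on the masked-out positions.
+    gen_logits = generator(masked)                       # (batch, seq, vocab)
+    mlm_loss = F.cross_entropy(gen_logits[mask_positions], tokens[mask_positions])
+
+    # Sample plausible replacements to build the corrupted sequence.
+    with torch.no_grad():
+        sampled = torch.distributions.Categorical(
+            logits=gen_logits[mask_positions]).sample()
+    corrupted = tokens.clone()
+    corrupted[mask_positions] = sampled
+
+    # Discriminator: per-token binary prediction - replaced or original?
+    is_replaced = (corrupted != tokens).float()
+    disc_logits = discriminator(corrupted)               # (batch, seq)
+    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
+
+    return mlm_loss + 50.0 * disc_loss                   # lambda = 50 in the paper
+```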
+Datasets
+ +GLUE Benchmark
+Stanford QA dataset
+Architecture Choices
+ +Sharing word embeddings between generator and Discriminator helps.
+Tying all the encoder weights leads to marginal improvement but forces the generator and the Discriminator to be of the same size. Hence only embeddings are shared.
+Generator model is kept smaller than the discriminator model as a strong generator can make the training difficult for the Discriminator.
+A two-stage training procedure was explored where only the generator is trained for n steps. Then the weights of the generator are used to initialize the Discriminator. The Discriminator is then trained for n steps while keeping the generator fixed.
+This two-stage setup provides a nice curriculum for the Discriminator but does not outperform the joint training based setup.
+An adversarial loss based setup is also explored but it does not work well probably because of the following reasons:
+ +An adversarially trained generator is not as good as the MLM generator.
+An adversarially trained generator produces a low entropy output distribution.
+Results
+Ablations
+ +ELECTRA-15 is a variant of ELECTRA where the Discriminator is trained on only 15% of the tokens (similar to the MLM setup). This reduces performance significantly.
+Replace MLM setup
+ +Perform MLM training, but instead of using [MASK], use a token sampled from the generator.
+This improves the performance marginally.
+All-token MLM
+ +In the MLM setup, replace the [MASK] token by the sampled tokens and train the MLM model to generate all the words.
+In practice, the MLM model can either generate a word or copy the existing word.
+This approach closes much of the gap between BERT and ELECTRA.
+Interestingly, ELECTRA outperforms All-token MLM BERT, suggesting that ELECTRA may be benefiting from parameter efficiency since it does not have to learn a distribution over all the words.
+The paper proposes a simple and dataset-agnostic data augmentation mechanism called mixup.
+Consider two training examples, $(x_1, y_1)$ and $(x_2, y_2)$, where $x_1$ and $x_2$ are the datapoints and $y_1$ and $y_2$ are the labels.
+New training examples of the form $(\lambda \times x_1 + (1-\lambda) \times x_2, \lambda \times y_1 + (1-\lambda) \times y_2)$ are constructed by considering the linear interpolation of the datapoints and the labels. Here $\lambda \in [0, 1]$.
+$\lambda$ is sampled from a Beta distribution $Beta(\alpha, \alpha)$ where $\alpha \in (0, \infty)$.
+Setting $\lambda$ to 0 or 1 eliminates the effect of mixup.
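+A minimal sketch of the construction (using the same-batch shuffling noted later in these notes; `y_onehot` assumes one-hot labels, and $\alpha = 0.2$ is illustrative):
+```python
+import numpy as np
+import torch
+
+def mixup_batch(x, y_onehot, alpha=0.2):
+    # Mix each example with a shuffled partner from the same batch.
+    lam = np.random.beta(alpha, alpha)
+    perm = torch.randperm(x.size(0))
+    mixed_x = lam * x + (1 - lam) * x[perm]
+    mixed_y = lam * y_onehot + (1 - lam) * y_onehot[perm]
+    return mixed_x, mixed_y
+```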
+Mixup encourages the neural network to favor linear behavior between the training examples.
+Supervised Learning
+ +ImageNet for ResNet-50, ResNet-101 and ResNext-101.
+CIFAR10/CIFAR100 for PreAct ResNet-18, WideResNet-28-10 and DenseNet.
+Google command dataset for LeNet and VGG.
+In all these setups, adding mixup improves the performance of the model.
+Mixup makes the model more robust to noisy labels. Moreover, mixup + dropout improves over mixup alone. This hints that mixup’s benefits are complementary to those of dropout.
+Mixup makes the network more robust to adversarial examples in both white-box and black-box settings (ImageNet + Resnet101).
+Mixup also stabilizes the training of GANs by acting as a regularizer for the gradient of the discriminator.
+Convex combination of three or more examples (with weights sampled from a Dirichlet distribution) does not provide gains over the case of two examples.
+In the authors’ implementation, mixup is applied between images of the same batch (after shuffling).
+Interpolating only between inputs, with the same labels, did not lead to the same kind of gains as mixup.
+The paper is among the first to study image classification at a large scale (10000 classes and 9 million examples).
+This is a relatively old paper (2010), so some of the findings may not be relevant anymore; for instance, many of the scaling challenges have since been overcome. Moreover, the paper uses approaches like SVM and KNN (popular at that time) and does not use CNNs.
+Other observations of the paper are still very relevant, and it is an instructive read. For example, since ImageNet classes are based on WordNet, the paper looks at the effect of the semantic relations (tree) of categories on the performance of the trained models.
+The paper considers three variants of the ImageNet dataset - ImageNet 10K (10184 classes), ImageNet 7K (7404 classes) and ImageNet 1K (1000 classes).
+They also consider smaller variants with randomly sampled classes or cases where the examples are sampled from one high-level category like vehicles.
+SVM and KNN models are used with features like Bag of Words, GIST descriptors, and spatial pyramid of histograms.
+Observations
+ +A model that performs well on the smaller dataset (with fewer classes) may not perform well on the larger dataset (with more classes).
+There seems to be an approximate correlation between the structure of the semantic hierarchy of the labels (obtained via WordNet) and visual confusion between the categories.
+For example, consider two high-level concepts, say, artifacts and animals. The model is less likely to confuse classes across the high-level concepts but more likely to confuse classes within the respective concepts.
+For dense categories (categories where the classes are semantically more closely related to each other), the model tends to make more mistakes (even if the number of classes is fewer).
+Accounting for the label hierarchy (in the loss function) improves the classification performance.
+The paper proposes a Competitive training mechanism to train a mixture of independent generative models.
+The idea is that this mixture of different models would divide the data distribution amongst themselves and specialize to their respective splits.
+The training procedure is related to clustering-based methods.
+In causal modeling, a common assumption is that the data is generated by a set of independent mechanisms.
+It is not known which mechanism generates which datapoint, and recovering the underlying mechanisms can be modeled as learning a structural causal generative model.
+The paper assumes that the supports of the different generators do not overlap, i.e., the underlying data distribution is factorized into non-overlapping regions.
+This data factorization is learned using a set of discriminators.
+If there are $k$ generators, $k$ binary partition functions $c_1, \dots, c_k$ are used.
+For a given datapoint $x$, if $c_i(x) = 1$ then $c_j(x) = 0$ for all other $j$, and $x$ is assigned to the $i^{th}$ generator.
+For a fixed partition function $c_j^t$ ($t$ denotes the partition function at time $t$), minimize the sum of f-divergence between the model and the data distribution (that is assigned to it). The loss formulation is an upper bound on the f-divergence of the mixture model.
+In the next step, the data points are re-assigned to the generative models, based on the likelihood of each data point for each model.
+The likelihood is estimated by training a discriminator that can distinguish the generated samples from the real samples.
+The independence assumption may be too restrictive because the low-level features will be common across the distribution splits.
+This “violation” can be avoided by pretraining the model using a uniform random split of the dataset. In that case, the independence assumption will hold approximately after pretraining.
+Another approach could be to share some parameters across the models.
+A “load balancing” approach is also used: if a model is not assigned enough data points, it keeps training on the data points that were previously assigned to it.
+VAEs tend to be “overly inclusive” of the training distribution, i.e., they try to cover the entire support of the distribution.
+GANs are prone to mode collapse where the model focuses only on one part of the distribution.
+The proposed method provides a middle ground where the different generative models can focus on different parts of the distribution.
+The experiments seem to be limited. The paper shows that their proposed setup improves over the VAE and GAN baselines.
+For datasets, the paper uses two-dimensional synthetic data, MNIST and CelebA
+The paper proposes a contrastive learning approach, called CURL, for performing off-policy control from raw pixel observations (by transforming them into high dimensional features).
+The idea is motivated by the application of contrastive losses in computer vision. But there are additional challenges:
+ +The learning agent has to perform both unsupervised and reinforcement learning.
+The “dataset” for unsupervised learning is not fixed and keeps changing with the policy of the agent.
+Unlike prior work, CURL introduces fewer changes in the underlying RL pipeline and provides more significant sample efficiency gains. For example, CURL (trained on pixels) nearly matches the performance of SAC policy (trained on state-based features).
+CURL uses instance discrimination. Deep RL algorithms commonly use a stack of temporally consecutive frames as input to the policy. In such cases, instance discrimination is applied to all the images in the stack.
+For generating the positive and negative samples, random crop data augmentation is used.
+Bilinear inner product is used as the similarity metric as it outperforms the commonly used normalized dot product.
+For encoding the anchors and the samples, the InfoNCE objective is used. It learns two encoders $f_q$ and $f_k$ that transform the query (base input) and the key (positive/negative samples) into latent representations. The similarity loss is applied to these latents.
+Momentum contrast is used to update the parameters ($\theta_k$) of the $f_k$ network. ie $\theta_k = m \theta_k + (1-m) \theta_q$. $\theta_q$ are the parameters of the $f_q$ network and are updated in the usual way, using both the contrastive loss and the RL loss.
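+A sketch of these two ingredients (the momentum coefficient is illustrative):
+```python
+import torch
+
+def bilinear_logits(q, k, W):
+    # Similarity q^T W k between query and key latents; with a batch of
+    # queries and keys, the diagonal entries are the positive pairs.
+    return q @ W @ k.t()
+
+def momentum_update(f_q, f_k, m=0.95):
+    # theta_k <- m * theta_k + (1 - m) * theta_q; only f_q receives
+    # gradient updates (from the contrastive and RL losses).
+    with torch.no_grad():
+        for p_q, p_k in zip(f_q.parameters(), f_k.parameters()):
+            p_k.mul_(m).add_((1 - m) * p_q)
+```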
+DMControl100K and Atari100K refer to the setups where the agent is trained for 100K steps on DMControl and Atari, respectively.
+Metrics:
+ +Sample Efficiency - How many steps does the baseline need to match CURL’s performance after 100K steps?
+Performance - Ratio of episodic returns by CURL vs. the baseline after 100K steps.
+Baselines:
+ +DMControl
+ +Atari
+ +Results
+ +DM Control
+ +CURL outperforms all pixel-based RL algorithms by a significant margin for all environments on DMControl and most environments on Atari.
+On DMControl, it closely matches the performance of the SAC agent trained on state-space observations.
+On Atari, it achieves a better median human normalized score (HNS) than the other baselines and close to human efficiency in three environments.
+The paper builds on the prior work on self-supervised contrastive learning and extends it for the supervised learning case where many positive examples are available for each anchor.
+Data augmentation module - This module transforms the input example; the paper considers several augmentation strategies.
+Encoder network - This module maps the input to a latent representation.
+The same network is used to encode both the anchor and the sample.
+The representation vector is normalized to lie on the unit hypersphere.
+Projection network - This module maps the normalized representation to another representation, on which the contrastive loss is computed.
+This network is only used for training the supervised contrastive loss.
+Loss - The paper extends the standard contrastive loss formulation to handle multiple positive examples.
+The main effect is that the modified loss accounts for all the same-class pairs (from within the sampled batch as well as the augmented batch).
+The paper shows that the gradient (corresponding to the modified loss) causes the learning to focus more on hard examples. “Hard” cases are the ones where contrasting the anchor benefits the encoder more.
+The proposed loss can also be seen as a generalization of the triplet loss.
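+A minimal sketch of such a multi-positive loss (assuming `z` holds the projections of the combined sampled-plus-augmented batch; the per-anchor normalization here is one plausible variant):
+```python
+import torch
+import torch.nn.functional as F
+
+def supcon_loss(z, labels, temperature=0.07):
+    # z: (n, d) projections; labels: (n,). All other same-class samples
+    # in the batch act as positives for each anchor.
+    z = F.normalize(z, dim=-1)
+    logits = z @ z.t() / temperature
+    self_mask = torch.eye(len(z), dtype=torch.bool)
+    logits = logits.masked_fill(self_mask, -1e9)       # exclude the anchor itself
+    log_prob = logits - torch.logsumexp(logits, dim=-1, keepdim=True)
+    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
+    # Mean log-probability over each anchor's positives.
+    loss = -(log_prob * pos_mask.float()).sum(-1) / pos_mask.sum(-1).clamp(min=1)
+    return loss.mean()
+```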
+Dataset - ImageNet
+Models - ResNet50, ResNet200
+The network is “pretrained” using supervised contrastive loss.
+After pre-training, the projection network is removed, and a linear classifier is added.
+This classifier is trained with the CE loss while the rest of the network is kept fixed.
+Using supervised contrastive loss improves over all the baseline models and data augmentation approaches.
+The resulting classifier is more robust to image corruptions, as shown by the mean Corruption Error (mCE) metric on the ImageNet-C dataset.
+The model is more stable to the choice of hyperparameter values (like optimizers, data augmentation, and learning rates).
+Supervised Contrastive loss is trained for 700 epochs during pre-training.
+Each step is about 50% more expensive than performing CE.
+The dense classifier layer can be trained in as few as ten epochs.
+The temperature value is set to 0.07. Using a lower temperature is better than using a higher temperature.
+The paper considers learning scenarios where the training data is available incrementally (and not at once).
+For example, in some applications, new data is available periodically (e.g., latest news articles come out every day).
+The paper highlights that, in such scenarios, the conventional wisdom of “warm start” does not apply.
+When new data is available, it is better to train a new model from scratch than to update the model trained on previously available data.
+While the two setups lead to similar training performance, the randomly initialized model has a much better generalization performance.
+Create two random, equally-sized partitions of the training data.
+Train the model till convergence on the first half of the data. Then train the model on the entire dataset.
+Models: ResNet18, MLPs, Logistic Regression (LR)
+Dataset: CIFAR10, CIFAR100, SVHN
+Optimizers: Adam, SGD
+Warm starting hurts generalization in all the cases.
+The effect is more pronounced for ResNets and MLPs (compared to LR) and on the harder CIFAR10 dataset (compared to the SVHN dataset).
+The model is given access to k new learning examples at each iteration.
+A warm started model reuses the previously initialized model and trains (till convergence) on the new batch of k items.
+A “randomly initialized” model is trained on all the examples (seen so far) from scratch.
+Dataset: CIFAR10
+Model: ResNet18
+As more training data becomes available, the generalization gap between the two setups increases, and warm starting hurts generalization.
+In this setup, the learner selects k new examples to add to the training dataset (using margin-based sampling).
+Like the previous setup, the warm-start strategy still hurts generalization.
+Train a Resnet18 model on the CIFAR10 dataset and use this model to warm start training on the SVHN dataset.
+When a small percentage of the SVHN dataset is used, the setup resembles pretraining / transfer learning and performs better than training from scratch.
+As the percentage of the SVHN dataset increases, the warm-start approach starts underperforming.
+ResNet18 model on CIFAR10 dataset
+When performing a hyper-parameter sweep over the learning rate and batch size, it is possible to train warm start models to reach the same generalization performance as training from scratch.
+Though, in that case, there are no computational savings as the warm-started models take about the same time (to converge) as the randomly initialized model.
+The increased training time indicates that the warm started model probably needs to forget the knowledge from previous training rounds.
+Warm-started ResNet models that generalize well have a low correlation to their initialization (measured via the Pearson correlation coefficient between the model weights).
+Generalization is damaged even when using a model trained on incomplete data for only a few epochs.
+For warm start models, the gradient (corresponding to the “new” data) is higher than that for randomly initialized models. This hints that regularization may help close the generalization gap. But in practice, regularization helps both the warm-started and the randomly initialized models.
+Warm starting only a few layers also does not close the gap.
+Adding some noise to the warm started model (with the motivation of having a partially random initialization) does help somewhat but also increases the training time.
+Motivating the problem as an instance of catastrophic forgetting, the authors use the EWC algorithm but report that using EWC hurts model performance.
+The paper does not propose a solution to the problem but provides a thorough analysis of the problem setup, which is quite useful for understanding the phenomenon itself.
+The paper proposes a technique for improving the generalization ability of RL agents when evaluated on an unseen environment (which is similar to the training environment).
+The key idea is to learn features that are invariant across environments by using a randomized CNN (f) that randomly perturbs the inputs.
+The policy is trained using the randomized observations obtained using f.
+Invariant features are learned using a feature matching (FM) loss that matches the feature representation of the original and randomized observations.
+The random network’s parameters are initialized as $\alpha I + (1 - \alpha) N\left(0, \sqrt{\frac{2}{n_{in} + n_{out}}}\right)$ where $\alpha \in [0, 1]$, $N$ denotes the Gaussian Distribution, and $n_{in}, n_{out}$ denote the number of input and output channels respectively.
+Xavier Normal distribution is used for randomization to maintain the variance between the input and the randomized input.
+f is randomized per iteration.
+During inference, the expected action is computed by approximating over M samples (i.e., randomizing the input M times).
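+A sketch of such a randomized layer, implementing the initialization formula above literally ($\alpha$ and the kernel size are illustrative):
+```python
+import torch
+import torch.nn as nn
+
+def random_conv(channels=3, alpha=0.5):
+    # Weights drawn as alpha * I + (1 - alpha) * XavierNormal;
+    # f is re-drawn at every training iteration.
+    conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
+    identity = torch.zeros_like(conv.weight)
+    nn.init.dirac_(identity)                 # identity-preserving kernel
+    noise = torch.empty_like(conv.weight)
+    nn.init.xavier_normal_(noise)
+    with torch.no_grad():
+        conv.weight.copy_(alpha * identity + (1 - alpha) * noise)
+    return conv
+```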
+2D CoinRun, 3D DeepMind Lab, 3D Robotics Control Task
+The evaluation environments consist of different styles of backgrounds, objects, and floors.
+Regularization methods: Dropout, L2 regularization, Batch Normalization
+Dataset Augmentation methods: Cutout, Gray out, Inversion, Color Jitter
+On CoinRun, the proposed approach significantly outperforms the other baselines during evaluation. The performance improvement saturates around 10M samples.
+Cycle consistency is used to measure the similarity between two trajectories. The proposed method improves the cycle consistency as compared to the vanilla PPO baseline. It also produces sharper activation maps in the evaluation environments.
+For the large-scale experiments, when evaluated on 500 levels of CoinRun, the proposed method improves the success rates from 39.8% to 58.7%.
+On DeepMind Lab and Surreal robotics control tasks, the proposed method leads to agents that generalize better on the unseen environments (during evaluation).
+The paper compares replay-based approaches with model-based approaches in Reinforcement Learning (RL).
+It hypothesizes that if the parametric model is only used to generate transitions for the update rule, then under certain conditions, replay-based approaches will be as good as model-based approaches.
+Planning: Any algorithm that uses additional computations (but not additional experience) to improve its performance.
+Learning: Any algorithm that uses additional experience to improve its performance.
+In some cases, a replay buffer can be seen as a model. For example, querying using state-action pair (from the replay buffer) is similar to querying the (expected) next-state and reward from a model. In general, the model will be more flexible as any arbitrary state-action pair can be used for querying.
+Parametric models require more computation than sampling from a replay buffer, while the memory cost of maintaining a replay buffer scales linearly with its capacity.
+Parametric models are useful for planning multiple-steps into the future while it is much harder to do so with a replay buffer (even more so with pixel observations).
+An imperfect model may be more suitable for selecting actions (instead of updating the policy) because the chosen action, when executed in the environment, will lead to transitions that would improve the model.
+When planning with an imperfect model, it is better to plan backward, as the update is applied on an imaginary state (which would not be encountered if the model is poor).
+If the model is accurate, forward and backward planning is equivalent. This distinction between forward and backward updates does not apply to replay buffers.
+When using a replay buffer and (i) uniformly replaying transitions, (ii) from a buffer containing only full episodes, and (iii) using TD updates, then the algorithm is stable.
+When using a replay buffer and (i) uniformly replaying transitions, (ii) generating transitions using a model, and (iii) using TD updates, then the algorithm can diverge.
+This case can be fixed by:
+ +Repeatedly iterating over the model and sampling transitions from the states the model generates (not a satisfactory solution).
+Using multiple-step returns (this can increase the variance).
+Use algorithms specifically for stable off-policy learning (not a definitive solution).
+The paper compares SimPLe (model-based) with Rainbow DQN (replay-based).
+The paper shows that when using a similar number of real interactions, Rainbow DQN needs fewer replay samples than model samples in SimPLe, making it more efficient (computation-wise).
+When using a parametric model in a replay-like setting (sampling observed states from the past), model-based learning can be unstable (in theory); under this sampling distribution, using a replay buffer directly is likely a better strategy.
+Parametric models are likely more useful when planning multiple steps into the future, e.g., for selecting actions.
+The paper explores the connections between the concepts of a single agent vs. society of agents.
+A society of agents can be modeled as a single agent while a single agent can be modeled as a society of components (or sub-agents).
+The paper focuses on mechanisms for training a society of self-interested agents to solve a given task, as if the system were a single agent.
+Societal-decision making framework relates the local optimization problem of a single agent with the global optimization problem of a society of agents.
+Cloned Vickrey Society is proposed as a mechanism to guarantee that an agent’s dominant strategy equilibrium coincides with the group’s optimal policy.
+A class of decentralized RL algorithms that optimize the MDP objective of the society as a whole, as a consequence of individual agents optimizing their objectives.
+Empirical evaluation of the Cloned Vickrey Society using an implementation called Credit Conserving Vickrey.
+Environment - a tuple that specifies an input space, an output space, and parameters for determining an objective.
+ +Agent - a function that maps input space to output space.
+Objective - a functional that maps an agent to a real number.
+In auction environments, the input space is a single auction item (say s), and the output space is bidding space B.
+There are N agents who compete by bidding for an item s using their bidding policy.
+$b$ is a vector of bids produced by the agents.
+$v_s$ is a vector of agent’s valuations of item s.
+The $i^{th}$ agent’s utility is given as $v_s^i \times X^i(b) - P^i(b)$. Here, $X^i(b)$ is the portion of $s$ allocated to $i^{th}$ agent and $P^i(b)$ is the price that $i^{th}$ agent is willing to pay.
+Each agent is independently maximizing its utility.
+In certain conditions (i.e., if the auction is dominant strategy incentive compatible), it is optimal for each agent to bid its valuation.
+These conditions are satisfied by the Vickrey auction where $P^i(b)$ is set to be the second-highest bid and $X^i(b) = 1$ if the $i^{th}$ agent wins (and 0 otherwise).
+A society is a set of agents where each agent is a tuple of bidding policy $\psi$ and a transformation function.
+The environment is modeled at two levels - (i) global environment (referred to as the global MDP) and local environment (referred to as local auction).
+Each state $s$ in the global MDP is an auction item in a different auction. The winner (of local auction at $s$) transforms $s$ into some other state $s’$.
+If these transformations are modeled as actions, then the proposed framework can be interpreted as a decentralized reinforcement learning framework.
+Motivated by the design of market economy (where economic transactions determine wealth distribution), the paper proposes that, for an agent, the valuation of winning an auction is the revenue it can receive in the auction at the next timestep by selling the transformed state.
+A global MDP that adheres to this design is referred to as the Market MDP.
+There is a catch in the design of the market MDP - the winning agent at time $t$ receives the amount that the highest bidder is willing to pay at time $t+1$, but the winner at time $t+1$ only pays the second-highest bid. Hence, the credit is not conserved.
+This inconsistency can be fixed by introducing “duplicate” (or cloned) agents, and the society is called the Cloned Vickery Society.
+The Cloned Vickrey Auction mechanism is compared against alternate bidding mechanisms like first price auction (where winner pays the bid they proposed), solitary version of Vickrey auction (no cloning), and Environment Reward where only environment reward is used, and there is no price term.
+It is empirically shown that the Cloned Vickrey Auction learns bids that are closest to the agents’ actual valuations. Moreover, the solitary version leads to bids that are more spread out than the ones learned by the cloned version. This highlights the importance of competitive pressure for learning bid values.
+Three different implementations of Cloned Vickrey Auction are considered:
+ +Bucket Brigade (BB) - winner at timestep $t$ receives the highest bid at time step $t+1$, and the subsequent winner pays the highest bid. This case satisfies Credit Conservation and Bellman Optimality.
+Vickrey (V) - winner at timestep $t$ receives the highest bid at time step $t+1$, and the subsequent winner pays the second-highest bid. This case satisfies Truthful Dominant Strategy and Bellman Optimality.
+Credit Conserving Vickrey (CCV) - winner at timestep $t$ receives the second-highest bid at time step $t+1$, and the subsequent winner pays the second-highest bid. This case satisfies Truthful Dominant Strategy and Credit Conservation.
+CCV implementation provides bid values closest to the optimal Q-values.
+In one experiment, the paper explores the use of the proposed approach for selecting between sub-policies. It shows that CCV is more sample efficient for pretraining sub-policies and adapting them to transfer tasks.
+In another experiment, the task is to transform MNIST images by composing two out of six affine transformations. The transformed images are fed to a pretrained classifier that predicts a label. The agent gets a reward of 1 if the classifier makes the correct prediction and 0 otherwise. The CCV implementation obtains a mean reward of 0.933, highlighting the effectiveness of the CCV model.
+The paper proposes Stochastic Weight Averaging (SWA) procedure for improving the generalization performance of models trained with SGD (with cyclic or constant learning rate).
+Specifically, the model is checkpointed at several points along the training trajectory, and these checkpoints are averaged (in the parameter space) to obtain a single model.
+“Stochastic” in the name refers to the idea that with cyclical or constant learning rate, SGD proposals are approximately sampled from a neural network’s loss surface and are hence stochastic.
+SWA uses a learning rate schedule that allows exploration in the weight space.
+SGD with cyclical and constant learning rates explore points (model instances) at the periphery of high-performing networks.
+With different initializations, SGD will find different points (of low training loss) on this boundary, but will not move inside it.
+Averaging the points provide a mechanism to move inside this periphery.
+The train and the test error surfaces, while being similar, are not perfectly aligned. Hence, averaging several models (along the optimization trajectory) could lead to a more robust model.
+Given a model $w$ and some training budget $B$, train the model in the conventional way for approx 75% of the budget.
+Starting from that point, continue training with the remaining budget, with a constant or cyclical learning rate.
+For fixed learning rate, checkpoint models at each epoch. For cyclical learning rate, checkpoint the model at the lowest learning rate in the cycle.
+Average all the models to get the SWA model.
+If the model has Batch Normalization layers, run an additional pass to compute the SWA model’s running mean and standard deviation.
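+A sketch of the averaging step (checkpoint collection and the BatchNorm pass are omitted):
+```python
+import copy
+import torch
+
+def average_checkpoints(models):
+    # Elementwise average, in parameter space, of the collected checkpoints.
+    swa = copy.deepcopy(models[0])
+    with torch.no_grad():
+        for params in zip(swa.parameters(), *[m.parameters() for m in models]):
+            params[0].copy_(torch.stack(params[1:]).mean(dim=0))
+    return swa  # with BatchNorm, run one extra pass over the data afterwards
+```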
+The computational and space complexity of computing the SWA model is relatively low.
+The paper highlights the ensembling-like effect of SWA by showing that if the model checkpoints ($w_i$) are generated by training with Fast Geometric Ensembling (FGE), the difference between averaging the weights and averaging the predictions is of the order $O(\Delta)$ where $\Delta = max ||w_i - w_{SWA}||$.
+Note that SWA does not have the overhead of an extra forward pass during inference.
+Datasets: CIFAR10, CIFAR100, ImageNet
+Models: VGG16, WideResNet, 164-layer preactivation ResNet, ShakeShake, Pyramid Net.
+Baselines: Conventional SGD, Exponentially decaying average with SGD and FGE.
+In all the CIFAR experiments, SWA consistently outperforms SGD within one training budget, and its performance keeps improving with further training.
+SWA also achieves performance comparable to FGE, despite FGE being an ensemble method.
+On ImageNet, SWA is run on a pre-trained model, and it improves performance in all the cases.
+An ablation experiment (on CIFAR-100) shows that it is possible to train a network (with SWA) using a fixed learning rate. In that setup, using SWA improves performance by 16%.
+Meta-learning techniques are shown to benefit from the use of deep neural networks.
+BatchNorm is a commonly used component when training deep networks, especially for vision tasks.
+However, BatchNorm and meta-learning make contradictory assumptions, and their combination may not work well in practice.
+The paper proposes TaskNorm, a normalization method that is designed explicitly for meta-learning.
+Standard meta-learning setup with $k$ tasks, each task with its own context and target set.
+Two sets of parameters are considered during meta-learning - (i) global parameters, and (ii) task-specific parameters.
+Meta-learning setup can be viewed as an inference task, where the task-specific parameters are inferred using a context set and some additional (trainable) parameters.
+Normalization layers are commonly used to accelerate the training of neural networks. The general approach is to use normalization moments (statistics) along with some learned parameters.
+BatchNorm is a well-known and widely used normalization approach. It relies on the implicit assumption that the dataset comprises iid samples from some underlying distribution.
+However, in meta-learning, data points are assumed to be iid only within a specific task.
+This leaves open the question of what moments to use during meta-train and meta-test time.
+Conventional BatchNorm (CBN): compute moments at meta-train time and use them during meta-test time.
+This is equivalent to lumping the moments with the global parameters. I.e., the running moments are shared globally, while the data is iid only locally.
+Using CBN with MAML leads to poor results.
+Moreover, the meta-learning setup can sometimes require the use of a very small batch size (e.g., 1-shot learning). In those cases, the computed statistics are likely to be inaccurate.
+Transductive BatchNorm (TBN): use context/target set statistics at both meta-train and meta-test time.
+This is the default BatchNorm mode used in MAML.
+Instance-based normalization: moments are computed separately for each instance.
+This mode corresponds to treating the statistics as local at the observation level.
+These methods provide only limited improvement in performance, and can sometimes have a large overhead.
+The normalization statistics are local at the task level, and the statistics for a given data point should only depend on the context set’s data points. They should not depend on the other elements of the target set.
+Meta-Batch Normalisation (METABN) is a precursor to TaskNorm where the context set alone is used to compute the normalization statistics for both the context and the target set (during both meta-test and meta-train time).
+METABN does not perform well when used with small context sets.
+TaskNorm overcomes this limitation by using a set of non-transductive, secondary moments (computed from the input being normalized).
+When the context is small, using additional moments will help to improve the moment estimates.
+In the general case, a trainable blending factor, $\alpha$, is used to combine the two sets of moments.
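+A simplified sketch of the blending step (the paper's pooled variance additionally includes correction terms between the two means, omitted here):
+```python
+import torch
+
+def tasknorm_moments(context_acts, input_acts, alpha):
+    # Blend context-set moments with per-input (instance-norm style) moments.
+    mu_c, var_c = context_acts.mean(), context_acts.var()
+    mu_i, var_i = input_acts.mean(), input_acts.var()
+    mu = alpha * mu_c + (1 - alpha) * mu_i
+    var = alpha * var_c + (1 - alpha) * var_i   # correction terms omitted
+    return mu, var
+```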
+While the computational cost of TaskNorm is slightly more than CBN, it converges faster than CBN in practice.
+Normalization mechanism in Reptile can be interpreted as a particular case of TaskNorm.
+Small scale few-shot classification experiments
+ +Omniglot and mini-ImageNet datasets
+First order MAML, with different kinds of normalization schemes.
+Transductive BatchNorm performs the best.
+Among non-transductive approaches, TaskNorm using Instance Normalisation augmentation performs the best.
+A similar trend holds for the speed of convergence as well.
+Large scale few-shot classification experiments
+ +MetaDataset dataset
+CNAPs model
+The context set’s size varies across tasks in this setup and can be as small as 5.
+TaskNorm with Instance Normalisation ranks first in 10 (out of 13) datasets and is also the fastest to train.
+While Instance-based methods (Instance Normalisation and Layer Normalisation) are the slowest to converge, they still outperform the running average based methods (conventional BatchNorm).
+The results demonstrate that designing meta-learning specific normalization methods can significantly improve performance and that Transductive BatchNorm may not always be the optimal choice.
+The paper proposes GradNorm, a gradient normalization algorithm that improves multi-task training by dynamically tuning the magnitude of gradients corresponding to different tasks.
+During multi-task training, some tasks can dominate the training, at the expense of others.
+It is common to define the multi-task loss as a linearly weighted combination of the individual task losses.
+The paper proposes two changes to this setup:
+ +Adapt weight-coefficients, assigned to each loss term, at each training step.
+Directly modify the gradient magnitudes, corresponding to different tasks, so that all the tasks are learning at similar rates.
+Proposed GradNorm algorithm is similar to BatchNorm, but it performs normalization across tasks, not data batches.
+Gradient norm at timestep $t$, for the $i^{th}$ task, is computed as the product between average gradient norm (across all tasks at timestep $t$) and $r_i(t) ^ {\alpha}$.
+$r_i$ is the relative inverse training rate of task $i$. It is defined as the ratio between the loss ratio of task $i$ and the average loss ratio (across all the tasks).
+$\alpha$ is a hyperparameter.
+This computed per-task gradient norm is treated as the target value for actual gradient norms.
+An additional $L_1$ loss between the actual and the target gradient norms, summed over all the tasks, is incorporated; this loss optimizes the weight-coefficients only.
+After every step, the weight-coefficients are renormalized to decouple the gradient normalization from the global learning rate.
+Note that all the gradient norm computations are performed only for the layers on which GradNorm is applied. Generally, GradNorm is used with only the last shared layer of weights (to save on computational costs).
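+A sketch of this loss computation (inputs are assumed to be 1-D tensors over tasks, with the gradient norms taken w.r.t. the last shared layer):
+```python
+import torch
+
+def gradnorm_loss(grad_norms, losses, initial_losses, alpha=1.5):
+    # grad_norms: (T,) norms of each task's (weighted) gradient w.r.t. the
+    # last shared layer; losses, initial_losses: (T,) current / initial losses.
+    loss_ratios = losses / initial_losses
+    r = loss_ratios / loss_ratios.mean()                # relative inverse training rates
+    target = (grad_norms.mean() * r ** alpha).detach()  # treated as a constant target
+    return (grad_norms - target).abs().sum()            # L1 loss; updates the w_i only
+```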
+Two variants of NYUv2 dataset – NYUv2+seg (small dataset) and NYUv2+kpts (big dataset).
+Both regression and classification setups were used.
+Models:
+ +SegNet with a symmetric VGG16 encoder/decoder
+FCN with modified ResNet-50 as the encoder and shallow ResNet as the decoder.
+Standard pixel-wise losses for each task.
+GradNorm with $\alpha=1.5$ outperforms the equal-weight baseline and either surpasses or matches the best performance of single networks for each task.
+Almost any value of $\alpha$ in $(0, 3)$ improves the network’s performance over an equal weight baseline.
+The paper hypothesizes that the main optimization challenges in multi-task learning arise because of negative interference between different tasks’ gradients.
+It hypothesizes that negative interference happens when:
+ +The gradients are conflicting (i.e., have a negative cosine similarity).
+The gradients coincide with high positive curvature.
+The difference in gradient magnitude is quite large.
+The paper proposes to work around this problem by performing “gradient surgery.”
+If two gradients are conflicting, modify the gradients by projecting each onto the other’s normal plane.
+This modification is equivalent to removing the conflicting component of the gradient.
+This approach is referred to as projecting conflicting gradients (PCGrad).
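+A sketch of the projection for the two-task case (assuming flattened gradients):
+```python
+import torch
+
+def pcgrad(g1, g2):
+    # If the gradients conflict (negative cosine similarity), project each
+    # onto the normal plane of the *original* other gradient.
+    if torch.dot(g1, g2) < 0:
+        g1_proj = g1 - torch.dot(g1, g2) / g2.norm() ** 2 * g2
+        g2_proj = g2 - torch.dot(g2, g1) / g1.norm() ** 2 * g1
+        return g1_proj, g2_proj
+    return g1, g2
+```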
+Theoretical Analysis
+ +The paper proves the local conditions under which PCGrad improves multi-task gradient descent in the two-task setup.
+The conditions are:
+ +Angle between the task gradients is not too small.
+Difference in the magnitude of the gradients is sufficiently large.
+Curvature of the multi-task gradient is large.
+Large enough learning rate.
+Experimental Setup
+ +Multi-task supervised learning
+ +MultiMNIST, Multi-task CIFAR100, NYUv2.
+For Multi-task CIFAR-100, PCGrad is used with the shared parameters of the routing networks.
+For NYUv2, PCGrad is combined with MTAN.
+In all the cases, using PCGrad improves the performance.
+Multi-task Reinforcement Learning
+ +Meta-World Benchmark
+PCGrad + SAC outperforms all other baselines.
+In the context of SAC, the paper suggests learning temperature $\alpha$ on a per-task basis.
+Goal-conditioned Reinforcement Learning
+ +Goal-conditioned robotic pushing task with a Sawyer robot.
+PCGrad + SAC outperforms vanilla SAC.
+Conditional computation is a technique to increase a model’s capacity (without a proportional increase in computation) by activating parts of the network on a per example basis.
+The paper describes (and addresses) the computational and algorithmic challenges in conditional computation. It introduces a sparsely-gated Mixture-of-Experts layer (MoE) with thousands of feed-forward sub-networks.
+GPUs are fast at matrix arithmetic but slow at branching.
+Large batch sizes amortize the cost of parameter updates. Conditional computation reduces the effective batch size for different components of the model.
+Network bandwidth can be a bottleneck with the network demand overshadowing the computational demand.
+Additional losses may be needed to achieve the desired level of sparsity.
+Conditional computation is most useful for large datasets.
+$n$ Expert Networks - $E_1, \dots, E_n$.
+Gating Network $G$ to select a sparse combination of experts.
+Output of the MoE module is the weighted sum of predictions of experts (weighted by the output of the gate).
+If the gating network’s output is sparse, then the outputs of some of the experts do not have to be computed.
+In theory, one could use a hierarchical mixture of experts where a mixture of experts is trained at each level.
+Softmax Gating
+Noisy top-k gating - Add tunable Gaussian noise to the output of softmax gating and retain only the top-k values. A second trainable weight matrix controls the amount of noise per component.
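+A sketch of this gating scheme ($W_g$ and $W_{noise}$ are the two trainable matrices; $k$ is illustrative):
+```python
+import torch
+import torch.nn.functional as F
+
+def noisy_topk_gate(x, W_g, W_noise, k=4):
+    # H(x) = x W_g + eps * softplus(x W_noise), eps ~ N(0, 1);
+    # keep the top-k entries, set the rest to -inf, then softmax.
+    h = x @ W_g + torch.randn(x.size(0), W_g.size(1)) * F.softplus(x @ W_noise)
+    topk_vals, topk_idx = h.topk(k, dim=-1)
+    sparse = torch.full_like(h, float('-inf')).scatter(-1, topk_idx, topk_vals)
+    return sparse.softmax(dim=-1)   # zero weight for all non-selected experts
+```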
+Shrinking Batch Problem
+ +If the MoE selects k out of n experts, the effective batch size reduces by a factor of k / n.
+This reduction in batch size is accounted for by combining data parallelism (for standard layers and gating networks) and model parallelism (for the experts in MoE). Thus, with $d$ devices, the batch size changes by a factor of $(k \times d) / n$.
+For hierarchical MoE, the primary gating network uses data parallelism while secondary MoEs use model parallelism.
+The paper considers LSTM models where the MoE is applied once the previous layer has finished. This increases the batch size (for the current MoE layer) by a factor equal to the number of unrolling timesteps.
+Network bandwidth limitations can be overcome by ensuring that the ratio of computation (of each expert) to the input and output size is greater than (or equal to) the ratio of computational to network capacity.
+Computational efficiency can be improved by using larger hidden layers (or more hidden layers).
+Balancing Expert Utilization
+ +Importance of an expert (relative to a batch of training examples) is defined as the batchwise sum of the expert’s gate values.
+An additional loss, called importance loss, is added to encourage the experts to have equal importance.
+The importance loss is defined as the square of the coefficient of variation (of a set of importance values) multiplied by a (hand-tuned) scaling factor $w_{importance}$.
+In practice, an additional loss called $L_{load}$ might be needed to ensure that the different experts get equal load (along with equal importance).
+Datasets
+ +Billion Word Language Modeling Benchmark
+100 Billion word Google News Corpus
+Machine Translation datasets
+ +Single Language Pairs - WMT’14 En to Fr (36M sentence pairs) and En to De (5M sentence pairs).
+Multilingual Machine Translation - a large combined dataset of twelve language pairs.
+In all the setups, the proposed MoE models achieve significantly better results than the baseline models, at a lower computational cost.
+Common transfer learning methods focus on transferring knowledge in the model feature space.
+In contrast, the paper argues that the learned knowledge is more concisely captured in the “classifier space” as the classifier is fitted for all the samples for a given class, while the feature representation is specific to each sample.
+Building on this intuition, the paper proposes to combine strong classifiers (trained on large datasets) with weak classifiers (trained on smaller datasets) to improve the weak classifiers’ performance.
+Given $n$ classifiers $C_1, …, C_n$ trained with a large amount of data, and a weak classifier $a$ trained for a class with few samples:
+Find the nearest neighbors of $a$.
+Train a new classifier by linearly combining $a$ with its nearest classifiers.
+The coefficients (for linearly combining the classifiers) are learned using another classifier called AlphaNet.
+In theory, this approach can be used with any set of classifiers.
+A long-tailed dataset is one where some classes (referred to as the tail classes) have very few examples—for example, ImageNet-LT and Places-LT.
+Split the long-tailed dataset into two splits - “base” classes with $B$ (number of) classes and “few” classes with $F$ (number of) classes.
+Total number of classes $N = B + F$.
+Start with a pre-trained model, with classifiers $w_j$ and biases $b_j$ for $j \in (1, N)$.
+For a given target class $j$, find its top $k$ nearest neighbor classifiers and concatenate their output.
+For each “few” class, learn a feedforward network that takes the concatenated representation (of classifiers) as the input and returns a vector of $k \alpha$ values.
+These $\alpha$ values are interpreted as the classifier’s strength (or confidence) in its nearest neighbors.
+The (normalized) alpha values are used for defining the weight and bias for the classifier for the given “few” class.
+The collection of all the “few” classifiers is referred to as the AlphaNet.
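+A rough sketch of the composition step, under the simplifying assumption that the new classifier is the weak classifier adjusted by an alpha-weighted sum of its nearest strong classifiers (the module name and layer sizes are illustrative, not from the paper):
+```python
+import torch
+import torch.nn as nn
+
+class AlphaNetSketch(nn.Module):
+    """Sketch: combine one weak classifier with its k nearest strong
+    classifiers using learned, normalized alpha coefficients."""
+    def __init__(self, d, k):
+        super().__init__()
+        # Input: concatenated weights of the k nearest strong classifiers.
+        self.alpha_net = nn.Sequential(
+            nn.Linear(k * d, 64), nn.ReLU(), nn.Linear(64, k))
+
+    def forward(self, weak_w, strong_ws):
+        # weak_w: [d]; strong_ws: [k, d] (nearest-neighbor classifiers).
+        alphas = self.alpha_net(strong_ws.flatten()).softmax(dim=-1)  # [k]
+        # Move the weak classifier towards its strong neighbors.
+        return weak_w + (alphas.unsqueeze(-1) * strong_ws).sum(dim=0)
+```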
+The paper outlines a degenerate case, where the confidence in the prediction of all the strong classifiers goes to 0. The paper proposes to counter this case by clamping the $\alpha$ values.
+The entire setup is trained end-to-end using cross-entropy loss on AlphaNet.
+Given the proposed approach’s flexibility, it is used to combine the state-of-the-art models on ImageNet-LT, namely retraining classifiers on class-balanced samples and training models with weight normalization. The combined setup outperforms the individual models.
+One interesting observation is that it is useful to include the weak classifiers, along with the strong classifiers, as AlphaNet adjusts the position of weak classifiers towards the appropriate strong classifier.
+While the idea is described in the context of long-tail data distribution, the idea is useful in the general context of non-stationary data distribution. One instantiation could be lifelong class incremental learning where the model encounters new data classes during training. For some time duration (till sufficient data points are seen), the newly seen classes are the “few” classes. This approach can help with faster adaptation when the model is yet to see sufficient examples for the unseen classes.
+The paper investigates the practical impact of the deadly triad (function approximation, bootstrapping, and off-policy learning) in deep Q-networks (trained with experience replay).
+The deadly triad is called so because when all the three components are combined, TD learning can diverge, and value estimates can become unbounded.
+However, in practice, the components of the deadly triad have been combined successfully. An example is training DQN agents to play Atari.
+The effect of each component of the triad can be regulated with some design choices:
+ +Bootstrapping - by controlling the number of steps before bootstrapping.
+Function approximation - by controlling the size of the neural network.
+Off-policy learning - by controlling how data points are sampled from the replay buffer (i.e., using different prioritization approaches)
+The problem is studied in two contexts: toy example and Atari 2600 games.
+The paper makes several hypotheses about how the different components may interact in the triad and evaluates these hypotheses by training DQN with different hyperparameters:
+ +Number of steps before bootstrapping - 1, 3, 10
+Four levels of prioritization (for sampling data from the replay buffer)
+Bootstrap target - Q-learning, target Q-learning, inverse double Q-learning, and double Q-learning
+Network sizes - small, medium, large, and extra-large.
+Each experiment was run with three different seeds.
+The paper formulates a series of hypotheses and designs experiments to support/reject the hypotheses.
+Rewards are clipped between -1 and 1, and the discount factor is set to 0.99. Hence, the maximum absolute action value is bounded by 1 / (1 - 0.99) = 100. This upper bound is used to detect soft-divergence in the value estimates.
+The paper reports that while soft-divergence does occur, the values do not become unbounded, thus supporting the hypothesis.
+One manifestation of bootstrapping on separate networks is target-Q learning. While using separate networks helps on Atari, it does not entirely solve the problem on the toy setup.
+One manifestation of correcting for the overestimation bias is using double Q-learning.
+In the standard form, double Q-learning benefits by bootstrapping on a separate network. To isolate the gains by using each component independently, an inverse double Q-learning update is used that does not use a separate target-network for bootstrapping.
+Experimentally, Q-learning is the most unstable while target Q-learning and double Q-learning are the most stable. This observation supports the hypothesis.
+This hypothesis is intuitive as the dependence on bootstrapping is reduced with multi-step returns.
+Experimental results support this hypothesis.
+This hypothesis is based on the assumption that more flexible value function approximations may behave more like the tabular case.
+In practice, smaller networks show fewer instances of instability than the larger networks.
+The hypothesis is not supported by the experiments.
+Generally, soft-divergence correlates with poor control performance.
+For example, longer multi-step returns lead to fewer instances of instabilities and better performance.
+The trend is more interesting in terms of network capacity. Large networks tend to diverge more but also perform the best.
+While action-value estimates can grow to large values, they can recover to plausible values as training progresses.
+The paper presents an extensive study of the effects of experience replay in Q-learning based methods.
+It focuses explicitly on the replay capacity and replay ratio (ratio of learning updates to experience collected).
+Replay capacity is defined as the total number of transitions stored in the replay buffer.
+Age of a transition (stored in the replay buffer) is defined as the number of gradient steps taken by the agent since the transition was stored.
+The larger the replay capacity, the greater the age of the oldest transition (also referred to as the age of the oldest policy).
+The larger the replay capacity, the greater the degree of “off-policyness” of the transitions in the buffer (with everything else held constant).
+Replay ratio is the number of gradient updates per environment transition. This ratio can be used as a proxy for how often the agent uses old data (vs. collecting new data) and is related to off-policyness.
+In the DQN paper, the replay ratio is set to 0.25.
+For experiments, a subset (of 14 games) is selected from Atari ALE (Arcade Learning Environment) with sticky actions.
+Each experiment is repeated with three seeds.
+Rainbow is used as the base algorithm.
+Total number of gradient updates and batch size (per gradient update) are fixed for all the experiments.
+Rainbow used a replay capacity of 1M and an oldest policy age of 250K.
+In the experiments, the replay capacity varies from 0.1M to 10M (5 values), and the age of the oldest policy varies from 25K to 25M (4 values).
+With the age of the oldest policy fixed, performance improves with higher replay capacity, probably due to increased state-action coverage.
+With fixed replay capacity, reducing the oldest policy’s age improves performance, probably due to the reduced off-policyness of the data in the replay buffer.
+However, in some specific instances (with sparse reward, hard exploration setup), performance can drop when reducing the oldest policy’s age.
+Increasing replay capacity, while keeping the replay ratio fixed, provides varying improvements that depend on the particular values of replay capacity and replay ratio.
+The paper reports the effect of these choices for DQN as well.
+Unlike Rainbow, DQN does not improve with larger replay capacity, irrespective of whether the replay ratio or age of the oldest policy is kept fixed.
+Given that the Rainbow agent is a DQN agent with additional components, the paper explores which of these components leads to an improvement in Rainbow’s performance as replay capacity increases.
+Four new DQN variants are created by adding each of Rainbow’s four components to the base DQN agent.
+DQN with n-step returns is the only variant that benefits from increased replay capacity.
+The usefulness of n-step returns is further validated by verifying that a Rainbow agent without n-step returns does not benefit from increased replay capacity, while a Rainbow agent missing any one of the other components still does.
+Prioritized Experience Replay does not significantly affect the performance with increased replay capacity.
+The observation that n-step returns are critical for taking advantage of larger replay sizes is surprising because the uncorrected n-step returns are theoretically not suitable for off-policy learning.
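+For concreteness, a sketch of the uncorrected n-step target (no importance-sampling correction, which is what makes its empirical benefit under off-policy replay surprising):
+```python
+def n_step_target(rewards, bootstrap_value, gamma=0.99):
+    """Uncorrected n-step return: discounted sum of the n rewards stored
+    in the replay buffer plus a discounted bootstrap from step n.
+    rewards: [r_t, ..., r_{t+n-1}]; bootstrap_value: max_a Q(s_{t+n}, a)."""
+    target = bootstrap_value
+    for r in reversed(rewards):
+        target = r + gamma * target
+    return target
+```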
+The paper tests the limits of increasing replay capacity (with n-step returns) by performing experiments in the offline-RL setup: an agent collects a dataset of about 200M frames, and these frames are used to train another agent.
+Even in this extreme setup, n-step returns improve the learning agent’s performance.
+Hypothesis 1: n-step returns help to counter the increased off-policyness produced by a larger replay buffer.
+ +Hypothesis 2: Increasing the replay buffer’s capacity may reduce the variance of the n-step returns.
+ +This hypothesis is evaluated by training on environments with less variance or by turning off the sticky actions in the Atari domain.
+While the hypothesis does explain the gains from using n-step returns to some extent, n-step gains are observed even in environments with low variance.
+The paper introduces Multi-Object Network (MONet) architecture that learns a modular representation of images by spatially decomposing scenes into objects and learning a representation for these objects.
+Two components:
+ +Attention Module: generates spatial masks corresponding to the objects in the scene.
+VAE: learns a representation for each object.
+VAE components:
+ +Encoder: It takes as input the image and the attention mask generated by the attention module and produces the parameters of a distribution over the latent variable z.
+Decoder: It takes as input the latent variable z and attempts to reproduce the image.
+The decoder loss term is weighted by mask, i.e., the decoder tries to reproduce only those parts of the image that the attention mask focuses on.
+The attention mechanism is auto-regressive with an ongoing state (called a scope) that tracks which parts of the image are not yet attended over.
+In the last step, no attention mask is computed, and the previous scope is used as-is. This ensures that all the masks sum to 1.
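+A minimal sketch of this recursive scope update, assuming the attention network outputs a per-pixel value alpha in [0, 1] (the real model works with log-probabilities, but the structure is the same):
+```python
+import torch
+
+def compute_masks(image, attention_net, num_slots):
+    """The scope tracks the portion of the image not yet attended over;
+    by construction, the returned masks sum to 1 at every pixel."""
+    scope = torch.ones_like(image[:, :1])  # [B, 1, H, W], full image
+    masks = []
+    for _ in range(num_slots - 1):
+        alpha = attention_net(image, scope)  # per-pixel values in [0, 1]
+        masks.append(scope * alpha)
+        scope = scope * (1 - alpha)
+    masks.append(scope)  # last slot takes whatever scope remains
+    return masks
+```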
+The VAE also models the attention mask over the components, i.e., the probability that the pixels belong to a particular component.
+A model could efficiently process compositional visual scenes if it can exploit some recurring structures in the scene.
+The paper validates this hypothesis by showing that an autoencoder performs better if it can build up the scenes compositionally, processing one mask at a time (these masks are ground-truth spatial masks) rather than processing the scene at once.
+VAE encoder parameterizes a diagonal Gaussian latent posterior with a spatial broadcast decoder that encourages the VAE to learn disentangled features.
+MONet with seven slots is trained on Objects Room dataset with 1-3 objects.
+ +It learns to generate different attention masks for different objects.
+Combining the reconstructed components using the corresponding attention masks produces good quality reconstruction for the entire scene.
+Since it is an autoregressive model, MONet can be evaluated for more slots. The model generalizes to novel scene configurations (not seen during training).
+On the Multi-dSprites dataset (modification of the dSprites dataset), the model (post-training) distinguishes individual sprites and background.
+On the CLEVR dataset (2-10 objects per image), the model generates good image segmentations and reconstructions and can distinguish between overlapping shapes.
+A classic paper that looks into strategies for scaling large systems that can degrade gracefully.
+CAP refers to strong Consistency, high Availability, and Partitionability.
+Strong consistency refers to single copy ACID consistency.
+High availability means any consumer can access the data anytime. Generally, this is achieved by adding one or more data replicas.
+Partitionability means that the system can survive a partition between the different replicas.
+Strong CAP theorem states that any system can have at most two of the three properties.
+Weak CAP theorem says that the stronger the guarantees made about any two of the properties, the weaker the guarantees that can be made about the third.
+Assume that the clients are making a request to a server.
+There are two quantities of interest here:
+ +Yield - the fraction of requests that are answered successfully.
+ +Harvest - the fraction of the complete data reflected in the response.
+ +In the presence of faults, a tradeoff is made between yield and harvest. This tradeoff applies to both read and update queries.
+In a hundred-node cluster (without replication), a single-node failure reduces harvest by 1%, and in the case of multi-node failures, the harvest degrades linearly.
+The probability of losing high-priority data can be reduced by replicating it. However, replicating all the data would not guarantee 100% harvest and yield despite the significant costs.
+Decompose a large application into subcomponents so that each component can be provisioned separately. Strong consistency can be applied only to the components that need it, instead of the application as a whole.
+Further, failure of one or more components need not cause the application to fail as a whole.
+Decomposition also provides the opportunity to use orthogonal mechanisms, i.e., mechanisms independent of other mechanisms with no runtime interface.
+Composition of orthogonal subsystems improves the robustness of runtime interactions by locally containing the errors. For example, the orthogonal components can be restarted/replaced independently without affecting other running components.
+The paper presents a formalism for transfer learning, offers a definition of relatedness between tasks, and proposes foliations as a mathematical framework to represent the relationship between tasks.
+The term representation denotes a mechanism for describing and realizing abstract objects, thus allowing manipulation and reasoning about the objects. This description goes beyond the usual meaning (in deep learning), where representation denotes some useful information about data.
+Relatedness describes what changes between tasks. Consider a set of transformations (or functions) that convert one task to another. A relationship between two tasks is an element of this transformation set.
+Given a transformation set, one can define a set of related tasks, which is the set of all the tasks that can be transformed into each other using the functions from the given transformation set. This set of tasks is an equivalence class, and the transformation set is the equivalence relationship.
+Given two related tasks t1 and t2, denote the corresponding models (trained on those tasks) as m1 and m2. One can assume that m1 and m2 are related in the same way as t1 and t2 (equivariance).
+Now, given a set of transformations, one can partition the space of continuous functions into non-overlapping spaces, which describe a set of related tasks. These spaces are referred to as the parallel spaces or transfer spaces.
+A parallel space has a lower dimension than the original space, so knowing which parallel space a model lies on can make it easier to find the model. This is the primary motivation behind transfer learning - knowing the relationship between tasks can make it easier to find a solution to new tasks.
+Another way of partitioning the set of transformations is to use tessellation (e.g., Voronoi diagrams). Tasks in the same partition are similar to each other as compared to a task from another partition.
+Two tasks are defined as similar if the distance between them (under some distance metric) is small.
+Similarity is a geometric notion, while relatedness is a transformative notion. Parallelized space is to relatedness what tessellation is to similarity.
+The distinction between similarity and relatedness is quite nuanced, and the authors provide several examples to differentiate between them.
+Similarity can only be measured in terms of a reference element (similar to what). For example, when one finetunes a pre-trained model on a new task, one assumes that the model’s pretraining task is similar to the current task.
+Given a set (say $T$), a quantity (a function $q$ that maps elements of $T$ to a $k$-dimensional vector) is said to be invariant with respect to a transformation $p$ (defined on $T$) if $q(f) = q(p(f))$, i.e., the value of $q$ at $f$ (belonging to $T$) does not change if $f$ is transformed by $p$.
+If one assumes that the set of transformations is a group, specifically a Lie group whose action on the set of tasks is locally free and regular, then one can define a parallel partitioning of the space of tasks and the space of models.
+One can develop a hierarchical categorization scheme for the set of all considered tasks using the invariant quantities.
+One can consider the space of tasks and models to be smooth manifolds as manifolds naturally give a notion of representation and transformations between them.
+A manifold is a topological space that can be locally mapped to a Euclidean space using coordinate charts. One can define regular foliation by choosing charts that satisfy certain conditions. In that case, the manifold has immersed, connected, non-intersecting submanifolds called leaves.
+The charts (that satisfy those conditions) give a set of rectified coordinates, where the notions of “which leaf a point is on” and “where on the leaf it is” are clearly separated.
+Thus, foliation can provide the theoretical tools to work with parallel spaces.
+How foliations can be incorporated into the theory and solutions for transfer learning is left as future work.
+Key idea: Practicing and remembering diverse solutions to a task can lead to robustness to that task’s variations.
+The paper proposes a framework to implement this idea - train multiple policies such that they are collectively robust to a new distribution over environments while using a single training environment.
+During training, the agent has access to only one MDP.
+During the evaluation, the agent encounters a new MDP which has the same state and action space but may have a different reward and transition function.
+The agent is allowed some interactions (say k) with the test MDP and is then evaluated on the test MDP. The setup is referred to as few-shot robustness.
+Represent a set of policies using a latent variable policy (i.e., a policy conditioned on a latent variable z).
+This has two benefits: (i) multiple policies can be represented by the same object, and (ii) diverse behaviors can be learned by encouraging the trajectories corresponding to different z to be different, while still solving the task.
+A diversity-inducing objective is used to encourage the agent to learn different trajectories for different z.
+Specifically, the mutual information between p(Z) and marginal trajectory distribution for the latent variable policy is maximized, subject to the constraint that each policy achieves close to optimal returns in the train MDP.
+The mutual information between p(Z) and marginal trajectory distribution for the latent variable policy is lower bounded by the sum of mutual information terms over individual states (appearing in the trajectory).
+An unsupervised reward function is defined using the mutual information between states and latent variables.
+\(r(s, a) = \log q_{\phi}(z \| s) - \log p(z)\) where \(q_{\phi}\) is a learned discriminator.
+This unsupervised reward is optimized only when the policy achieves a close-to-optimal environment return; otherwise, the agent optimizes only the environment return.
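+A per-episode sketch of this switching rule; R_opt (the optimal return estimated via the baseline agent), epsilon, and alpha are assumed hyperparameters:
+```python
+def smerl_reward(env_reward, episode_return, log_q_z_given_s, log_p_z,
+                 R_opt, epsilon, alpha):
+    """Add the unsupervised (diversity) reward only when the episode
+    return is within epsilon of the optimal return R_opt."""
+    unsup_reward = log_q_z_given_s - log_p_z  # r(s, a) from above
+    if episode_return >= R_opt - epsilon:
+        return env_reward + alpha * unsup_reward
+    return env_reward
+```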
+SMERL is implemented using SAC with a latent variable maximum entropy policy.
+The set of latent variables is a fixed discrete set \(Z\) and \(p(z)\) is set to be a uniform distribution over this set.
+At the start of an episode, a \(z\) is sampled and used throughout the episode.
+Discriminator \(q_{\phi}(z\|s)\) is trained to infer \(z\) from the visited states.
+A baseline SAC agent is trained beforehand to evaluate if the current training policy achieves close to optimal environment return.
+During the evaluation, the policy corresponding to each latent variable is executed in the test MDP, and the policy with the maximum return is returned.
+Given an MDP \(M\) and \(\epsilon>0\), the MDP robustness set is defined as the set of all MDPs \(M'\) where the optimal policy of \(M'\) produces the same trajectory distribution in \(M'\) as \(M\). Moreover, on the training MDP \(M\), the optimal policies (corresponding to \(M\) and \(M'\)) obtain similar returns.
+The paper shows that SMERL generalizes to MDPs belonging to the robustness set.
+It also provides a simplified view of the optimization objective and shows how it naturally leads to a trajectory-centric mutual information objective.
+Environments
+ +2D navigation environments with point mass.
+Mujoco Environments: HalfCheetah-Goal, Walker2d-Velocity, Hopper-Velocity.
+On the 2D navigation environment, the paper shows that SMERL learns to use different trajectories to reach the goal.
+On the Mujoco setup, the evaluation shows that SMERL generally outperforms the best-performing baseline or is close to the best-performing baseline on different tasks.
+Generally, higher train performance does not correlate with higher test performance, and there is no single policy that performs the best across all the tasks. Thus, it should be beneficial to learn multiple diverse policies that can be selected from during testing.
+The paper describes the efforts to control and repay the technical debt in the build system at Google (called the Build Debt).
+Guiding Principles:
+ +Automate techniques to analyze and fix issues that contribute to technical debt.
+Make it easier to do the right thing as developers can incur technical debt unknowingly.
+Make it hard to do the wrong thing, e.g., by building stricter checks into the build process.
+Note that some of the metrics and design decisions may be outdated now (the paper was written in 2012). However, the core message is still relevant.
+BUILD files encapsulate the specifications for building software.
+Generally, these files are maintained manually, and the dependencies may not be up-to-date over time.
+In extreme cases, some of the build targets are not built for months. Such targets are called zombie targets.
+Originally, any project could depend on any other project’s internal details, thus creating (sometimes unwanted) couplings.
+If the lower-level project did not intend to expose some internal details, the unwanted couplings introduce technical debt and make it harder to modify the lower-level project.
+One form of technical debt is the visibility debt or the cost of back-fitting visibility rules onto the existing build specifications to re-establish the appropriate encapsulations.
+Another example of technical debt is dead code that can confuse the developers looking for useful APIs.
+Over-declared or underutilized dependencies can slow the build and testing of systems.
+Under-declared dependencies can make the build process brittle and make it difficult to remove over-declared dependencies.
+Potential solutions for over-declared dependencies include:
+ +Setting aside some dedicated time for fixing build rules. But this approach is not automated, and potential breakages make it harder for developers to do the right thing.
+Automatically add all the under-declared dependencies to the BUILD files. The system can raise an error if a direct dependency is missing, making it harder to do the wrong thing.
+Automation can be applied for finding/reporting the over-declared dependencies as well.
+Potential solutions for underutilized dependencies include:
+ +While it is challenging to automate fixing underutilized dependencies, automating the discovery of such dependencies is still useful.
+Highlighting dependencies with high cost and low removal effort could incentivize developers to clean up their projects.
+Zombie targets can be identified by querying the results of build and test runs.
+A target is marked as “dead” if the attempts to build it have failed for at least 90 days. Until then, build errors are considered to be transient.
+A zombie target can be eliminated by deleting its definition from the BUILD and deleting the source files, which are reachable only via the zombie target.
+Originally, the default visibility of all the targets was public, leading to unintended dependencies.
+The visibility of all the existing builds was set to legacy_public, and the default visibility was changed to private.
+This encouraged developers to explicitly consider if they wanted other projects to depend on their project.
+Google developed its command-line parsing utilities and defined a set of recognized command-line flags for libraries and binaries.
+Over time, the number of flags grew to half a million, and many of these flags are not useful anymore (i.e., dead).
+These dead flags can make it hard to understand and refactor code.
+Existing flags were analyzed to check which ones had always been set to the same value; such flags were replaced by their constant values, clearing about 150 thousand flags.
+Removing dead flags also helps to clean up dead/unreachable code.
+The paper describes the architecture of an erstwhile single-sign-on (SSO) service used by Google, called Google Accounts (2006).
+Note that some of the metrics and design decisions may be outdated now (the paper was written in 2006). However, the core message is still relevant.
+SSO’s availability affects the availability of all applications that require user sign-in.
+Generally, systems can achieve high availability by sacrificing consistency, but given the nature of SSO (matching username/passwords), providing an inconsistent view is not a good option, and single-copy consistency is a usability requirement.
+Berkeley DB is an embedded, high-performance, scalable, transactional storage system for key-value data and provides both keyed and sequential lookup.
+It provides a primary copy replication model with a single writer (called master) and multiple read-only replicas.
+All writes are sent to the master, which first applies the changes and then propagates them to the replicas.
+The master and the replicas have identical logs, and in case of master failure, a new master is elected from the replicas.
+Some synchronization may be needed between the replicas in case, e.g., the master dies in between a transaction.
+SSO service maps usernames to user account data and services to service-specific data.
+The SSO database is partitioned into shards, where each shard is a replicated Berkeley DB (having 5 to 15 replicas).
+Each replica stores the data in a B+-link tree data structure.
+Consistent reads must go to the master, while non-master replicas can serve “stale” reads.
+In the case of larger replication groups (say 15 replicas), only a subset of replicas can become master (“electable replicas”).
+In general, replicas are spread geographically to handle machine-failure, network-failure, and data center-failure.
+Replicas in a shard are kept close to reduce the communication latency, which affects the time to commit a write operation or elect a new master.
+Some of the shards implement an ID-map, i.e., a map from username to userid and from userid to shards.
+SSO chooses a quorum protocol that guarantees that updates are never lost.
+For the write queries, the master waits for a positive acknowledgment from a majority of the replicas, including itself, before marking the query as completed.
+When selecting a new leader, SSO requires a majority of replicas to agree. Moreover, Berkeley DB elections always choose a replica with the latest log entry during an election, thus guaranteeing that the new master’s log will include all the previous master’s updates.
+The master holds a master lease when responding to read queries and refreshes this lease periodically by communicating with a majority of replicas.
+The lease guarantees that the master is not returning stale data if a partition or failure causes the master to lose its mastership, i.e., holding the lease guarantees that the master is still the master.
+Moreover, elections can not be completed within the lease timeout interval.
+SSO maintains a replica configuration containing the logical (DNS) name and IP address of each replica.
+In case of any changes to the configuration, the changes are specified in a file that the master reads periodically.
+If the configuration changes, the master initiates a configuration change and updates the database.
+Non-master replicas can get the new configuration from the database.
+A new replica or a replica that lost state (say due to a failure) starts as a non-voting replica and can not participate in an election till it has caught up with the master as of the time the replica joined (again).
+The paper shows that Siamese networks can be used for unsupervised learning with images without needing techniques like negative sample pairs, large batch training, or momentum encoders. The training mechanism is referred to as the SimSiam method.
+Given an input image x, create two augmented views x1 and x2.
+These views are processed by an encoder network f.
+One of the views (say x1) is processed by the encoder f as well as a predictor MLP h to obtain a projection p1, i.e., p1 = h(f(x1)).
+The second view (x2) is processed only by the encoder f to obtain an encoding z2 i.e., z2 = f(x2).
+Negative cosine similarity is minimized between p1 and z2 with the catch that the resulting gradients are not used to update the encoder via z2. I.e., Loss = D(p1, stopgrad(z2)) where D is the negative cosine similarity and stopgrad is an operation that stops the flow of gradients.
+In practice, both the (p1, z2) and (p2, z1) pairs are used for computing the loss, i.e., Loss = 0.5 * (D(p1, stopgrad(z2)) + D(p2, stopgrad(z1))).
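+This is essentially the pseudocode from the paper; a PyTorch version with the stop-gradient implemented via detach():
+```python
+import torch.nn.functional as F
+
+def D(p, z):
+    """Negative cosine similarity; z is detached so that no gradients
+    flow back into the encoder through this branch."""
+    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
+
+def simsiam_loss(f, h, x1, x2):
+    z1, z2 = f(x1), f(x2)  # encoder on both augmented views
+    p1, p2 = h(z1), h(z2)  # predictor on both encodings
+    return 0.5 * (D(p1, z2) + D(p2, z1))
+```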
+Encoder uses batch norm in all the layers (including output) while projection MLP uses batch norm only in the hidden layers.
+SGD optimizer with learning rate as 0.05 * batchsize / 256, cosine learning rate decay schedule and SGD momentum = 0.9.
+Unsupervised pretraining on the ImageNet dataset followed by training a supervised linear classifier on the frozen representations.
+Stop-gradient operation is necessary to avoid a degenerate solution. Without stop-gradient, the model maps all inputs to a constant z.
+If the projection layer is removed, the method does not work (because of the loss’s symmetric nature). If the loss is also made asymmetric, the method still does not work without the projection layer. However, asymmetric loss + projection layer works.
+Keeping the projection layer fixed (i.e., not updating during training) avoids collapse but leads to poor validation performance.
+Training the projection layer with a constant learning rate works better in practice, likely because the projection layer needs to keep adapting before the encoder layer is sufficiently trained.
+The method works well across different batch sizes.
+Removing batch norm layers from all the layers in all the networks does not lead to collapse, though the model’s performance degrades on the validation dataset. Adding batch norm to the hidden layers alone is sufficient.
+Adding batch norm to the encoder’s output further improves the performance but adding batch norm to all the layers of all the networks makes the training unstable, with the loss oscillating.
+Overall, while batch norm helps to improve performance, it is not sufficient to avoid collapse.
+The setup does not collapse when the cross-entropy loss replaces the cosine loss.
+Given that the stop-gradient operation seems to be the critical ingredient for avoiding collapse, the paper hypothesizes that SimSiam is solving a different optimization problem.
+The hypothesis is that SimSiam is implementing an Expectation-Maximisation (EM) algorithm with two sets of variables and two underlying sub-problems.
+The paper performs several experiments to test this hypothesis. For example, they consider k SGD steps for the first problem before performing an update for the second problem, showing that the alternating optimization is a valid formulation, of which SimSiam is a particular case.
+SimSiam achieves the highest accuracy among SimCLR, MoCo, BYOL, and SwAV for training under 100 epochs. However, it lags behind other methods when trained longer.
+SimSiam’s representations are transferable beyond the ImageNet tasks.
+Adding projection layer and stop-gradient operator to SimCLR does not improve its performance.
+CAP theorem has been influential in the design decisions for distributed databases.
+However, designers incorrectly assume that the CAP theorem “always” imposes restrictions in terms of the tradeoff between availability and consistency. In contrast, the tradeoff is applicable only in the case of partitions.
+CAP theorem led to the development of highly available systems with reduced consistency models (and reduced ACID guarantees).
+Another tradeoff - between latency and consistency - has also been influential for database design.
+The paper unifies CAP and latency-consistency tradeoffs into a single formulation called PACELC.
+Note that some of the observations, especially ones about the databases, may be outdated now (the paper was written in 2012). However, the core message is still relevant.
+Low latency (or high availability) means that the system must replicate data.
+In case of an update query, three possibilities arise:
+ +The system can choose to send data updates to all the replicas at once. This leads to two possibilities:
+ +A replica can receive the update queries in an arbitrary order, thus breaking consistency with other replicas.
+Alternatively, the replicas could use some protocol to agree on the order of updates. However, this can introduce latency.
+The update queries can be first sent to a master replica.
+ +The master replica can apply the updates and send them to the other replicas using one of the following strategies:
+ +Synchronous replication, where the master waits until the update has been applied to the replica(s). However, this approach introduces latency.
+Asynchronous replication, where the master considers the update complete before it has been applied at the replicas. In this case, the latency-consistency tradeoff depends on how read queries are handled:
+ +The system can send all read queries to the master. In this case, there are no consistency issues, but additional latency is introduced because all the read queries go to the same replica, thus potentially overloading it.
+Alternatively, the read query can be served from any replica. While this improves read latency, the results can be inconsistent now.
+Use a mix of synchronous and asynchronous replication, i.e., some of the write queries are synchronous, and others are asynchronous. In this case, the latency-consistency tradeoff depends on how read queries are handled:
+ +If the read is routed to at least one replica that has been synchronously updated, consistency can be preserved, with additional latency for discovering the updated replica.
+If the read query can not be routed to an updated replica (maybe because none of the replicas is updated), then either latency suffers or inconsistent read can be performed.
+The update query is first sent to an arbitrary replica.
+ +In a nutshell, the tradeoff between latency and consistency is always present, irrespective of network failure.
+This contrasts with the CAP theorem, which imposes the tradeoff between availability and consistency only in the case of a network partition.
+If there is a partition (P), how does the system tradeoff availability (A) and consistency (C); else (E), when the system is running without failures, how does the system tradeoff latency (L) and consistency (C)?
+The latency-consistency tradeoff (ELC) is relevant only when the data is replicated.
+Default versions of Dynamo, Cassandra, and Riak were PA/EL systems, i.e., if a partition occurs, availability is prioritized. In the absence of partition, lower latency is prioritized.
+Fully ACID systems (VoltDB, H-Store, and Megastore) and others like BigTable and HBase are PC/EC, i.e., they prioritize consistency and give up availability and latency.
+MongoDB can be classified as a PA/EC system, while PNUTS is a PC/EL system.
+The CAP theorem states that any system sharing data over the network can only have at most two (out of three) desirable properties:
+ +consistency (C), i.e., a single, up-to-date copy of the data;
+high availability (A) of that data (for updates); and
+tolerance to network partitions (P).
+This “2 of 3” formulation is misleading as it oversimplifies the interplay between properties.
+ACID is a design philosophy that focuses on consistency as reflected in the traditional relational databases.
+The four properties in ACID are:
+ +Atomicity (A), i.e., the operations are atomic, and either the entire operation succeeds or none of it succeeds.
+Consistency (C), i.e., a transaction preserves all the rules. Note that the consistency in CAP is a subset of consistency in ACID.
+Isolation (I), i.e., transactions occur in isolation and do not affect each other.
+Durability (D), i.e., the transactions are durable irrespective of system failure.
+BASE is an alternate design philosophy that focuses on availability as reflected in the NoSQL databases.
+The three properties in BASE are:
+ +Basic Availability (BA), i.e., the database appears to work most of the time.
+Soft state (S), i.e., the system’s state can change over time as it becomes eventually consistent.
+Eventual consistency (E), i.e., the system will eventually become consistent over time.
+Generally, partitionability is seen as a must-have, thus reducing the choice to be between availability and consistency.
+This view is somewhat misleading because the choice between C, A, and P is not binary but granular.
+The choice between C and A can occur at various granularity levels, and different components (of a larger system) can prioritize different aspects.
+Similarly, the CAP theorem generally ignores latency even though it is closely related to partitionability. For example, failing to achieve consistency within a time-bound (i.e., latency) implies a partition.
+In general, there is no global notion of partition - some subset of nodes may experience a partition, and others may not.
+Once a partition is detected, the system can then choose between C and A.
+Three-step process for managing partitions:
+ +Detect the start of a partition.
+Enter an explicit partition mode that may limit some operations.
+ +Possible strategies:
+ +Reduce availability by limiting some operations.
+Record extra information that can be used during partition recovery.
+The strategy depends on the invariants that the system should maintain.
+For example, if the invariant is that the keys (in a table) should be unique, the system could allow duplicate keys for some time and perform a de-duplication step during partition recovery.
+A counterexample is a monetary transaction (e.g., charging a credit card). In such cases, the system could disable the operation and record it for performing later. Sometimes this “unavailability” is not visible to the user.
+The history of operations (over replicas across different partitions) can be tracked using version vectors of the form (node, logical time). The system can easily recreate the order in which the operations were executed (or mark them as concurrent).
+Initiate partition recovery when communication is restored and make the state across the partitions consistent.
+One common approach is to revert to the state when the partition was detected and apply the operations consistently across all the replicas.
+This may require some extra effort to merge conflicts.
+One workaround can be to constrain the use of certain operations so that the system does not encounter merge conflicts during recovery.
+Sometimes, certain invariants may be violated when the system is in the partition mode and needs to be fixed during recovery.
+The key takeaway is that when partitions exist, the choice between availability and consistency is not binary, and both can be optimized for.
+The paper describes a method to explain/interpret the representations learned by individual neurons in deep neural networks.
+The explanations are generated by searching for logical forms defined by a set of composition operators (like OR, AND, NOT) over primitive concepts (like water).
+Given a neural network f, the goal is to explain a neuron’s behavior (of this network) in human-understandable terms.
+Previous work builds on the idea that a good explanation is a description that identifies the inputs for which the neuron activates.
+Given a set of pre-defined atomic concepts $c \in C$ and a similarity measure $\delta(n, c)$ where $n$ represents the activation of the $n^{th}$ neuron, the explanation, for the $n^{th}$ neuron, is the concept most similar to $n$.
+For images, a concept could be represented as an image segmentation map. For example, the water concept can be represented by the segments of the images that show water.
+The similarity can be measured by first thresholding the neuron activations (to get a neuron mask) and then computing the IoU score (or Jaccard Similarity) between the neuron mask and the concept.
+One limitation of this approach is that the explanations are restricted to pre-defined concepts.
+The paper expands the set of candidate concepts by considering logical forms over the atomic concepts.
+In theory, the search space explodes exponentially with the number of atomic concepts. In practice, it is restricted to explanations with at most $N$ atomic concepts, and beam search is performed (instead of exhaustive search).
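+A sketch of scoring a composed concept against a neuron, treating atomic concepts and the thresholded neuron as boolean masks over inputs (toy random data stands in for real segmentation maps):
+```python
+import numpy as np
+
+def iou(neuron_mask, concept_mask):
+    """Jaccard similarity between a neuron mask and a concept mask."""
+    intersection = np.logical_and(neuron_mask, concept_mask).sum()
+    union = np.logical_or(neuron_mask, concept_mask).sum()
+    return intersection / max(union, 1)
+
+# Toy data: binary masks over 1000 inputs.
+rng = np.random.default_rng(0)
+water, river, blue = (rng.random(1000) < 0.2 for _ in range(3))
+neuron_mask = rng.random(1000) > 0.8  # thresholded neuron activations
+
+# A logical form is just a boolean formula over atomic concept masks,
+# e.g. "(water OR river) AND NOT blue":
+composed = (water | river) & ~blue
+print(iou(neuron_mask, composed))
+```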
+Image Classification Setup
+ +Neurons from the final 512-unit convolutional layer of a ResNet-18 trained on the Places365 dataset.
+Probing for concepts from ADE20k scenes dataset with atomic concepts defined by annotations in the Broden dataset
+NLI Setup
+ +BiLSTM baseline followed by MLP layers trained on Stanford Natural Language Inference (SNLI) corpus.
+Probing the penultimate hidden layer (of the MLP component) for sentence-level explanations.
+Concepts are created using the 2000 most common words in the validation split of the SNLI dataset.
+Additional concepts are created based on the lexical overlap between premise and hypothesis.
+Image Classification Setup
+ +As $N$ increases, the mean IoU increases (i.e., the explanation quality increases), though the returns diminish beyond $N=10$.
+Manual inspection of 128 neurons and their length 10 explanations show that 69% neurons learned some meaningful combination of concepts, while 31% learned some unrelated concepts.
+The meaningful combination of concepts include:
+ +perceptual abstraction that is also lexically coherent (e.g., “skyscraper OR lighthouse OR water tower”).
+perceptual abstraction that is not lexically coherent (e.g., “cradle OR autobus OR fire escape”).
+specialized abstraction of the form L1 AND NOT L2 (e.g. (water OR river) AND NOT blue).
+NLI Setup
+ +As $N$ increases, the mean IoU increases (as in the image classification setup) though the IoU keeps increasing past $N=30$.
+Many neurons correspond to lexical features. For example, some neurons are gender-sensitive or activate for verbs like sitting, eating or sleeping. Some neurons are activated when the lexical overlap between premise and hypothesis is high.
+In the image classification setup, the more interpretable the neuron is, the more accurate the model is (when the neuron is active).
+However, the opposite trend is seen in NLI models, i.e., the more interpretable neurons are less accurate.
+Key takeaway - interpretability (as measured by the paper) is not correlated with performance. Given a concept space, the identified behaviors may be correlated or anti-correlated with the model’s performance.
+The idea is to construct examples that activate (or inhibit) certain neurons, causing a change in the model’s predictions.
+These adversarial examples are referred to as “copy-paste” adversarial examples.
+For example, the neuron corresponding to “(water OR river) AND (NOT blue)” is a major contributor for detecting “swimming hole” classes. An adversarial example is created by making the water blue. This prompts the model to predict “grotto” instead of “swimming hole.”
+Similarly, in the NLI model, a neuron detects the word “nobody” in the hypothesis as highly indicative of contradiction. An adversarial example can be created by adding the word “nobody” to the hypothesis, prompting the model to predict contradiction while the true label should be neutral.
+These observations support the hypothesis that one can use explanations to create adversarial examples.
+The paper introduces GPipe, a pipeline parallelism library for scaling networks that can be expressed as a sequence of layers.
+Consider training a deep neural network with L layers using K accelerators (say GPUs).
+The ith layer has a forward function fi, a backward function bi, weights wi, and a cost ci (say, the memory footprint or computational time).
+GPipe partitions this network into K cells and places the ith cell on the ith accelerator. Output from the ith accelerator is passed to the i+1th accelerator as input.
+During the forward pass, the input batch (of size N) is divided into M equal micro-batches. These micro-batches are pipelined through the K accelerators one after another.
+During the backward pass, gradients are computed for each micro-batch. The gradients are accumulated and applied at the end of each minibatch.
+In batch normalization, the statistics are computed over each micro-batch (used during training) and mini-batch (used during evaluation).
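+A minimal single-device sketch of the micro-batching scheme (the library additionally pipelines the micro-batches across the K accelerators):
+```python
+def train_step(model, loss_fn, optimizer, x, y, num_micro_batches):
+    """Split a mini-batch into micro-batches, accumulate gradients,
+    and apply a single update at the end of the mini-batch."""
+    optimizer.zero_grad()
+    for xm, ym in zip(x.chunk(num_micro_batches), y.chunk(num_micro_batches)):
+        loss = loss_fn(model(xm), ym) / num_micro_batches
+        loss.backward()  # gradients accumulate across micro-batches
+    optimizer.step()
+```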
+Micro-batching improves over the naive model parallelism approach by reducing the underutilization of resources (due to the network’s sequential dependencies).
+GPipe supports re-materialization (or checkpointing), i.e., during the forward pass, only the output activations (at partition boundaries) are stored.
+During backward pass, the forward function is recomputed at each accelerator. This trades off the memory requirement with increased time.
+One potential downside is that partitioning can introduce some idle time per accelerator (referred to as the bubble overhead). However, with a sufficiently large number of micro-batches (more than 4 times the number of partitions), the bubble overhead is negligible.
+Two different types of model architectures are compared: AmoebaNet convolutional model and Transformer sequence-to-sequence model.
+For AmoebaNet, the size of the largest trainable model (on a single 8GB Cloud TPU v2) increases from 82M to 318M. Further, a 1.8 billion parameter model can be trained on 8 accelerators (25x improvement in size using GPipe).
+For transformers, GPipe scales the model size to 83.9 B parameters with 128 partitions (298x improvement in size compared to a single accelerator).
+Since the computation is evenly distributed across transformer layers, the training throughput scales almost linearly with the number of devices.
+Quantitative experiments on ImageNet and multilingual machine translation show that models can be effectively trained using GPipe.
+The paper proposes to use Energy-based Models (EBMs) for Continual Learning.
+In classification tasks, the standard approach uses a cross-entropy objective function along with a normalized probability distribution.
+However, cross-entropy reduces all negative classes’ likelihood when updating the model for a given sample, potentially leading to catastrophic forgetting.
+Classification can be seen as learning an EBM across separate classes.
+During an update, the energy for the pair of a sample and its ground-truth class decreases, while the energy corresponding to pairs of the sample and negative classes increases.
+Unlike the cross-entropy loss, EBMs allow choosing the negative classes to update.
+EBMs can be used for class-incremental learning without requiring a replay-buffer or generative model for replay.
+EBMs can be used for continual learning in setups without task boundaries, i.e., setups where the data distribution can change without a clear separation between tasks.
+Boltzmann distribution is used to define the conditional likelihood of label $y$ given an input $x$, i.e., $p(y|x) = \frac{\exp(-E(x, y))}{Z(x)}$ where $Z(x) = \sum_{y' \in Y} \exp(-E(x, y'))$. Here $E$ is the learnt energy function that maps an input-label pair to a scalar energy value.
+During training, the contrastive divergence loss is used.
+During inference, the class for which the input-class pair has the least energy is selected as the predicted class.
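+A sketch of inference and the conditional log-likelihood, assuming energy(x, y) returns a scalar tensor for an input-label pair:
+```python
+import torch
+
+def predict(energy, x, classes):
+    """Inference: pick the class whose input-class pair has the least energy."""
+    energies = torch.stack([energy(x, y) for y in classes])
+    return classes[energies.argmin().item()]
+
+def log_likelihood(energy, x, y, classes):
+    """log p(y|x) under the Boltzmann distribution defined above."""
+    energies = torch.stack([energy(x, c) for c in classes])
+    return -energy(x, y) - torch.logsumexp(-energies, dim=0)
+```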
+The paper considers several strategies for the selection of negative samples:
+ +one negative class per sample. The negative class is sampled from the current batch of data. This selection approach performs best.
+all the negative classes in a batch are used for creating the negative samples.
+all the classes seen so far in training are used as the negative samples. This approach works the worst in practice.
+Given the flexibility of sampling the negative classes, EBMs can be used in the boundary-agnostic setups (where the data distribution can change smoothly without an explicit task boundary).
+EBMs take both the sample and the class as the input. The class can be treated as an attention filter to select the most relevant information between the sample and the class.
+In theory, EBMs can train for any number of classes without knowing the number of classes beforehand. This is an advantage over the softmax-based approaches, where adding new classes requires changing the size of the softmax output layer.
+Split MNIST
+Permuted MNIST
+CIFAR-10
+CIFAR-100
+The proposed approach outperforms the standard continual learning approaches that use neither a replay buffer nor a generative model.
+Additionally, the paper shows that for the same number of parameters, the effective capacity of EBM models is higher than the effective capacity of standard classification models.
+The paper also shows that standard classification models tend to assign a high probability to new classes for both old and new data. EBMs assign the probability more uniformly (and correctly) across the classes.
+In an ablation study, the paper shows that both label conditioning and contrastive divergence loss help in improving the performance of EBMs.
+The paper explores HyperNetworks. The idea is to use one network (HyperNetwork) to generate the weights for another network.
+Consider a $D$ layer CNN where the parameters for the $j^{th}$ layer are stored in a matrix $K^j$ of the shape $N_{in}f_{size} \times N_{out}f_{size}$.
+The HyperNetwork is implemented as a two-layer linear network where the input is a layer embedding $z^j$, and the output is $K^j$.
+The first layer (of the HyperNetwork) maps the input to $N_{in}$ different outputs using $N_{in}$ weight matrices.
+The second layer maps the different $N_{in}$ inputs to $K_{i}$ using a shared matrix. The resulting $N_{in}$ (number of) $K_{i}$ matrices are concatenated to obtain $K^j$.
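+A sketch of this two-layer construction, assuming a single layer embedding as input (initialization scales and names are illustrative):
+```python
+import torch
+import torch.nn as nn
+
+class HyperNetworkSketch(nn.Module):
+    """Maps a layer embedding z_j to a kernel of shape
+    [N_in * f_size, N_out * f_size] via two linear layers."""
+    def __init__(self, z_dim, d, n_in, n_out, f_size):
+        super().__init__()
+        self.n_in, self.n_out, self.f_size = n_in, n_out, f_size
+        # First layer: N_in separate projections of the embedding.
+        self.w1 = nn.Parameter(torch.randn(n_in, d, z_dim) * 0.01)
+        self.b1 = nn.Parameter(torch.zeros(n_in, d))
+        # Second layer: one shared projection producing a kernel slice.
+        self.w2 = nn.Parameter(torch.randn(f_size * n_out * f_size, d) * 0.01)
+        self.b2 = nn.Parameter(torch.zeros(f_size * n_out * f_size))
+
+    def forward(self, z):
+        a = torch.einsum("idz,z->id", self.w1, z) + self.b1  # [n_in, d]
+        slices = a @ self.w2.T + self.b2  # [n_in, f_size * n_out * f_size]
+        # Concatenate the N_in slices into the full kernel matrix K^j.
+        return slices.view(self.n_in * self.f_size, self.n_out * self.f_size)
+```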
+As a side note, a HyperNetwork has far fewer parameters than the network for which it produces weights.
+In a general case, the kernel dimensions (across layers) are not of the same size but integer multiples of some basic sizes. In that case, the HyperNetwork can generate kernels for the basic size, which can be concatenated to form larger kernels. This would require additional input embeddings but not require a change in the architecture of HyperNetwork.
+HyperRNNs/HyperLSTMs denote HyperNetworks that generate weights for RNNs/LSTMs.
+HyperRNNs implement a form of relaxed weight sharing - an alternative to the full weight sharing of the traditional RNNs.
+At any timestep $t$, the input to the HyperRNN is the concatenation of the vector $x_{t}$ (input to the RNN at time $t$) and the hidden state $h_{t-1}$ of the RNN. The output is the weights for the main RNN at timestep $t$.
+In practice, a weight scaling vector $d$ is used to reduce the memory footprint, which would otherwise be $dim$ times the memory of a standard RNN, where $dim$ is the dimensionality of the embedding vector $z_j$.
+HyperNetworks are used to train standard CNNs for MNIST and ResNets for CIFAR-10. In these experiments, HyperNetworks slightly underperform the best performing models but use far fewer parameters.
+HyperLSTMs trained on the Penn Treebank dataset and Hutter Prize Wikipedia dataset outperform stacked LSTMs and perform similarly to layer-norm LSTMs. Interestingly, combining HyperLSTMs with layer-norm improves performance over HyperLSTMs alone.
+Given the similar performance of HyperLSTMs and layer-norm LSTMs, the paper conducted an ablation study to understand if HyperLSTMs learned a weight adjustment policy similar to the statistics-based approach used by layer-norm LSTMs.
+ +HyperLSTMs are also evaluated for handwriting sequence generation by training on the IAM online handwriting dataset.
+ +On the WMT’14 En-to-Fr machine translation task, HyperLSTMs outperform LSTM based approaches.
+The paper introduces HYPTER - a framework for zero-shot learning (ZSL) in text-to-text transformer models by training a HyperNetwork to generate task-specific adapters from task descriptions.
+The focus is on in-task zero-shot learning (e.g., learning to predict an unseen class or relation) and not on cross-task learning (e.g., training on sentiment analysis and evaluating on question-answering task).
+Task - an NLP task, like classification or question answering.
+Sub-task
+ +A class/relation/question within a task.
+Denoted by a tuple $(d, D)$ where $d$ is the language description while $D$ represents the subtask’s dataset.
+HYPTER has two main parts:
+ +Main network
+ +A pretrained text-to-text network
+Instantiated as a BERT-Base/Large
+HyperNetwork
+ +HyperNetwork has two parts:
+ +Encoder
+ +Encodes the task description
+Instantiated as a RoBERTa-Base model
+Decoder
+ +Decodes the encoding into weights for multiple adapters (in parallel)
+Instantiated as a Feedforward Network
+The model trains in two phases:
+ +Main network is trained on all the data by concatenating the task description with the input.
+Adapters are trained by sampling a task from the train set while keeping the main network frozen.
+The paper proposes the use of task-conditioned HyperNetworks for lifelong learning / continual learning setups.
+The idea is that the HyperNetwork would only need to remember the task-conditioned weights and not the input-output mapping for all the data points.
+$f$ denotes the network for the given $t^{th}$ task.
+$h$ denotes the HyperNetwork that generates the weights for $f$.
+$\Theta_{h}$ denotes the parameters of $h$.
+$e^{t}$ denotes the input task-embedding for the $t^{th}$ task.
+When training on the $t^{th}$ task, the HyperNetwork generates the weights for the network $f$.
+The current task loss is computed using the generated weights, and the candidate weight update ($\Delta \Theta_{h}$) is computed for $h$.
+The actual parameter update is computed by minimizing the following loss (a code sketch follows the symbol definitions below):
+$L_{total} = L_{task}(\Theta_{h}, e^{T}, X^{T}, Y^{T}) + \frac{\beta_{output}}{T-1} \sum_{t=1}^{T-1} \| f_{h}(e^{t}, \Theta_{h}^{*}) - f_{h}(e^{t}, \Theta_{h} + \Delta \Theta_{h}) \|^2$
+ +$L_{task}$ is the loss for the current task.
+$(X^{T}, Y^{T})$ denotes the training datapoints for the $T^{th}$ task.
+$\beta_{output}$ is a hyperparameter to control the regularizer’s strength.
+$\Theta_{h}^*$ denotes the optimal parameters after training on the $T-1$ tasks.
+$\Theta_{h} + \Delta \Theta_{h}$ denotes the one-step update on the current $h$ model.
+In practice, the task encoding $e^{t}$ is chunked into smaller vectors, and these vectors are fed as input to the HyperNetwork.
+This enables the HyperNetwork to produce weights iteratively, instead of all at once, thus helping to scale to larger models.
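+A minimal sketch of chunked weight generation, assuming learned chunk embeddings concatenated with the task embedding (names and shapes are illustrative, not the authors' code):
+```python
+import torch
+import torch.nn as nn
+
+class ChunkedHyperNetwork(nn.Module):
+    """Produces the target network's flattened weights in fixed-size
+    chunks: the same small network is reused for every chunk, conditioned
+    on a learned chunk embedding c_i plus the task embedding e_t."""
+
+    def __init__(self, task_dim, chunk_dim, chunk_size, n_chunks, hidden=64):
+        super().__init__()
+        self.chunk_embeddings = nn.Parameter(torch.randn(n_chunks, chunk_dim))
+        self.net = nn.Sequential(
+            nn.Linear(task_dim + chunk_dim, hidden),
+            nn.ReLU(),
+            nn.Linear(hidden, chunk_size),
+        )
+
+    def forward(self, task_embedding):
+        # generate weights chunk by chunk, then concatenate
+        chunks = [self.net(torch.cat([task_embedding, c]))
+                  for c in self.chunk_embeddings]
+        return torch.cat(chunks)  # flattened weights for the target network f
+```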
+The paper also considers the problem of inferring the task embedding from a given input pattern.
+Specifically, the paper uses task-dependent uncertainty, where the task embedding with the least predictive uncertainty is chosen as the task embedding for the given unknown task. This approach is referred to as HNET+ENT.
+The paper also considers using HyperNetworks to learn the weights for a task-specific generative model. This generative model will be used to generate pseudo samples for rehearsal-based approaches. The paper considers two cases:
+ +HNET+R where the replay model (i.e., the generative model) is parameterized using a HyperNetwork.
+HNET+TIR, where an auxiliary task inference classifier is used to predict the task identity.
+Three setups are considered
+ +CL1 - Task identity is given to the model.
+CL2 - Task identity is not given, but task-specific heads are used.
+CL3 - Task identity needs to be explicitly inferred.
+On the permuted MNIST task, the proposed approach outperforms baselines like Synaptic Intelligence and Online EWC, and the performance gap is more significant for larger task sequences.
+Forward knowledge transfer is observed with the CIFAR datasets.
+One potential limitation (which is more of a limitation of HyperNetworks) is that HyperNetworks may be harder to scale for larger models like ResNet50 or transformers, thus limiting their usefulness for lifelong learning use cases.
+The paper systematically investigates when curriculum learning helps.
+Implicit curricula refer to the order in which a network learns data points when trained using stochastic gradient descent with i.i.d. sampling of data.
+When training, suppose the model first makes a correct prediction for a given datapoint in the $i^{th}$ epoch (and keeps predicting it correctly in all subsequent epochs). The $i^{th}$ epoch is then referred to as the learned iteration of the datapoint (the iteration in which the datapoint was learned).
+The paper studied multiple models (VGG, ResNet, WideResNet, DenseNet, and EfficientNet) with different optimizers (Adam and SGD with momentum).
+The resulting implicit curricula are broadly consistent within the model families, making the following discussion less dependent on the model architecture.
+A scoring function maps a data point to a numerical score of difficulty.
+Choices:
+ +Loss function for a model
+learned iteration
+Estimated c-score - captures how consistently a given model correctly predicts a datapoint’s label when trained on an i.i.d. dataset that does not contain the datapoint.
+The three scoring functions are computed for two models on the CIFAR dataset.
+The resulting six scores have a high Spearman Rank correlation. Hence for the rest of the discussion, only the c-score is used.
+The pacing function, denoted by $g(t)$, controls the size of the training dataset at step $t$.
+At step $t$, the model would be trained on the first $g(t)$ examples (as per the ordering).
+Choices: logarithmic, exponential, step, linear, quadratic, and root.
+Order in which the data points are picked:
+ +Curriculum - Ordering points from lowest score to highest and training on the easiest data points first.
+Anti Curriculum - Ordering points from highest score to lowest and training on the hardest data points first.
+Random - Randomly selecting the data points to train on.
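+A minimal sketch combining a difficulty ordering with a pacing function (the root pacing default and function names are illustrative, not the paper's code):
+```python
+import numpy as np
+
+def curriculum_batches(scores, n_steps, batch_size, order="curriculum",
+                       pacing=lambda t, T, n: int(n * (t / T) ** 0.5)):
+    """scores: per-example difficulty (e.g., c-scores); pacing(t, T, n)
+    returns how many of the ordered examples are available at step t."""
+    n = len(scores)
+    if order == "curriculum":
+        idx = np.argsort(scores)        # easiest first
+    elif order == "anti-curriculum":
+        idx = np.argsort(-scores)       # hardest first
+    else:
+        idx = np.random.permutation(n)  # random ordering
+    for t in range(1, n_steps + 1):
+        avail = idx[: max(batch_size, pacing(t, n_steps, n))]
+        yield np.random.choice(avail, size=batch_size, replace=False)
+```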
+The paper performed a hyperparameter sweep over 180 pacing functions and three orderings, with three random seeds, over the CIFAR10 and CIFAR100 datasets. For both datasets, the best performance is obtained with random ordering, indicating that curricula did not give any benefits.
+However, curricula are useful when the number of training iterations is small.
+They also help when training on noisy data (simulated by randomly permuting the labels).
+The observations for the smaller CIFAR10/100 dataset generalize to slightly larger datasets like FOOD101 and FOOD101N.
+The paper studies the effect of catastrophic forgetting on representations in neural networks.
+Techniques:
+ +Representational Similarity Measures
+Layer Freezing
+Layer Reset
+Datasets
+ +Split CIFAR-10
+ +CIFAR-10 dataset is split into m (=2) tasks, where each task is an n-way classification task.
+The underlying network has a shared trunk with m heads, one head per task.
+Split CIFAR-100 Distribution Shift
+ +Network Architecture
+ +Are all representations (throughout the network) equally responsible for forgetting?
+ +Higher layer (layers closer to the output) are the primary source of catastrophic forgetting.
+The Centered Kernel Alignment (CKA) technique is used to compare the similarity between the layer representations before and after training on the second task (a minimal sketch follows this list).
+Higher layer representations change significantly when training over two tasks while the lower layer representations remain stable.
+When finetuning on the second task, freezing the lower layers has only a minor effect on the accuracy of the second task.
+In layer reset experiments, after training on the second task, the weights of some of the layers are reset to their values after training on the first task.
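+A minimal sketch of linear CKA between two representation matrices of shape (n_examples, n_features):
+```python
+import numpy as np
+
+def linear_cka(X, Y):
+    """Linear Centered Kernel Alignment between representations X and Y."""
+    X = X - X.mean(axis=0)
+    Y = Y - Y.mean(axis=0)
+    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
+    num = np.linalg.norm(Y.T @ X, "fro") ** 2
+    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
+    return num / den
+```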
+ +Do common approaches for countering catastrophic forgetting work by stabilizing the higher layers?
+ +Yes - both EWC and replay-based approaches counter catastrophic forgetting by stabilizing the higher layers.
+This is demonstrated by showing that as the quadratic penalty for EWC (or fraction of data from replay buffer) increases (to reduce catastrophic forgetting), the representations for higher layers change less during the second task.
+When training over a sequence of tasks, are similar tasks more likely to be forgotten than different tasks?
+ +Setup I
+ +Training over a sequence of two binary classification tasks.
+Task 1: Two related classes (say ship and truck).
+Task 2: Two related classes, which may or may not be related to the classes of Task 1. For example, the classes could be:
+ +cat and horse (not related to the classes of the first task)
+plane and car (related to the classes of the first task)
+Training over semantically similar tasks (here plane and car) leads to less forgetting.
Setup II
+ +Training over a sequence of two classification tasks.
+Task 1: Four classes that can be grouped into two groups (say deer, dog, ship, and truck).
+Task 2: Two related classes, which may be related to group 1 or group 2. For example, the classes could be two animals or two objects.
+After training on the second task, the Task 1 classes that are in a different group from the Task 2 classes are forgotten less.
+Conclusion
+ +Task representational similarity is a function of both underlying data and optimization procedure.
+Forgetting is most severe for task representations of intermediate similarity.
+Representational similarity is necessary but not a sufficient condition for forgetting.
+How does catastrophic forgetting change as the task similarity changes?
+ +If the model learns different representations for dissimilar tasks, increasing dissimilarity can help to avoid forgetting.
+When training the two-task, two-class (per task) CIFAR-10 setup with an added “others” class (made up of classes not already used in the setup), forgetting is reduced.
+The paper presents case studies from the experience of deploying an ad click-through rate (CTR) prediction model at Google.
+The paper focuses on themes related to memory footprint, performance analysis, calibration, confidence in the predictions, and feature engineering.
+Features (corresponding to a given ad) include search query and the metadata in the ad. The features are very sparse.
+A single-layer, regularized logistic regression model is trained with Online Gradient Descent (same as Stochastic Gradient Descent, but in the online setting).
+From a memory perspective, it is important to minimize the size of the final model.
+Adding just an L1 penalty is not sufficient to produce weights that are exactly 0.
+The “Follow The (Proximally) Regularized Leader” (FTRL-Proximal) algorithm is used instead to learn sparse models without sacrificing accuracy (a sketch of the update follows the per-coordinate learning-rate discussion below).
+Using per-coordinate learning rates improves the performance at the cost of memory as both the sum of gradients and the sum of the square of gradients are tracked for each feature.
+ +In practice, some of the cost can be alleviated by approximating that all the events containing a given feature have the same probability.
+In such a case, the sum of the square of gradients can be approximated using the counts of positive and negative events alone.
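+A minimal sketch of the per-coordinate FTRL-Proximal update for logistic regression, following the paper's pseudocode (hyperparameter values and the sparse-feature interface are illustrative):
+```python
+import numpy as np
+
+class FTRLProximal:
+    """Per-coordinate FTRL-Proximal for logistic regression. The L1 term
+    drives weights to exactly 0, giving sparse models."""
+
+    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
+        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
+        self.z = np.zeros(dim)  # accumulated adjusted gradients
+        self.n = np.zeros(dim)  # accumulated squared gradients
+
+    def weights(self, idx):
+        # closed-form per-coordinate solution; exactly 0 when |z_i| <= l1
+        z, n = self.z[idx], self.n[idx]
+        return np.where(
+            np.abs(z) <= self.l1, 0.0,
+            -(z - np.sign(z) * self.l1)
+            / ((self.beta + np.sqrt(n)) / self.alpha + self.l2))
+
+    def update(self, idx, x, y):
+        # idx: indices of the active (sparse) features; x: their values; y in {0, 1}
+        w = self.weights(idx)
+        p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
+        g = (p - y) * x
+        sigma = (np.sqrt(self.n[idx] + g * g) - np.sqrt(self.n[idx])) / self.alpha
+        self.z[idx] += g - sigma * w
+        self.n[idx] += g * g
+        return p
+```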
+Some memory overhead can be reduced based on the following observation: the vast majority of features are extremely rare. Hence, it is not necessary to track the statistics for such rare features.
+ +However, in an online setting, it is not known upfront as to which features will be sparse.
+The paper proposes to use probabilistic feature inclusion - a feature is added to the model with probability $p$. Once it is added, the feature is not removed.
+An alternative approach is to use a rolling set of counting Bloom filters to check if a feature has appeared at least $n$ times in training. Bloom filters are probabilistic data structures and can return false positives.
+Memory can also be saved by using fewer bits for encoding weights.
+ +Most of the weight coefficients lie in the range $(-2, 2)$, so a $16$-bit encoding is used in place of a $32$- or $64$-bit encoding.
+This quantization approach needs to account for roundoff problems; the fix (a simple randomized rounding strategy) is easy to implement.
+When training many models with similar hyperparameters, per-model learning rate counters can be replaced by statistics shared by all the models, thus reducing memory footprint.
+A Single Value Structure is used to reduce the memory footprint when evaluating a very large set of model variants that differ only in addition/removal of a small subset of features.
+ +All the models that use a feature share a single value structure corresponding to that feature. This reduces the memory overhead by an order of magnitude.
+During the update, each model computes the weight updates corresponding to all the features that it is using. The updated weight is averaged across all the models and used to update the single value structure.
+Since CTR datasets are generally highly imbalanced, the training data for the negative class can be subsampled to reduce the amount of data to train over. The loss component corresponding to the negative class can then be appropriately scaled up.
+Metrics
+ +Offline metrics like AucLoss (1 - AUC), Log Loss, Squared Error
+Online loss is computed on the new training data (new incoming traffic) before training on it.
+The confidence in the model’s prediction is estimated using a heuristic called the uncertainty score. It can be measured using the dot product of the feature vector and the vector of learning rates.
+ +The idea is that the learning rates already maintain a notion of uncertainty.
+Features for which the learning rate is high are the features for which uncertainty is also high.
+Calibrating Predictions
+ +The calibration can be improved by applying correction functions $\tau_d(p)$ where $p$ is the predicted CTR, and $d$ is an element of a partition of the training data.
+$\tau$ can be modeled as $\gamma p^{\kappa}$, where $\gamma$ and $\kappa$ are learned using Poisson regression.
+Unsuccessful Experiments
+ +Aggressive feature hashing was tried to reduce the memory overhead. However, it leads to a significant loss in performance.
+Using dropout did not help, probably because the features are sparse.
+Using feature bagging hurt the AucLoss.
+Feature vector normalization did not improve performance, probably because of per-coordinate learning rates and regularization.
+The paper describes several design choices for developing a model for predicting user response (clicks) on ads.
+The model is trained/evaluated on offline data.
+Evaluation metrics:
+ +Normalized Cross-Entropy (or Normalized Entropy, NE)
+ +Defined as the predictive log-loss per impression, divided by the entropy of the background CTR (click-through rate).
+Background CTR is the average empirical CTR of the training data.
+Lower normalized cross-entropy is better.
+The normalization term is important to make the metric insensitive to the background CTR. Otherwise, the log loss can easily be made low when background CTR is close to 0 or 1.
+NE can also be written as $1 - RIG$, where $RIG$ is the Relative Information Gain.
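+Concretely, with labels $y_i \in \{-1, +1\}$, predicted probabilities $p_i$, and background CTR $p$, the paper defines:
+$NE = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1+y_i}{2}\log p_i + \frac{1-y_i}{2}\log(1-p_i)\right)}{-(p \log p + (1-p)\log(1-p))}$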
+Calibration
+Area-Under-ROC (AUC) is a good metric for measuring ranking quality (among ads). However, it does not capture calibration, which is needed to avoid over-delivery or under-delivery of ads.
+Feature Transformation
+ +A given ad impression, $e$, is transformed into an $n$-dimensional vector, $x$, where the $i^{th}$ index denotes the value of the $i^{th}$ categorical feature.
+Continuous features are binned, and the bin index is used as a categorical feature, thus applying a non-linear transformation to the features.
+Categorical features that are tuple-like (i.e., have a tuple of values) can be converted into new categorical features by taking a cartesian product.
+Boosted decision trees can be used to implement the previous two transformations in one go.
+ +Each tree is used as a categorical feature that takes the value of the index of the leaf node that an ad maps to.
+The paper used a Gradient Boosting Machine with the $L_2$-TreeBoost algorithm.
+Using the tree feature transformation improves the Normalized Cross-Entropy by $3.4\%$.
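+A minimal sketch of the tree-based transform using off-the-shelf components (the paper uses its own $L_2$-TreeBoost implementation; sklearn is an illustrative stand-in):
+```python
+import numpy as np
+from sklearn.ensemble import GradientBoostingClassifier
+from sklearn.linear_model import LogisticRegression
+from sklearn.preprocessing import OneHotEncoder
+
+X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)  # toy data
+
+gbdt = GradientBoostingClassifier(n_estimators=100).fit(X, y)
+# apply() returns, for each example, the leaf index reached in every tree;
+# each tree thus acts as one categorical feature.
+leaves = gbdt.apply(X)[:, :, 0]  # shape: (n_examples, n_trees)
+encoder = OneHotEncoder().fit(leaves)
+lr = LogisticRegression(max_iter=1000).fit(encoder.transform(leaves), y)
+```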
+Model
+ +Logistic Regression (LR) or Bayesian online learning scheme for probit regression (BOPR) algorithms are used for training a linear classifier model.
+While both LR and BOPR models provide similar performance, the LR model is half the BOPR model’s size and faster for performing training/inference.
+When a model is trained on the data from a particular day and evaluated on data from the subsequent days, the model’s performance degrades as the delay between training and test set increases.
+This highlights the importance of the freshness of the training data.
+One straightforward approach can be to train the model every day.
+Alternatively, the linear classifier can be trained using online learning, while the boosted decision tree can still be trained daily.
+Different choices for setting the learning rate (for online training of linear classifier) are compared, and the per-coordinate learning rate is found to perform best in practice.
+An “online joiner” system is used to generate real-time training data for the linear classifier.
+The challenging part is that while there are data points with a “positive” label (i.e., the user clicked on the ad), there are no data points with a “negative” label (since there is no “no-click” button that the user can click).
+An impression is considered to have the “no-click” label if the user does not click on the ad within a (long) time window of seeing the ad.
+Too short a time window could mislabel some impressions, while too long a time window will delay the real-time training data.
+The online joiner performs a distributed stream-to-stream join on the stream of ad impressions and stream of ad clicks using a HashQueue.
+A HashQueue:
+ +comprises a First-In-First-Out queue as a buffer window and a hash map for fast random access to label impressions.
+supports three operations on key-value pairs: enqueue, dequeue, and lookup.
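+A minimal sketch of a HashQueue, assuming in-process streams and wall-clock timestamps (the real system performs a distributed stream-to-stream join; names are illustrative):
+```python
+import time
+from collections import OrderedDict
+
+class HashQueue:
+    """FIFO buffer window plus hash map: joins an impression stream with a
+    click stream; impressions older than the window become negatives."""
+
+    def __init__(self, window_seconds):
+        self.window = window_seconds
+        self.items = OrderedDict()  # impression_id -> (timestamp, features)
+
+    def enqueue(self, impression_id, features):
+        self.items[impression_id] = (time.time(), features)
+
+    def lookup(self, impression_id):
+        return self.items.get(impression_id)
+
+    def dequeue_expired(self):
+        """Pop impressions whose window has elapsed; label them no-click."""
+        now, expired = time.time(), []
+        while self.items:
+            _, (ts, features) = next(iter(self.items.items()))
+            if now - ts < self.window:
+                break
+            self.items.popitem(last=False)
+            expired.append((features, 0))  # label 0: no click within window
+        return expired
+
+    def mark_click(self, impression_id):
+        """On a click event, emit a positive example and drop the entry."""
+        entry = self.items.pop(impression_id, None)
+        return (entry[1], 1) if entry else None
+```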
+Increasing the number of boosting trees shows diminishing returns, and most of the improvements come from the first 500 trees.
+Top 10 features account for half of the total feature importance, while the last 300 features add less than 1% feature importance.
+Features in the boosting model can be broadly classified as contextual or historical.
+Historical features provide much more explanatory power than contextual features, though contextual features are helpful for handling the cold-start problem.
+Models trained with just the contextual features rely more heavily on data freshness than models trained with just the historical features.
+Uniform subsampling and negative downsampling techniques are used to limit the amount of training data.
+In the case of negative downsampling, the model needs to be re-calibrated as well.
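+Concretely, if negatives are downsampled at rate $w$, a prediction $p$ from the downsampled model can be re-calibrated as $q = \frac{p}{p + (1-p)/w}$.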
+The paper presents some causes for (temporary) high-latency episodes in large-scale online systems and techniques to mitigate their impact so that the tail of latency distribution remains short.
+Shared resources between processes on the same node
+Background processes (daemons) could cause a momentary spike in resource usage.
+Processes running on different nodes may contend for global resources like shared file systems.
+Maintenance activities like disk compaction or garbage collection.
+Others like queueing, power limits, or energy management.
+In the case of large-scale systems, the component-level variability is further amplified.
+Use differentiated service classes to prioritize user requests over non-interactive requests.
+Reduce head-of-line blocking by breaking long-running requests into smaller requests.
+Synchronize maintenance jobs across nodes to minimize the window for high latency.
+Caching generally does not help to address tail latency.
+Two categories of adaptation approaches
+ +Within Request Short-Term Adaptations
+ +These approaches are more relevant for services that perform many read queries on loosely consistent datasets.
+Hedged Request
+ +Send the request to multiple replicas, and once one of the replicas returns the result, cancel the other requests.
+In practice, start by sending the request to only one replica. Send the secondary requests only if the first request has been outstanding for more than the $95^{th}$-percentile expected latency.
+This introduces an additional $5\%$ load while substantially shortening the latency tail.
+This approach works because, often, the cause of latency is not the query itself but other factors like overloaded nodes.
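+A minimal sketch of a hedged request, assuming async replica callables and a precomputed $95^{th}$-percentile latency (the API is illustrative):
+```python
+import asyncio
+
+async def hedged_request(replicas, payload, hedge_after):
+    """Send to one replica; if it is still outstanding after `hedge_after`
+    seconds, send a secondary request and take whichever finishes first."""
+    first = asyncio.create_task(replicas[0](payload))
+    try:
+        # shield() keeps the first request running if the timeout fires
+        return await asyncio.wait_for(asyncio.shield(first), timeout=hedge_after)
+    except asyncio.TimeoutError:
+        second = asyncio.create_task(replicas[1](payload))
+        done, pending = await asyncio.wait(
+            {first, second}, return_when=asyncio.FIRST_COMPLETED)
+        for task in pending:
+            task.cancel()  # cancel the slower request once one returns
+        return done.pop().result()
+```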
+Tied Request
+ +The hedged-request approach makes a tradeoff regarding how long to wait before initiating requests to other replicas: the sooner the secondary request is sent, the lower the latency of serving the request, but the higher the overall load on the system.
+The load on the system can be reduced by “tying” requests (sent to different replicas) so that as soon as one replica starts processing the request, it can notify the other replicas, which can drop the request or deprioritize it.
+In practice, “tying” requests means that each replica has the identity of the other replicas that may execute the request.
+Note that there is a short window (of the average network message delay) during which multiple replicas could start executing the request. This can be mitigated if the client (issuing the requests) introduces a delay of twice the average network message delay between sending the two requests.
+Submit the request to the least loaded replica
+ +Cross-Request Long-Term Adaptations
+ +These approaches are more relevant for situations where different services have different throughput.
+Micro-partitions
+ +Generate more partitions than the number of nodes.
+The partitions can be dynamically assigned to machines to ensure proper load balancing.
+In case of machine failure, many nodes can be used to quickly re-create the micro-partitions instead of waiting on one machine to read one single large partition.
+Selective Replication
+ +Latency induced probation
+ +Large Information Retrieval Systems
+ +In such systems, speed can be more critical than the quality of the result.
+The system should return a “good enough” result that is available with low latency instead of waiting for the “best result” that is available with high latency.
+In some cases, a request could trigger an unexpected code path or cause some other exception that could slow down the entire system.
+In such cases, the canary request technique can be used where the system sends the request initially to only 1 or 2 nodes. The request is sent over to the other nodes only after receiving a successful response from the initial nodes.
+Requests that update state are easier to handle for several reasons:
+ +The scale of latency-critical modifications is generally small.
+The update can be performed asynchronously after responding to the user.
+Quorum-based approaches (often used for ensuring consistent updates) are inherently tail-tolerant.
+The paper describes YouTube’s deep learning-based recommendation system.
+Scale - Very large number of users and videos.
+Freshness - Very large number of videos uploaded every hour. The recommendation system should take these new videos into account as well.
+Noise - User satisfaction needs to be modeled from noisy implicit feedback signal as the explicit signal is very sparse.
+Two neural networks: one for candidate generation and another one for ranking.
+Metrics
+ +Offline metrics like precision, recall, ranking loss
+A/B testing via live experiments
+Input: events from a user’s YouTube activity history.
+Output: small subset (hundreds) of videos.
+Approach:
+ +Recommendation is modeled as extreme multiclass classification.
+Predict the video (from a corpus) that a user will watch at a given time.
+The neural network’s task is to learn useful user embeddings, given the user’s context and history.
+For each positive class (relevant video), negative classes (non-relevant videos) are sampled from the video corpus.
+Model Architecture
+ +A feedforward network with input as user embeddings and context embeddings (watch history).
+Watch history is a variable-length sequence of video ids, where each video id is mapped to an embedding.
+The sequence of video ids is mapped to a sequence of embeddings, and this sequence is averaged to obtain a fixed-size embedding.
+Additional signals like demographic features and search query embeddings can be added along with the context embeddings.
+The age of a video is also used as a feature during training to account for the freshness of the content. This feature is set to zero (or slightly negative) during inference.
+Other Insights
+ +Training examples are generated from all YouTube watches, including the watches from the videos embedded on other sites, to surface new content.
+Generating the same number of training examples per user is important to avoid a small set of active users from dominating the model training.
+Predicting a user’s next watch leads to better results than predicting a randomly held-out watch. This can be attributed to the general consumption pattern of videos (e.g., episodes are usually watched in order).
+Input: list of candidate videos to rank from.
+Output: score for each video.
+Approach
+ +Feature representation
+ +Different types of features: categorical vs. continuous, univalent vs. multivalent, describes video vs. describes user or context.
+Important signals include user’s interaction with the video (or similar videos), which source/channel added the video to the candidate set.
+Embeddings are shared across features. For example, the representation for a video id remains the same, irrespective of whether it is being used for representing the “video to recommend” or the “last seen video.”
+Feature normalization and transformations like exponents (square or square root) for continuous variables improve the performance.
+To model the expected watch time, the logistic regression loss is weighted by the observed watch time. For example, if a video was watched, its weight is given by the observed watch time, and if the video was not watched, its weight is set to 1.
+In practice, this means that the logistic regression model learns odds that approximate the expected watch time of the video.
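+To see why: with $N$ impressions, of which $k$ are clicked with watch times $T_i$, the odds learned by the weighted model are $\frac{\sum_i T_i}{N - k} = E[T]\frac{N}{N-k} \approx E[T](1 + P(\text{click})) \approx E[T]$, since the click probability $P(\text{click})$ is small.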
+The paper studies transfer learning in RL, focusing on simultaneous transfer across both tasks and environments.
+The key idea is to learn task and environment embeddings and compose them using a meta-rule, and the proposed approach is called SYNPO (Synthesized Policies).
+Three settings considered:
+ +S1: Transfer to a new (environment, task) pair when the agent has been trained on the environment and the task before (but not simultaneously).
+S2: Transfer to a new (environment, task) pair where either the environment or the task is not seen previously.
+S3: Transfer to a new (environment, task) pair where neither the environment nor the task is seen previously.
+In the second and third settings, the agent is allowed to collect some data in the new environment or task.
+The (environment, task) combinations that the agent has seen during training are referred to as seen combinations, while the remaining combinations are referred to as the unseen combinations.
+The key idea is to:
+ +learn embeddings of environments and tasks
+use these embeddings to compose a policy (parameterized as the linear combination of the policy basis).
+A disentanglement objective is used to decouple the task and environment embedding.
+Given an (environment, task) pair $z = (\epsilon, \tau)$, the policy is given as $\pi_z(a \vert s) \propto \exp(\psi_s^T U(e_{\epsilon}, e_{\tau}) \phi_{a} + b_{\pi})$.
+Here $b_{\pi}$ is a scalar bias, $\psi_{s}$ and $\phi_{a}$ are state and action representations, and $U$ is parameterized as a linear combination of $K$ basis matrices $\Theta_k$:
+$U(e_{\epsilon}, e_{\tau}) = \sum_{k=1}^{K}\alpha_k(e_{\epsilon}, e_{\tau})\Theta_k$.
+The basis matrices (denoted by $\Theta_k$) are shared across tasks while the coefficients ($\alpha_k$) are specific to the (environment, task) pair.
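+A minimal sketch of the policy-synthesis step (shapes and names are illustrative):
+```python
+import torch
+
+def synthesize_policy_logits(psi_s, phi_a, alpha, theta, b_pi):
+    """psi_s: state representation (d_s,); phi_a: action representations
+    (n_actions, d_a); alpha: coefficients alpha_k(e_env, e_task) (K,);
+    theta: basis matrices (K, d_s, d_a); b_pi: scalar bias."""
+    U = torch.einsum("k,kij->ij", alpha, theta)  # U = sum_k alpha_k * Theta_k
+    logits = psi_s @ U @ phi_a.T + b_pi          # psi^T U phi_a + b per action
+    return logits  # softmax over these gives pi(a|s)
+```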
+During training, the agent also predicts rewards using the same set of basis but different coefficients.
+Given an (environment, task) pair, the agent is trained to decode the environment (and task) given the agent’s trajectory.
+The sequence of state-action pairs (in the trajectory) is mapped to a sequence of state-action representations, given by $\psi_s^T\Theta_k\phi_{a}$
+GRIDWORLD
+ +Twenty $16 \times 16$ grid-aligned mazes that are similar in appearance but differ in topology.
+The task is to collect colored blocks in a given order. In each task, the starting position of the agent and the position of the blocks is randomized.
+Each environment has 20 tasks, leading to a total of 400 (environment, task) combinations.
+THOR - a 3D simulator where the agent is placed in indoor photo-realistic scenes.
+The task is to search for objects and perform actions like “put cabbage on the fridge.”
+The setup uses 19 scenes (environments), with each environment comprising 21 tasks.
+Baselines include MLPs that concatenate the state, environment embedding, and task embedding.
+Multi-task Learning where the distinction between the environments is ignored.
+GRIDWORLD
+ +In the first setting (S1)
+ +SYNPO outperforms all the baselines.
+As the agent is trained on more (environment, task) combinations, its performance on the unseen combinations improves. This trend saturates when the seen/total ratio reaches about 0.4 (i.e., training on 40% of all the combinations).
+Task disentanglement is more important than environment disentanglement.
+In the second and third setting (S2 and S3)
+ +The agent uses one demonstration from each test pair to finetune the embeddings.
+S2 is an easier setting than S3.
+Transfer learning across tasks is easier than transfer learning across environments.
+THOR
+ +The paper presents Toolformer, a language model that uses simple APIs to use external tools (calculator, QA system, search engine, translation system, and calendar).
+Starting with a language model, M, the goal is to enable the language model to use tools by invoking API calls.
+An API call is denoted by the tuple $c = $ (api_name, api_input). It can be linearized as $e(c) = $ “[api_name(api_input)]”, or as $e(c, r) = $ “[api_name(api_input) -> r]”, where $r$ denotes the result of the API call.
+The given dataset of plain text, $C$, is converted into a dataset $C^*$, augmented with API calls, using a three-step process.
+In the first step, a position ($i$) and API call candidates (for the position $i$) are sampled.
+ +Positions are sampled by (i) computing the probability that M assigns to starting an API call for each position and (ii) retaining the top-$k$ positions with a probability greater than a threshold value.
+For each of the sampled positions (say $i$), API calls are sampled by concatenating a prompt to the tokens till index $i$ and sampling from the model M. Examples that do not generate the “end of the API” token (i.e.,”]”) are discarded.
+In the second step, the API calls are executed to obtain responses $r$ (text sequences).
+ +In the last step, API calls are filtered: a call is kept only if providing it (along with its result) reduces the model’s loss over the subsequent tokens, compared to making no call. The remaining API calls are merged to obtain the augmented dataset $C^*$ that is used for finetuning M.
+Note that $C^*$ contains $C$, so M is finetuned on the original dataset and examples where a tool is helpful.
+During inference, the model is used for decoding in the usual way. Decoding is stopped when it produces the “->” token, and the corresponding API is used to generate the response. The decoding process (using the model) resumes with the API output appended to the decoded text.
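+A minimal sketch of this interleaved decoding loop (`model.next_token`, `model.eos_token`, and the `tools` mapping are illustrative assumptions, not the paper's code):
+```python
+def generate_with_tools(model, tools, prompt, max_steps=100):
+    """Decode normally; when the model emits "->", run the pending API
+    call, append its result, and resume decoding."""
+    text = prompt
+    for _ in range(max_steps):
+        token = model.next_token(text)  # one regular decoding step
+        text += token
+        if token == "->":
+            # parse the call, e.g. "[Calculator(400 / 1400)->" -> name, args
+            call = text[text.rfind("[") + 1 : -len("->")].strip()
+            name, args = call.split("(", 1)
+            text += " " + str(tools[name](args.rstrip(")"))) + "]"
+        elif token == model.eos_token:
+            break
+    return text
+```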
+There are two constraints on the tools: (i) their input and output should be expressible as text, and (ii) a few demonstrations of their use can be obtained. The second constraint means that the tool should be usable and accessible.
+The paper considered the following tools: a question-answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system. Of these, only the calculator and calendar are non-neural network tools.
+Subset of CCNet is used as the language modeling dataset.
+GPT-J is used as the language model.
+For finetuning, the batch size is 128, the learning rate is 1e-5, and a linear warmup for the first 10% of training is used.
+Following models are compared:
+ +GPT-J: Regular GPT-J model without any finetuning.
+GPT-J + CC: GPT-J finetuned on $C$ without any API calls.
+Toolformer, i.e., GPT-J finetuned on $C^*$.
+Toolformer with API calls disabled during decoding.
+OPT 66B
+GPT-3
+The models are evaluated in the prompted zero-shot setup, where models are instructed to solve a task without any in-context examples.
+One difference from the standard greedy decoding is that the API call is used whenever it is one of the top-10 most likely next tokens. This is done to increase the use of API calls.
+Evaluation Tasks
+ +SQuAD, GoogleRE, and T-REx subsets of the LAMA benchmark where the model has to complete a short statement with a missing fact.
+ +Since LAMA questions are based on Wikipedia, Toolformer isn’t allowed to use Wikipedia search.
+The evaluation criteria is to check if the correct word is among the first five words predicted by the model.
+Toolformer uses the question-answering tool for most cases, outperforming all the baselines.
+Math Dataset
+ +ASDiv, SVAMP, and MAWPS benchmarks.
+The first number predicted by the model is considered to be the output.
+Toolformer uses the calculator tool for most cases, thereby outperforming all the baselines.
+Question Answering
+ +Web Questions, Natural Questions, and TriviaQA datasets.
+The evaluation criteria is to check if the correct word is among the first 20 words predicted by the model.
+Question Answering tool is disabled for this setup.
+Toolformer uses the Wikipedia tool for most cases, thereby outperforming all the baselines other than the much larger GPT-3 model.
+Multilingual Question Answering
+ +MLQA benchmark.
+The evaluation criteria is to check if the correct word is among the first ten words predicted by the model.
+Toolformer uses the translation tool for most of the questions, with questions in Hindi being an exception.
+However, Toolformer does not consistently outperform the GPT-J baseline, likely because, for some languages, finetuning on CCNet could hurt performance.
+Temporal Datasets
+ +TEMPLAMA (cloze style queries where the answer changes with time) and DATESET (dataset generated through a series of templates and populated with random dates/durations).
+While Toolformer outperforms the baselines for both datasets, it relies on the Wikipedia search and question-answering tools (and not the calendar tool) for TEMPLAMA. On DATESET, it uses the calendar tool in the majority of cases.
+Language Modeling
+ +WikiText and a subset of 10,000 randomly selected documents from CCNet (not used during training of M).
+Training on $C^*$ does not increase perplexity (compared to training on $C$). In this experiment, the API calls are disabled during inference.
+Varying the size of the underlying model shows that the ability to use tools emerges only at around 775M parameters.
+Extending Toolformer to chain the use of tools and use tools interactively.
+In some cases, the use of tools is very sample-inefficient.
+The decision to use a tool does not account for the cost of using the tool.
+10 Feb 2023 » Toolformer - Language Models Can Teach Themselves to Use Tools
+29 Mar 2021 » Synthesized Policies for Transfer and Adaptation across Tasks and Environments
+22 Mar 2021 » Deep Neural Networks for YouTube Recommendations
+15 Mar 2021 » The Tail at Scale
+08 Mar 2021 » Practical Lessons from Predicting Clicks on Ads at Facebook
+01 Mar 2021 » Ad Click Prediction - a View from the Trenches
+22 Feb 2021 » Anatomy of Catastrophic Forgetting - Hidden Representations and Task Semantics
+15 Feb 2021 » When Do Curricula Work?
+08 Feb 2021 » Continual learning with hypernetworks
+01 Feb 2021 » Zero-shot Learning by Generating Task-specific Adapters
+25 Jan 2021 » HyperNetworks
+18 Jan 2021 » Energy-based Models for Continual Learning
+11 Jan 2021 » GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism
+04 Jan 2021 » Compositional Explanations of Neurons
+21 Dec 2020 » Design patterns for container-based distributed systems
+14 Dec 2020 » Cassandra - a decentralized structured storage system
+07 Dec 2020 » CAP twelve years later - How the rules have changed
+30 Nov 2020 » Consistency Tradeoffs in Modern Distributed Database System Design
+23 Nov 2020 » Exploring Simple Siamese Representation Learning
+16 Nov 2020 » Data Management for Internet-Scale Single-Sign-On
+09 Nov 2020 » Searching for Build Debt - Experiences Managing Technical Debt at Google
+02 Nov 2020 » One Solution is Not All You Need - Few-Shot Extrapolation via Structured MaxEnt RL
+19 Oct 2020 » Learning Explanations That Are Hard To Vary
+12 Oct 2020 » Remembering for the Right Reasons - Explanations Reduce Catastrophic Forgetting
+28 Sep 2020 » A Foliated View of Transfer Learning
+21 Sep 2020 » Harvest, Yield, and Scalable Tolerant Systems
+14 Sep 2020 » MONet - Unsupervised Scene Decomposition and Representation
+07 Sep 2020 » Revisiting Fundamentals of Experience Replay
+31 Aug 2020 » Deep Reinforcement Learning and the Deadly Triad
+24 Aug 2020 » Alpha Net–Adaptation with Composition in Classifier Space
+14 Aug 2020 » Outrageously Large Neural Networks–The Sparsely-Gated Mixture-of-Experts Layer
+06 Aug 2020 » Gradient Surgery for Multi-Task Learning
+30 Jul 2020 » GradNorm–Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
+23 Jul 2020 » TaskNorm–Rethinking Batch Normalization for Meta-Learning
+16 Jul 2020 » Averaging Weights leads to Wider Optima and Better Generalization
+09 Jul 2020 » Decentralized Reinforcement Learning – Global Decision-Making via Local Economic Transactions
+02 Jul 2020 » When to use parametric models in reinforcement learning?
+25 Jun 2020 » Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
+18 Jun 2020 » On the Difficulty of Warm-Starting Neural Network Training
+30 Apr 2020 » Supervised Contrastive Learning
+09 Apr 2020 » CURL - Contrastive Unsupervised Representations for Reinforcement Learning
+12 Mar 2020 » Competitive Training of Mixtures of Independent Deep Generative Models
+05 Mar 2020 » What Does Classifying More Than 10,000 Image Categories Tell Us?
+27 Feb 2020 » mixup - Beyond Empirical Risk Minimization
+20 Feb 2020 » ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
+13 Feb 2020 » Gradient based sample selection for online continual learning
+06 Feb 2020 » Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
+30 Jan 2020 » Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
+23 Jan 2020 » Observational Overfitting in Reinforcement Learning
+16 Jan 2020 » Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
+09 Jan 2020 » Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
+02 Jan 2020 » Superposition of many models into one
+26 Dec 2019 » Towards a Unified Theory of State Abstraction for MDPs
+19 Dec 2019 » ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
+12 Dec 2019 » Everything Happens for a Reason - Discovering the Purpose of Actions in Procedural Text
+05 Dec 2019 » Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
+28 Nov 2019 » Contrastive Learning of Structured World Models
+12 Sep 2019 » Gossip based Actor-Learner Architectures for Deep RL
+05 Sep 2019 » How to train your MAML
+29 Aug 2019 » PHYRE - A New Benchmark for Physical Reasoning
+22 Aug 2019 » Large Memory Layers with Product Keys
+15 Aug 2019 » Abductive Commonsense Reasoning
+08 Aug 2019 » Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
+01 Aug 2019 » Assessing Generalization in Deep Reinforcement Learning
+25 Jul 2019 » Quantifying Generalization in Reinforcement Learning
+18 Jul 2019 » Set Transformer - A Framework for Attention-based Permutation-Invariant Neural Networks
+27 Jun 2019 » Measuring abstract reasoning in neural networks
+20 Jun 2019 » Hamiltonian Neural Networks
+13 Jun 2019 » Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
+08 Jun 2019 » Meta-Reinforcement Learning of Structured Exploration Strategies
+01 Jun 2019 » Relational Reinforcement Learning
+21 May 2019 » Good-Enough Compositional Data Augmentation
+14 May 2019 » Multiple Model-Based Reinforcement Learning
+09 Apr 2019 » Towards a natural benchmark for continual learning
+02 Apr 2019 » Meta-Learning Update Rules for Unsupervised Representation Learning
+26 Mar 2019 » GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
+16 Mar 2019 » To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
+12 Mar 2019 » Model Primitive Hierarchical Lifelong Reinforcement Learning
+19 Feb 2019 » TuckER - Tensor Factorization for Knowledge Graph Completion
+05 Feb 2019 » Linguistic Knowledge as Memory for Recurrent Neural Networks
+29 Jan 2019 » Diversity is All You Need - Learning Skills without a Reward Function
+22 Jan 2019 » Modular meta-learning
+15 Jan 2019 » Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
+08 Jan 2019 » Efficient Lifelong Learning with A-GEM
+02 Jan 2019 » Pre-training Graph Neural Networks with Kernels
+25 Dec 2018 » Smooth Loss Functions for Deep Top-k Classification
+18 Dec 2018 » Hindsight Experience Replay
+11 Dec 2018 » Representation Tradeoffs for Hyperbolic Embeddings
+01 Nov 2018 » Learned Optimizers that Scale and Generalize
+25 Oct 2018 » One-shot Learning with Memory-Augmented Neural Networks
+18 Oct 2018 » BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
+11 Oct 2018 » Poincaré Embeddings for Learning Hierarchical Representations
+04 Oct 2018 » When Recurrent Models Don’t Need To Be Recurrent
+27 Sep 2018 » HoME - a Household Multimodal Environment
+12 Sep 2018 » Emergence of Grounded Compositional Language in Multi-Agent Populations
+21 Aug 2018 » A Semantic Loss Function for Deep Learning with Symbolic Knowledge
+16 Aug 2018 » Hierarchical Graph Representation Learning with Differentiable Pooling
+08 Aug 2018 » Imagination-Augmented Agents for Deep Reinforcement Learning
+19 Jul 2018 » Kronecker Recurrent Units
+11 Jul 2018 » Learning Independent Causal Mechanisms
+04 Jul 2018 » Memory-based Parameter Adaptation
+09 Jun 2018 » Born Again Neural Networks
+21 May 2018 » Net2Net-Accelerating Learning via Knowledge Transfer
+06 May 2018 » Learning to Count Objects in Natural Images for Visual Question Answering
+08 Apr 2018 » Neural Message Passing for Quantum Chemistry
+02 Apr 2018 » Unsupervised Learning by Predicting Noise
+25 Mar 2018 » The Lottery Ticket Hypothesis - Training Pruned Neural Networks
+18 Mar 2018 » Cyclical Learning Rates for Training Neural Networks
+11 Mar 2018 » Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
+05 Mar 2018 » An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
+24 Feb 2018 » Learning an SAT Solver from Single-Bit Supervision
+17 Feb 2018 » Neural Relational Inference for Interacting Systems
+11 Feb 2018 » Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
+05 Feb 2018 » Get To The Point - Summarization with Pointer-Generator Networks
+29 Jan 2018 » StarSpace - Embed All The Things!
+22 Jan 2018 » Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
+14 Jan 2018 » Exploring Models and Data for Image Question Answering
+06 Jan 2018 » How transferable are features in deep neural networks
+31 Dec 2017 » Distilling the Knowledge in a Neural Network
+24 Dec 2017 » PTE - Predictive Text Embedding through Large-scale Heterogeneous Text Networks
+11 Dec 2017 » Revisiting Semi-Supervised Learning with Graph Embeddings
+28 Nov 2017 » Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
+19 Nov 2017 » Higher-order organization of complex networks
+12 Nov 2017 » Network Motifs - Simple Building Blocks of Complex Networks
+05 Nov 2017 » Word Representations via Gaussian Embedding
+28 Oct 2017 » HARP - Hierarchical Representation Learning for Networks
+22 Oct 2017 » Swish - a Self-Gated Activation Function
+15 Oct 2017 » Reading Wikipedia to Answer Open-Domain Questions
+01 Oct 2017 » Task-Oriented Query Reformulation with Reinforcement Learning
+22 Sep 2017 » Refining Source Representations with Relation Networks for Neural Machine Translation
+27 Aug 2017 » Pointer Networks
+21 Aug 2017 » Learning to Compute Word Embeddings On the Fly
+07 Aug 2017 » R-NET - Machine Reading Comprehension with Self-matching Networks
+24 Jul 2017 » ReasoNet - Learning to Stop Reading in Machine Comprehension
+17 Jul 2017 » Principled Detection of Out-of-Distribution Examples in Neural Networks
+09 Jul 2017 » Ask Me Anything - Dynamic Memory Networks for Natural Language Processing
+01 Jul 2017 » One Model To Learn Them All
+26 Jun 2017 » Two/Too Simple Adaptations of Word2Vec for Syntax Problems
+17 Jun 2017 » A Decomposable Attention Model for Natural Language Inference
+03 Jun 2017 » A Fast and Accurate Dependency Parser using Neural Networks
+23 May 2017 » Neural Module Networks
+14 May 2017 » Making the V in VQA Matter - Elevating the Role of Image Understanding in Visual Question Answering
+07 May 2017 » Conditional Similarity Networks
+28 Apr 2017 » Simple Baseline for Visual Question Answering
+27 Apr 2017 » VQA-Visual Question Answering
+