Added emergent language paper
shagunsodhani committed Sep 12, 2018
1 parent 67f3ad3 commit 563394f
Showing 3 changed files with 79 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [Emergence of Grounded Compositional Language in Multi-Agent Populations](https://shagunsodhani.in/papers-I-read/Emergence-of-Grounded-Compositional-Language-in-Multi-Agent-Populations)
* [A Semantic Loss Function for Deep Learning with Symbolic Knowledge](https://shagunsodhani.in/papers-I-read/A-Semantic-Loss-Function-for-Deep-Learning-with-Symbolic-Knowledge)
* [Hierarchical Graph Representation Learning with Differentiable Pooling](https://shagunsodhani.in/papers-I-read/Hierarchical-Graph-Representation-Learning-with-Differentiable-Pooling)
* [Imagination-Augmented Agents for Deep Reinforcement Learning](https://shagunsodhani.in/papers-I-read/Imagination-Augmented-Agents-for-Deep-Reinforcement-Learning)
@@ -0,0 +1,77 @@
---
layout: post
title: Emergence of Grounded Compositional Language in Multi-Agent Populations
comments: True
excerpt:
tags: ['2018', 'AAAI 2018', 'Emergent Language', 'Multi-Agent', 'Natural Language Processing', 'AAAI', 'AI', 'NLP']
---

## Introduction

* The paper introduces a multi-agent learning environment and a learning approach that together facilitate the emergence of a basic compositional language.

* The language is quite rudimentary and is essentially a sequence of abstract discrete symbols. But it does have a defined vocabulary and syntax.

* [Link to the paper](https://arxiv.org/abs/1703.04908)

## Setup

* Cooperative, partially observable Markov game (multi-agent extension of MDP).

* All agents have identical action and observation spaces, use the same policy and receive a shared reward.

### Grounded Communication Environment

* Physically simulated 2-D environment in continuous space and discrete time with N agents and M landmarks.

* The agents and landmarks occupy locations in the environment and have attributes such as colour and shape.

* Within the environment, the agents can *go to* a location, *look* at a location or *do nothing*. Additionally, they can utter communication symbols *c* (from a shared vocabulary *C*). The symbols have no predefined meaning; the agents themselves learn to assign meanings to them.

* Each agent has an internal goal (which could require interaction with other agents to complete) which the other agents cannot see.

* The goal for agent *i* consists of an action to perform, a landmark location at which to perform it, and the agent that should perform it.

* Since the agents continuously emit symbols, each agent is given a memory module to track the communication stream, updated with simple additive writes.

* For interaction, the agents can use verbal utterances, non-verbal signals (gaze) or non-communicative strategies (pushing other agents).
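
The environment state and per-agent goals described above can be sketched as simple data structures. This is an illustrative sketch only: the class names, fields, and memory size are hypothetical, not taken from the paper's code.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Entity:
    position: np.ndarray   # 2-D location in continuous space
    color: int             # attribute: colour id
    shape: int             # attribute: shape id

@dataclass
class Goal:
    action: str            # e.g. "goto", "look", "nothing"
    landmark: int          # index of the target landmark
    target_agent: int      # index of the agent that should act

@dataclass
class Agent(Entity):
    goal: Goal = None      # private: other agents cannot observe it
    memory: np.ndarray = field(default_factory=lambda: np.zeros(32))

    def update_memory(self, delta: np.ndarray) -> None:
        # Simple additive update tracking the stream of emitted symbols.
        self.memory += delta
```

An agent's goal may name another agent as the actor, which is what forces communication: the goal owner has to convey the action and landmark to the agent that must perform it.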

## Approach

* The agent and environment state dynamics are modelled end-to-end over time, so the gradient of the return can be computed by backpropagating through the dynamics.

* The Gumbel-Softmax distribution is used to obtain differentiable samples of the categorical word emission *c*.

* A multi-layer perceptron is used to model the policy which returns action, communication symbol and the memory update for each agent.

* Since the number of agents (and hence the number of communication streams etc) can vary across instantiations, an identical model is instantiated per agent and per communication stream.

* The outputs of the individual processing modules are pooled into feature vectors corresponding to communication and physical observations. These pooled features and the goal vectors are fed to the final processing module, from which actions and categorical symbols are sampled.

* In practice, using an additional task (each agent predicts the goal for another agent) encouraged more meaningful communication utterances.
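
The Gumbel-Softmax sampling step above can be sketched in numpy. This is a minimal sketch of the general technique, not the paper's implementation; the temperature value and vocabulary size are illustrative.

```python
import numpy as np

def gumbel_softmax(logits, temperature=0.5, rng=None):
    """Relaxed one-hot sample from a categorical distribution over symbols.

    As temperature -> 0 the sample approaches a hard one-hot vector, while
    remaining a smooth function of the logits, so gradients can flow through
    the symbol choice during training.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + gumbel) / temperature
    y = y - y.max()                  # subtract max for numerical stability
    expy = np.exp(y)
    return expy / expy.sum()

# Unnormalised scores over a 3-symbol vocabulary; argmax gives a hard symbol.
logits = np.log([0.7, 0.2, 0.1])
sample = gumbel_softmax(logits, temperature=0.1)
symbol = int(sample.argmax())
```

At low temperature the sampled vector is nearly one-hot, so taking the argmax at execution time while using the relaxed sample for gradients gives the discrete-but-trainable symbols the approach relies on.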

### Compositionality and Vocabulary Size

* To induce rich-get-richer dynamics (and hence a compact vocabulary), the communication symbols *c* are modelled as being generated by a Dirichlet process. The resulting reward, shared across all agents, is the log-likelihood of all communication utterances under that Dirichlet process.

* Since the agents can only communicate in discrete symbols and do not have a global positioning reference, they need to unambiguously communicate landmark references to other agents.

## Case I - Agents cannot see each other

* Non-verbal communication is not possible.

* When trained with just 2 agents, symbols are assigned for each landmark and action.

* As the number of agents is increased, additional symbols are used to refer to agents.

* If agents of the same colour are asked to perform conflicting tasks, they perform the average of the conflicting tasks. If distractor locations are added, the agents learn to ignore them.

## Non-verbal communication

* Agents are allowed to observe other agents' position, gaze etc.

* Now the location can be pointed to using gaze.

* If gaze is disabled, an agent can still indicate the goal landmark by moving to it.

* In short, even when verbal communication is disabled, the agents find alternative strategies to complete the task.
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from 17793c to 86c855
