---
layout: post
title: Emergence of Grounded Compositional Language in Multi-Agent Populations
comments: True
excerpt:
tags: ['2018', 'AAAI 2018', 'Emergent Language', 'Multi-Agent', 'Natural Language Processing', 'AAAI', 'AI', 'NLP']
---

## Introduction

* The paper provides a multi-agent learning environment and proposes a learning approach that facilitates the emergence of a basic compositional language.

* The language is quite rudimentary, essentially a sequence of abstract discrete symbols, but it does have a defined vocabulary and syntax.

* [Link to the paper](https://arxiv.org/abs/1703.04908)

## Setup

* Cooperative, partially observable Markov game (the multi-agent extension of an MDP).

* All agents have identical action and observation spaces, use the same policy, and receive a shared reward. A minimal rollout sketch follows this list.
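
To make the shared-policy, shared-reward setup concrete, here is a minimal rollout sketch. The `env` and `policy` objects and their interfaces are assumptions for illustration, not the authors' actual code:

```python
def rollout(env, policy, horizon=50):
    """Roll out one episode; every agent runs the *same* policy."""
    observations = env.reset()           # one observation per agent
    episode_return = 0.0
    for _ in range(horizon):
        # The identical policy is applied to each agent's private observation.
        actions = [policy(obs) for obs in observations]
        observations, shared_reward, done = env.step(actions)
        episode_return += shared_reward  # a single scalar shared by all agents
        if done:
            break
    return episode_return
```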

### Grounded Communication Environment

* Physically simulated 2-D environment in continuous space and discrete time, with N agents and M landmarks.

* Both the agents and the landmarks occupy a location and have attributes such as colour and shape.

* Within the environment, an agent can *go to* a location, *look* at a location, or *do nothing*. Additionally, it can utter communication symbols *c* (from a shared vocabulary *C*). The agents themselves learn to assign meaning to the symbols.

* Each agent has a private internal goal (which may require interaction with other agents to complete) that the other agents cannot see.

* The goal for agent *i* consists of an action to perform, a landmark location where the action should be performed, and the agent that should perform it.

* Since agents emit symbols continuously, each agent is given a memory module, and simple additive memory updates are performed.

* For interaction, agents can use verbal utterances, non-verbal signals (gaze), or non-communicative strategies (pushing other agents). A minimal sketch of this environment follows the list.
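
A minimal sketch of the environment's state, with hypothetical field names (`position`, `gaze`, `goal`, `memory`, and the 32-dimensional memory size are assumptions chosen for illustration); the last function mirrors the additive memory update described above:

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np

@dataclass
class Landmark:
    position: np.ndarray   # 2-D location in continuous space
    colour: str
    shape: str

@dataclass
class Agent:
    position: np.ndarray   # 2-D location in continuous space
    gaze: np.ndarray       # location the agent is currently looking at
    colour: str
    # Private goal: (action, landmark index, index of the agent who should act).
    goal: Optional[tuple] = None
    memory: np.ndarray = field(default_factory=lambda: np.zeros(32))

def update_memory(agent: Agent, new_features: np.ndarray) -> None:
    # Simple additive update: information from the current time step is
    # accumulated into the agent's memory vector.
    agent.memory = agent.memory + new_features
```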

## Approach

* An end-to-end differentiable model of all agent and environment state dynamics is built over time, and the gradient of the return is computed by backpropagating through it.

* The Gumbel-Softmax distribution is used to obtain the categorical word emission *c* while keeping the model differentiable (see the sketch after this list).

* A multi-layer perceptron models the policy, which returns the action, the communication symbol, and the memory update for each agent.

* Since the number of agents (and hence the number of communication streams) can vary across instantiations, an identical model is instantiated per agent and per communication stream.

* The outputs of the individual processing modules are pooled into feature vectors corresponding to the communication and physical observations. These pooled features, along with the goal vectors, are fed to a final processing module from which actions and categorical symbols are sampled (see the policy sketch after this list).

* In practice, an auxiliary task, in which each agent predicts the goal of another agent, encouraged more meaningful communication utterances.
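
A minimal sketch of Gumbel-Softmax sampling (written in PyTorch as an assumption; the paper does not prescribe a framework). Adding Gumbel noise to the logits and taking a temperature-controlled softmax gives a differentiable relaxation of sampling a discrete symbol:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax(logits, temperature=1.0):
    # Sample Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # Low temperatures approach one-hot samples; high temperatures are smoother.
    return F.softmax((logits + gumbel_noise) / temperature, dim=-1)
```

And a sketch of the pooled policy idea described above. All names, dimensions, and the use of mean-pooling are assumptions; the point is that identical per-stream modules share weights and their outputs are pooled, so the policy is invariant to the number of agents:

```python
import torch
import torch.nn as nn

class PooledPolicy(nn.Module):
    def __init__(self, obs_dim, comm_dim, goal_dim,
                 hidden_dim=64, n_actions=3, vocab_size=20):
        super().__init__()
        self.n_actions = n_actions
        # Identical modules applied to every physical entity / comm stream.
        self.phys_encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.comm_encoder = nn.Sequential(nn.Linear(comm_dim, hidden_dim), nn.ReLU())
        # Final module: pooled features + goal -> action and word logits.
        self.head = nn.Linear(2 * hidden_dim + goal_dim, n_actions + vocab_size)

    def forward(self, phys_obs, comm_obs, goal):
        # phys_obs: (n_entities, obs_dim), comm_obs: (n_streams, comm_dim).
        phys = self.phys_encoder(phys_obs).mean(dim=0)   # pool over entities
        comm = self.comm_encoder(comm_obs).mean(dim=0)   # pool over streams
        out = self.head(torch.cat([phys, comm, goal]))
        action_logits = out[: self.n_actions]
        word_logits = out[self.n_actions:]
        return action_logits, word_logits
```

The word logits can then be passed through `gumbel_softmax` above to emit a symbol while keeping the whole computation differentiable.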

### Compositionality and Vocabulary Size

* In line with rich-get-richer dynamics, the communication symbols *c* are modelled as being generated by a Dirichlet process. The resulting reward, shared across all agents, is the log-likelihood of all communication utterances having been generated by a Dirichlet process (a sketch follows this list).

* Since the agents can communicate only in discrete symbols and do not have a global positioning reference, they need to unambiguously communicate landmark references to other agents.
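
A sketch of the rich-get-richer log-likelihood, using the Chinese-restaurant-process view of a Dirichlet process as an assumed concrete form (the paper's exact formulation may differ). Symbols that have already been used often become cheap to reuse, while new symbols pay a cost controlled by `alpha`, which pressures the population toward a small vocabulary:

```python
import math
from collections import Counter

def dp_log_likelihood(utterances, alpha=1.0):
    """Log-likelihood of a symbol sequence under a Chinese restaurant process."""
    counts = Counter()
    log_prob = 0.0
    for i, symbol in enumerate(utterances):
        if counts[symbol] > 0:
            # Reusing a symbol: probability proportional to its current count.
            log_prob += math.log(counts[symbol] / (i + alpha))
        else:
            # Introducing a new symbol: probability proportional to alpha.
            log_prob += math.log(alpha / (i + alpha))
        counts[symbol] += 1
    return log_prob

# Reusing a small vocabulary scores higher than spreading over many symbols:
print(dp_log_likelihood(["a", "a", "a", "b"]))   # approx -2.48
print(dp_log_likelihood(["a", "b", "c", "d"]))   # approx -3.18
```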

## Case I - Agents cannot see each other

* Non-verbal communication is not possible.

* When trained with just 2 agents, a symbol is assigned to each landmark and action.

* As the number of agents increases, additional symbols are used to refer to agents.

* If agents of the same colour are asked to perform conflicting tasks, they perform the average of the conflicting tasks. If distractor locations are added, the agents learn to ignore them.

## Non-verbal communication

* Agents are allowed to observe other agents' position, gaze, etc.

* Now a location can be pointed to using gaze.

* If gaze is disabled, an agent can indicate the goal landmark by moving to it.

* Essentially, even when verbal communication is disabled, the agents can come up with strategies to complete the task.