Add PHYRE paper
shagunsodhani committed Sep 29, 2019
1 parent ca9ded1 commit 1d2dcde
Showing 3 changed files with 104 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -5,6 +5,7 @@ I am trying a new initiative - a-paper-a-week. This repository will hold all tho

## List of papers

* [PHYRE - A New Benchmark for Physical Reasoning](https://shagunsodhani.com/papers-I-read/PHYRE-A-New-Benchmark-for-Physical-Reasoning)
* [Large Memory Layers with Product Keys](https://shagunsodhani.com/papers-I-read/Large-Memory-Layers-with-Product-Keys)
* [Abductive Commonsense Reasoning](https://shagunsodhani.com/papers-I-read/Abductive-Commonsense-Reasoning)
* [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://shagunsodhani.com/papers-I-read/Deep-Reinforcement-Learning-in-a-Handful-of-Trials-using-Probabilistic-Dynamics-Models)
@@ -0,0 +1,102 @@
---
layout: post
title: PHYRE - A New Benchmark for Physical Reasoning
comments: True
excerpt:
tags: ['2019', 'Physical Reasoning', AI, Benchmark, Dataset, Physics, Reasoning]

---

## Introduction

* The paper proposes the PHYRE (PHYsical REasoning) benchmark - consisting of classic mechanical puzzles in 2D physical environments - as a means to evaluate the physical reasoning ability of machine learning models.

* [Link to the paper](https://arxiv.org/abs/1908.05656)

## Environment

* A 2D world that obeys Newtonian mechanics.

* Gravity and friction are modeled.

* Objects are non-deformable and can be static (i.e., fixed) or dynamic (i.e., they can move and are affected by collisions, etc.).

## Task

* The learning agent starts in some initial world state (i.e., a configuration of objects).

* The goal is described as a (`subject`, `relation`, `object`) triplet, where the agent's task is to satisfy the `relation` between the `subject` and the `object` (a minimal sketch of this representation follows this list).

* Currently, only the "touch" `relation` is supported.
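
* To make the goal specification concrete, here is a minimal Python sketch of a goal triplet and its satisfaction check. The names (`Goal`, `is_satisfied`, the object labels) are illustrative and not part of the actual PHYRE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Hypothetical goal triplet: satisfy `relation` between `subject` and `obj`."""
    subject: str   # e.g. "green ball"
    relation: str  # currently only "touch" is supported
    obj: str       # e.g. "purple floor"


def is_satisfied(goal: Goal, touching_pairs: set) -> bool:
    """`touching_pairs` stands in for the contact information reported by the simulator."""
    return (goal.subject, goal.obj) in touching_pairs or (goal.obj, goal.subject) in touching_pairs


goal = Goal(subject="green ball", relation="touch", obj="purple floor")
print(is_satisfied(goal, {("green ball", "purple floor")}))  # True
```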

## Setup

* The learning agent has to take a single action - placing one or more new dynamic objects in the world.

* A simulator is run on the new configuration (for a fixed amount of time) to check if the goal condition is satisfied.

* At the end of the simulation, a binary reward and intermediate observations (collected as the simulator executes) are provided to the learning agent.

* These observations are 256\*256 grids where each grid cell takes one of 7 values (denoting different types of objects).

* Since only one relation is currently supported, the object colors are sufficient to encode the goal.
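
* A rough sketch of a single interaction under this setup, assuming a hypothetical `simulator.simulate` call (the function name and the one-hot encoding below are illustrative, not the actual benchmark API):

```python
import numpy as np

GRID_SIZE = 256   # observations are 256x256 grids
NUM_COLORS = 7    # each cell takes one of 7 values (object types)


def one_hot_observation(obs: np.ndarray) -> np.ndarray:
    """Turn a (256, 256) grid of color indices into a (7, 256, 256) one-hot tensor."""
    assert obs.shape == (GRID_SIZE, GRID_SIZE)
    return (np.arange(NUM_COLORS)[:, None, None] == obs[None]).astype(np.float32)


def run_episode(simulator, task_id, action):
    """Single-shot episode: place the new objects (the action), run the physics
    for a fixed time, and read off the binary reward and intermediate frames."""
    reward, observations = simulator.simulate(task_id, action)  # hypothetical call
    features = [one_hot_observation(frame) for frame in observations]
    return reward, features
```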

## Benchmark Tiers

* Two benchmark tiers are provided, where each tier is a combination of:

* a predefined set of all the actions that the agent is allowed to perform, and

* a set of tasks, each of which can be solved by at least one action from the allowed action set.

* **PHYRE-B** - The agent is allowed to place a single ball (of any radius) at any valid location.

* **PHYRE-2B** - The agent is allowed to place 2 balls at any valid pair of locations.

* Each of the two tiers has 25 task templates, where each template comprises variants of a single task (same goal but different initial conditions).
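
* The two action spaces can be summarized as small continuous vectors. The sketch below parameterizes each ball by a normalized (x, y, radius) triplet, which follows the paper's description of the ball tiers; the exact encoding used by the benchmark code may differ.

```python
import numpy as np

BALLS_PER_TIER = {"PHYRE-B": 1, "PHYRE-2B": 2}


def sample_action(tier: str, rng: np.random.Generator) -> np.ndarray:
    """Sample a random action: (x, y, radius) per ball, each normalized to [0, 1]."""
    return rng.uniform(0.0, 1.0, size=3 * BALLS_PER_TIER[tier])


rng = np.random.default_rng(0)
print(sample_action("PHYRE-2B", rng))  # e.g. [x1, y1, r1, x2, y2, r2]
```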

## Evaluation

* Two evaluation setups are considered:

* **within-template** where the agent is trained on some tasks in a template and evaluated on a set of held-out tasks from the same template.

* **cross-template** where the agent is evaluated on tasks from a different template.

* In the training phase, the model has access to the simulator (but not to the correct solution), so the agent could learn an action-prediction model, a forward-dynamics model, or both.

* In the testing phase, the model can query the simulator only a few times. Each query provides it with the binary reward and the intermediate observations.
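
* The test-time protocol - a limited budget of simulator queries per task, each returning a binary reward and the intermediate observations - could look roughly like the sketch below. The `agent` and `simulator` interfaces are assumed, not taken from the benchmark code.

```python
def evaluate_task(agent, simulator, task_id, max_attempts=100):
    """Return the number of attempts used to solve the task, or None if unsolved."""
    for attempt in range(1, max_attempts + 1):
        action = agent.propose_action(task_id, attempt)
        reward, observations = simulator.simulate(task_id, action)  # binary reward
        agent.observe(task_id, action, reward, observations)        # online agents (MEM-O, DQN-O) update here
        if reward == 1:
            return attempt
    return None
```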

## Performance Measure

* The emphasis is on solving as many tasks as possible, using as few queries as possible, during the test phase.

* This requirement is captured using a metric called AUCCESS (a sketch of its computation follows this list).

* In general, the tasks in PHYRE-2B are harder than tasks in PHYRE-B.
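
* As described in the paper, AUCCESS is a weighted area under the success-percentage-vs-attempts curve, with weights that emphasize solving tasks within few attempts. Below is a minimal sketch of that computation, assuming log-spaced weights w_k = log(k+1) - log(k) over k = 1..100 attempts.

```python
import numpy as np


def auccess(attempts_per_task, max_attempts=100):
    """attempts_per_task[i] is the number of attempts needed to solve task i,
    or None if the task was never solved within the budget."""
    ks = np.arange(1, max_attempts + 1)
    weights = np.log(ks + 1) - np.log(ks)  # higher weight for solving in fewer attempts
    solved = np.array([a for a in attempts_per_task if a is not None])
    # s_k: percentage of tasks solved within k attempts
    s = np.array([100.0 * np.sum(solved <= k) / len(attempts_per_task) for k in ks])
    return np.sum(weights * s) / np.sum(weights)


print(auccess([1, 3, None, 20]))  # example: 4 tasks, one never solved
```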

## Baseline Agents

* Random Agent - Randomly samples actions.

* Non-parametric agent (MEM) - generates R actions at random and uses the simulator to check how many training tasks each of these R random actions solves. During testing, the R actions are tried in decreasing order of the number of tasks they solve (a sketch follows this list).

* Non-parametric agent with online learning (MEM-O) - Variant of MEM where an online adaptation step is performed during test time (to update the rank of the actions).

* Deep Q-Network (DQN) with an action encoder, an observation encoder, and a fusion module (to combine the action and observation representations).

* DQN with online learning (DQN-O) - Variant of DQN with online updates (during the test phase).

* Contextual bandits.

* Policy learning approaches like PPO and A2C.
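
* A compact sketch of the MEM idea described above, reusing the hypothetical `sample_action` and `simulator.simulate` interfaces from the earlier sketches; the function and variable names are illustrative.

```python
import numpy as np


def build_memory(simulator, train_task_ids, rng, num_actions=10000, tier="PHYRE-B"):
    """MEM: sample R random actions and count how many training tasks each one solves."""
    actions = [sample_action(tier, rng) for _ in range(num_actions)]
    solve_counts = np.zeros(num_actions)
    for task_id in train_task_ids:
        for i, action in enumerate(actions):
            reward, _ = simulator.simulate(task_id, action)
            solve_counts[i] += reward
    # At test time, the actions are tried in decreasing order of solve_counts;
    # MEM-O additionally updates this ranking with the rewards observed at test time.
    ranking = np.argsort(-solve_counts)
    return [actions[i] for i in ranking], solve_counts[ranking]
```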

## Observations

* Both contextual bandits and policy-based approaches show poor training stability.

* The best agent, DQN-O, reaches AUCCESS of 56.2\% on PHYRE-B and 39.26\% on PHYRE-2B. In general, agents with online adaptation perform better.

* The tasks are designed such that 100000 attempts are sufficient to solve 100\% of tasks in PHYRE-B and 95\% of tasks in PHYRE-2B.

* Even though only two tiers are provided right now, the benchmark is readily extensible and new tasks can be added in the future.
2 changes: 1 addition & 1 deletion site/_site
Submodule _site updated from d8120a to efc07d
