This code is for a simple 2D game environment that can be used in developing reinforcement learning models. It is designed to be compact but flexible, enabling the implementation of diverse set of games. Furthermore, it offers precise tuning of the game difficulty, facilitating the construction of curricula to aid training. The code is in Lua+Torch, and it offers rapid prototyping of games and is easy to connect to models that control the agent’s behavior. For more details, see our paper.
Each game is played in a 2D rectangular grid. Each location in the grid can be empty, or may contain one or more items such as:
- Block: an impassible obstacle that does not allow the agent to move to that grid location
- Water: the agent may move to a grid location with water, but incurs an additional cost of for doing so.
- Switch: a switch can be in one of M states, which we refer to as colors. The agent can toggle through the states cyclically by a toggle action when it is at the location of the switch .
- Door: a door has a color, matched to a particular switch. The agent may only move to the door’s grid location if the state of the switch matches the state of the door.
- PushableBlock: This block is impassable, but can be moved with a separate “push” actions. The block moves in the direction of the push, and the agent must be located adjacent to the block opposite the direction of the push.
- Corner: This item simply marks a corner of the board.
- Goal: depending on the game, one or more goals may exist, each named individually.
- Info: these items do not have a grid location, but can specify a or give information necessary for its completion.
The environment is presented to the agent as a list of sentences, each describing an item in the game. For example, an agent might see “Block at [-1,4]. Switch at [+3,0] with blue color. Info: change switch to red.” However, note that we use egocentric spatial coordinates, meaning that the environment updates the locations of each object after an action. The environments are generated randomly with some distribution on the various items. For example, we usually specify a uniform distribution over height and width, and a percentage of wall blocks and water blocks.
Currently, there are 10 different games implemented, but it is possible to add new games. The existing games are:
- Multigoals: the agent is given an ordered list of goals as “Info”, and needs to visit the goals in that order.
- Conditional Goals: the agent must visit a destination goal that is conditional on the state of a switch. The “Info” is of the form “go to goal 4 if the switch is colored red, else go to goal 2”.
- Exclusion: the “Info” in this game specifies a list of goals to avoid. The agent should visit all other unmentioned goals.
- Switches: there are multiple switches on the board, and the agent has to toggle all switches to the same color.
- Light Key: there is a switch and a door in a wall of blocks. The agent should navigate to a goal which may be on the wrong side of a wall of blocks, in which the agent needs move to and toggle the switch to open the door before going to the goal.
- Goto: the agent is given an absolute location on the grid as a target. The game ends when the agent visits this location. Solving this task requires the agent to convert from its own egocentric coordinate representation to absolute coordinates.
- Goto Hidden: the agent is given a list of goals with absolute coordinates, and then is told to go to one of the goals by the goal’s name. The agent is not directly given the goal’s location, it must read this from the list of goal locations.
- Push block: the agent needs to push a Pushable block so that it lays on top of a switch.
- Push block cardinal: the agent needs to push a Pushable block so that it is on a specified edge of the maze, e.g. the left edge. Any location along the edge is acceptable.
- Blocked door: the agent should navigate to a goal which may lie on the opposite side of a wall of blocks, as in the Light Key game. However, a PushableBlock blocks the gap in the wall instead of a door.
Examples of each games are shown in this video. The internal parameters of the games are written to a configuration file, which can be easily modified.
First, either install mazebase with luarocks make *.rockspec
or add the appropriate path:
package.path = package.path .. ';lua/?/init.lua'
To use the game environment as standalone in Torch, first start a local display
server with
$ th -ldisplay.start 8000 0.0.0.0
which will begin the remote desktop to view the MazeBase graphics at http://0.0.0.0:8000
. See the full repo for more details. Next, include the init file with
g_mazebase = require('mazebase')
Then we have to set which config file to use. Here we are using the config file that used in our paper
g_opts = {games_config_path = 'mazebase/config/game_config.lua'}
Next, we call this function to create a dictionary with all necessary words used in the game
g_mazebase.init_vocab()
Finally, initialize the game environment with
g_mazebase.init_game()
Now we can create a new game instance by calling
g = g_mazebase.new_game()
If there are more than one game, it will randomly pick one. Now, the current game state can be retrieved by calling
s = g:to_sentence()
which would return a tensor containing words (encoded by g_vocab
dictionary) describing each item in the game. If you started the display server, you can see the game at 0.0.0.0:8000
on your browser by doing
g_disp = require('display')
g_disp.image(g.map:to_image())
Next, an action can be performed by calling
g:act(action)
where action
is the index of the action. The list of possible actions are in g.agent.action_names
. When there are multiple agents in the game, we can choose the agent to perform the action by doing
g.agent = g.agents[i]
before calling g:act()
. After the action is completed, g:update()
must be called so that the game will update its internal state.
Finally, we can check if the game is finished by calling g:is_active()
. You can run demo_api.lua
to see the game playing with random actions.
Here we demonstrate how a new game can be added. Let us create a very simple game where an agent has to reach the goal. First, we create a file named SingleGoal.lua
. In it, a game class has to be created
local SingleGoal, parent = torch.class('SingleGoal', 'MazeBase')
Next, we have to construct the game items. In this case, we only need a goal item placed at a random location:
function SingleGoal:__init(opts, vocab)
parent.__init(self, opts, vocab)
self:add_default_items()
self.goal = self:place_item_rand({type = 'goal'})
end
Function place_item_rand
puts the item on empty random location. But it is possible specify the location using place_item
function. The argument to this function is a table containing item's properties such as type and name. Here, we only set the type of item to goal, but it is possible to include any number of attributes (e.g. color, name, etc.).
The game rule is to finish when the agent reaches the goal, which can be achieved by changing update
function to
function SingleGoal:update()
parent.update(self)
if not self.finished then
if self.goal.loc.y == self.agent.loc.y and self.goal.loc.x == self.agent.loc.x then
self.finished = true
end
end
end
This will check if the agent's location is the same as the goal, and sets a flag when it is true. Finally, we have to give a proper reward when the goal is reached:
function SingleGoal:get_reward()
if self.finished then
return -self.costs.goal -- this will be set in config file
else
return parent.get_reward(self)
end
end
Now, we include our game file in mazebase/init.lua
by adding the following line
paths.dofile('SingleGoal.lua')
Also, the following lines has to be added inside init_game_opts
function:
games.SingleGoal = SingleGoal
helpers.SingleGoal = OptsHelper
Finally, we need a config file for our new game. Let us create singlegoal.lua
file in mazebase/config
. The main parameters of the game is the grid size:
local mapH = torch.Tensor{5,5,5,10,1}
local mapW = torch.Tensor{5,5,5,10,1}
The first two numbers define lower and upper bounds of the parameter. The actual grid size will be uniformly sampled from this range. The remaining three numbers for curriculum training. In the easiest (hardest) case, the upper bound will be set to 3rd (4th) number. 5th number is the step size for changing the upper bound. In the same way, we define a percentage of grid cells to contain a block or water:
local blockspct = torch.Tensor{0,.05,0,.2,.01}
local waterpct = torch.Tensor{0,.05,0,.2,.01}
There are other generic parameters has be set, but see the actual config file for detail. Now we are ready to use the game!
We also provide a code for training different types of neural models with policy gradient method. Training uses CPUs with multi-threading for speed up. The implemented models are
- multi-layer neural network
- convolutional neural network
- end-to-end memory network.
For example, running the following command will train a 2-layer network on MultiGoals.
th main.lua --hidsz 20 --model mlp --nlayers 2 --epochs 100 --game MultiGoals --nactions 6 --nworker 2
To see all the command line options, run
th main.lua -h
--hidsz the size of the internal state vector [20]
--nonlin non-linearity type: tanh | relu | none [tanh]
--model model type: mlp | conv | memnn [memnn]
--init_std STD of initial weights [0.2]
--max_attributes maximum number of attributes of each item [6]
--nlayers the number of layers in MLP [2]
--convdim the number of feature maps in convolutional layers [20]
--conv_sz spatial scope of the input to convnet and MLP [19]
--memsize size of the memory in MemNN [20]
--nhop the number of hops in MemNN [3]
--nagents the number of agents [1]
--nactions the number of agent actions [11]
--max_steps force to end the game after this many steps [20]
--games_config_path configuration file for games [mazebase/config/game_config.lua]
--game can specify a single game []
--optim optimization method: rmsprop | sgd [rmsprop]
--lrate learning rate [0.001]
--max_grad_norm gradient clip value [0]
--alpha coefficient of baseline term in the cost function [0.03]
--epochs the number of training epochs [100]
--nbatches the number of mini-batches in one epoch [100]
--batch_size size of mini-batch (the number of parallel games) in each thread [32]
--nworker the number of threads used for training [16]
--beta parameter of RMSProp [0.97]
--eps parameter of RMSProp [1e-06]
--save file name to save the model []
--load file name to load the model []
See the paper for more details on training.
After training, you can see the model playing by calling function test()
which will display the game in a browser window. But you have to have display package to see the game play. For example, if you saved a trained model using --save /tmp/model.t7
option, then you can load the model using option --load /tmp/model.t7 --epochs 0
and then run test()
command to see it's playing.
The whole code is written in Lua, and requires Torch7 and nngraph packages. The training uses multi-threading for speed up. Display package is necessary for visualizing the game play, which can be installed by
luarocks install display