Warning
As of Feb 9, 2024, the pyRDDLGym API has been updated to version 2.0, and is no longer backwards compatible with the previous stable version 1.4.4.
While we strongly recommend that you update to 2.0, if you require the old API, you can install the last stable version with pip:
pip install pyRDDLGym==1.4.4
or directly from GitHub:
pip install git+https://github.com/pyrddlgym-project/pyRDDLGym@version_1.4.4_stable
A Python toolkit for auto-generation of OpenAI Gym environments from Relational Dynamic Influence Diagram Language (RDDL) description files.
This is currently the official parser, simulator and evaluation system for RDDL in Python, with new features and enhancements to the RDDL language.
- Purpose and Benefits
- Installation
- Example Scripts
- Usage
- Status
- Citing pyRDDLGym
- License
- Contributors
- Describe your environment in RDDL (web-based intro), (full tutorial), (language spec) and use it with your existing workflow for OpenAI gym environments
- Compact, easily modifiable representation language for discrete time control in dynamic stochastic environments
    - e.g., a few lines of RDDL for CartPole vs. 200 lines in direct Python for Gym
- Object-oriented relational (template) specification allows easy scaling of model instances from 1 object to 1000's of objects without changing the domain model
- Customizable visualization and recording tools facilitate domain debugging and plan interpretation
    - e.g., a student course project visualizing Jax plans in a sailing domain
- Runs out-of-the-box in Python or within Colab (RDDL Playground)
- Compiler tools to extract Dynamic Bayesian Networks (DBNs) and Extended Algebraic Decision Diagrams (XADDs) for symbolic analysis of causal dependencies and transition distributions
- Ready to use with out-of-the-box planners:
    - JaxPlan: Planning through autodifferentiation
    - GurobiPlan: Planning through mixed discrete-continuous optimization
    - PROST: Monte Carlo Tree Search (MCTS)
    - Deep Reinforcement Learning (DQN, PPO, etc.): Popular Reinforcement Learning (RL) algorithms from Stable Baselines and RLlib
    - Symbolic Dynamic Programming: Exact symbolic regression-based planning and policy evaluation
We require Python 3.8+ and the following packages: ply, pillow>=9.2.0, numpy>=1.22, matplotlib>=3.5.0, gymnasium, pygame, termcolor.
You can install our package, along with all of its prerequisites, using pip:
pip install pyRDDLGym
Since pyRDDLGym does not come with any premade environments, you can either load RDDL documents from your local file system, or install rddlrepository for easy access to preexisting domains:
pip install rddlrepository
Several example scripts are provided to illustrate basic pyRDDLGym usage:
- run_gym.py launches a pyRDDLGym environment and evaluates a given policy
- run_gym2.py is similar to the above, except the environment interaction is coded explicitly
- run_ground.py illustrates grounding a domain and instance
- run_intervals.py computes lower and upper bounds on the policy value using interval arithmetic
- run_server.py illustrates how to set up pyRDDLGym to send and receive messages through TCP
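As a rough illustration of the interval arithmetic idea mentioned for run_intervals.py, the sketch below propagates lower and upper bounds through a product and a sum. It is not the pyRDDLGym implementation: the Interval class, the bound_return helper, and the reward form (state times action) are all hypothetical and chosen only to show how bounds on inputs become bounds on a cumulative return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; a hypothetical helper, not a pyRDDLGym class."""
    lo: float
    hi: float

    def __add__(self, other):
        # Sum of two intervals: add the endpoints.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product of two intervals: the extremes are among the endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


def bound_return(reward_bounds, horizon):
    """Bound the undiscounted return when every step's reward lies in reward_bounds."""
    total = Interval(0.0, 0.0)
    for _ in range(horizon):
        total = total + reward_bounds
    return total


# Assumed example: reward = state * action, with state in [0, 2] and action in [-1, 3].
state = Interval(0.0, 2.0)
action = Interval(-1.0, 3.0)
reward = state * action
bounds = bound_return(reward, horizon=3)
print(reward, bounds)
```

The bounds are conservative: they hold for any policy whose actions stay inside the assumed action interval, but they may be far from tight.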
For example, to simulate an environment, type the following from the install directory of pyRDDLGym into a shell that supports the python command (this requires rddlrepository):
python -m pyRDDLGym.examples.run_gym "CartPole_Continuous_gym" "0" 1
which loads instance "0" of the CartPole control problem with continuous actions from rddlrepository and simulates it with a random policy for one episode.
This section outlines some of the basic Python API functions of pyRDDLGym in more detail.
Instantiation of an existing environment by name is as easy as:
import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0")
Loading your own domain files is just as straightforward:
import pyRDDLGym
env = pyRDDLGym.make("/path/to/domain.rddl", "/path/to/instance.rddl")
Both versions above instantiate env as an OpenAI gym environment, so that the usual reset() and step() calls work as intended.
You can also pass custom settings to the make command, e.g.:
import pyRDDLGym
env = pyRDDLGym.make("CartPole_Continuous_gym", "0", enforce_action_constraints=True, ...)
You can design your own visualizer by subclassing from pyRDDLGym.core.visualizer.viz.BaseViz and overriding the render(state) method.
Then, changing the visualizer of the environment is easy:
viz_class = ... # the class name of your custom viz
env.set_visualizer(viz_class)
You can record an animated gif or movie of the agent interaction with an environment (described below). To do this, simply pass a MovieGenerator object to the set_visualizer method:
from pyRDDLGym.core.visualizer.movie import MovieGenerator
movie_gen = MovieGenerator("/path/where/to/save", "env_name")
env.set_visualizer(viz_class, movie_gen=movie_gen)
Agents map states to actions through the sample_action(obs) function, and can be used to interact with an environment.
For example, to initialize a random agent:
from pyRDDLGym.core.policy import RandomAgent
agent = RandomAgent(action_space=env.action_space, num_actions=env.max_allowed_actions)
All agent instances support one-line evaluation in a given environment:
stats = agent.evaluate(env, episodes=1, verbose=True, render=True)
which returns a dictionary of summary statistics (e.g., "mean" and "std") and also visualizes the domain in real time. If you prefer, the standard OpenAI gym interaction is still available to you:
total_reward = 0
state, _ = env.reset()
for step in range(env.horizon):
    env.render()
    action = agent.sample_action(state)
    next_state, reward, terminated, truncated, _ = env.step(action)
    print(f'state = {state}, action = {action}, reward = {reward}')
    total_reward += reward
    state = next_state
    done = terminated or truncated
    if done:
        break
print(f'episode ended with reward {total_reward}')

# release all viz resources, and finish logging if used
env.close()
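The interaction loop above can be wrapped in a function and repeated over several episodes to produce summary statistics similar in spirit to those returned by evaluate. The following is a minimal sketch that runs without pyRDDLGym installed: CountdownEnv and ConstantAgent are hypothetical stand-ins that only mimic the reset(), step() and sample_action() interface, and the "min" and "max" keys are assumptions rather than a statement of what evaluate actually returns.

```python
import statistics


class CountdownEnv:
    """Hypothetical stand-in with a Gymnasium-style interface (not a pyRDDLGym env)."""
    horizon = 5

    def reset(self):
        self.t = 0
        return {"t": self.t}, {}

    def step(self, action):
        self.t += 1
        reward = float(action["move"])
        terminated = False
        truncated = self.t >= self.horizon
        return {"t": self.t}, reward, terminated, truncated, {}


class ConstantAgent:
    """Hypothetical agent exposing sample_action(obs); always picks the same action."""

    def sample_action(self, obs):
        return {"move": 1}


def rollout(env, agent):
    """Run one episode and return the accumulated reward."""
    total = 0.0
    state, _ = env.reset()
    for _ in range(env.horizon):
        action = agent.sample_action(state)
        state, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total


def summarize(returns):
    """Summarize episode returns; the key names here are illustrative."""
    return {
        "mean": statistics.mean(returns),
        "std": statistics.pstdev(returns),
        "min": min(returns),
        "max": max(returns),
    }


returns = [rollout(CountdownEnv(), ConstantAgent()) for _ in range(3)]
stats = summarize(returns)
print(stats)
```

Swapping in a real environment and agent should only require that they expose the same three methods and a horizon attribute, as in the loop above.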
Note
All observations (for a POMDP), states (for an MDP) and actions are represented by dict objects, whose keys correspond to the appropriate fluents as defined in the RDDL description.
Here, the syntax is pvar-name___o1__o2..., where pvar-name is the pvariable name, followed by 3 underscores, and object parameters o1, o2, ... are separated by 2 underscores.
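The naming convention just described can be sketched with two small helpers that build and split such keys. Both ground_name and parse_ground_name are hypothetical names introduced for illustration, not part of the pyRDDLGym API. The sketch assumes that a pvariable without parameters is keyed by its bare name, and that neither pvariable nor object names contain runs of two or more underscores.

```python
def ground_name(pvar, objects=()):
    """Build a key of the form pvar-name___o1__o2 from a name and its objects."""
    if not objects:
        # Assumption: a parameterless pvariable is keyed by its bare name.
        return pvar
    return pvar + "___" + "__".join(objects)


def parse_ground_name(key):
    """Split a key back into (pvar-name, [o1, o2, ...])."""
    pvar, sep, params = key.partition("___")
    objects = params.split("__") if sep else []
    return pvar, objects


key = ground_name("pos", ["x1", "y2"])
print(key)
print(parse_ground_name(key))
print(parse_ground_name("vel"))
```

Helpers like these can be handy when building action dicts by hand or when grouping state entries by pvariable for logging.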
Warning
There are two known, undocumented issues with RDDL:
- the minus (-) arithmetic operation must have spaces on both sides, otherwise there is ambiguity whether it refers to a mathematical operation or to variables
- aggregation-union-precedence parsing requires encapsulating parentheses around aggregations, e.g., (sum_{}[]).
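The first issue arises because names may themselves contain hyphens (as in pvar-name), so a lexer cannot tell x-y the variable from x minus y without whitespace. The toy tokenizer below illustrates the ambiguity; its regular expression is an assumption made for this sketch and is not the actual pyRDDLGym lexer.

```python
import re

# Hypothetical token pattern: identifiers may contain hyphens and underscores,
# followed by numbers and single-character operators. Not the pyRDDLGym lexer.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_\-]*|\d+(?:\.\d+)?|[-+*/()]")


def tokenize(expr):
    """Return the list of tokens found in expr, ignoring whitespace."""
    return TOKEN.findall(expr)


# Without spaces, the hyphen is absorbed into a single identifier.
print(tokenize("x-y"))
# With spaces on both sides, the minus is recognized as an operator.
print(tokenize("x - y"))
```

Writing the minus with spaces on both sides removes the ambiguity regardless of how a particular lexer resolves it.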
A complete archive of past and present RDDL problems, including all IPPC problems, is also available to clone or install via pip:
- rddlrepository (pip install rddlrepository)
Software for related simulators:
The parser used in this project is based on the parser from Thiago P. Bueno's pyrddl (used in rddlgym).
Please see our paper describing pyRDDLGym. If you found this useful, please consider citing us:
@article{taitler2022pyrddlgym,
title={pyRDDLGym: From RDDL to Gym Environments},
author={Taitler, Ayal and Gimelfarb, Michael and Gopalakrishnan, Sriram and Mladenov, Martin and Liu, Xiaotian and Sanner, Scott},
journal={arXiv preprint arXiv:2211.05939},
year={2022}}
This software is distributed under the MIT License.
- Ayal Taitler (University of Toronto, CA)
- Michael Gimelfarb (University of Toronto, CA)
- Jihwan Jeong (University of Toronto, CA)
- Sriram Gopalakrishnan (Arizona State University/J.P. Morgan, USA)
- Martin Mladenov (Google, BR)
- Jack Liu (University of Toronto, CA)