Series of n-armed bandit environments for the OpenAI Gym
Each env uses a different set of:
- Probability Distributions - A list of the probabilities that each bandit will pay out
- Reward Distributions - A list of either fixed rewards (if a number) or [mean, standard deviation] pairs (if a list) describing the payout each bandit gives
E.g. BanditTwoArmedHighLowFixed-v0 has `p_dist=[0.8, 0.2]` and `r_dist=[1, 1]`, meaning that when action 0 is selected it pays out 1 80% of the time, and when action 1 is selected it pays out 1 20% of the time.

You can access the distributions through the `p_dist` and `r_dist` variables using `env.p_dist` or `env.r_dist` if you want to compare your learned estimates against the true values when plotting the results of various algorithms.
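For example (a minimal sketch, not part of the package itself; it assumes gym-bandits is installed as described below, and on newer gym versions the attributes may need to be read from `env.unwrapped`):

```python
import gym
import gym_bandits  # registers the bandit environments with gym

env = gym.make("BanditTwoArmedHighLowFixed-v0")
print(env.p_dist)  # [0.8, 0.2] - probability that each arm pays out
print(env.r_dist)  # [1, 1]     - reward given when an arm does pay out
```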
The available environments are:
- BanditTwoArmedDeterministicFixed-v0: Simplest case where one bandit always pays, and the other always doesn't
- BanditTwoArmedHighLowFixed-v0: Stochastic version with a large difference between which bandit pays out of two choices
- BanditTwoArmedHighHighFixed-v0: Stochastic version with a small difference between which bandit pays where both are good
- BanditTwoArmedLowLowFixed-v0: Stochastic version with a small difference between which bandit pays where both are bad
- BanditTenArmedRandomFixed-v0: 10 armed bandit with random probabilities assigned to payouts
- BanditTenArmedRandomRandom-v0: 10 armed bandit with random probabilities assigned to both payouts and rewards
- BanditTenArmedUniformDistributedReward-v0: 10 armed bandit that always pays out, with a reward selected from a uniform distribution
- BanditTenArmedGaussian-v0: 10 armed bandit mentioned on page 30 of Reinforcement Learning: An Introduction (Sutton and Barto)
To install from source:

```
git clone git@github.com:JKCooper2/gym-bandits.git
cd gym-bandits
pip install .
```
To install using requirements.txt or environment.yml, add the following line:

```
git+https://github.com/JKCooper2/gym-bandits#egg=gym-bandits
```
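For instance, a requirements.txt containing that entry would be installed with `pip install -r requirements.txt` (a sketch of the usual pip workflow, not something specific to this package):

```
# requirements.txt
git+https://github.com/JKCooper2/gym-bandits#egg=gym-bandits
```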
In your gym environment:

```python
import gym
import gym_bandits  # registers the bandit environments with gym

env = gym.make("BanditTenArmedGaussian-v0")  # Replace with relevant env
```
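As a worked example of comparing your learned estimates against the true values, here is a minimal epsilon-greedy loop (a sketch, not part of the package; it assumes the classic 4-tuple `step` return of older gym versions, and the 1000-pull budget and epsilon value are arbitrary choices):

```python
import numpy as np
import gym
import gym_bandits

env = gym.make("BanditTenArmedGaussian-v0")
n_arms = env.action_space.n

q = np.zeros(n_arms)       # estimated value of each arm (sample averages)
counts = np.zeros(n_arms)  # number of times each arm has been pulled
epsilon = 0.1

for _ in range(1000):
    env.reset()  # each pull is its own one-step episode
    if np.random.rand() < epsilon:
        action = env.action_space.sample()  # explore
    else:
        action = int(np.argmax(q))          # exploit
    _, reward, _, _ = env.step(action)
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]  # incremental mean

print("estimated values:", q)
print("true p_dist:", env.p_dist)
print("true r_dist:", env.r_dist)
```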