Are you fed up with snake friends betraying you? This is your chance to redeem yourself, rise above the rest, and crown yourself Vasuki, the King of the Snakes!
We present to you RL Games, an arena where you will find only blood and venom. Create your bot using Reinforcement Learning techniques to fight against the best of the best bots. With a cash prize pool of INR 25K, what more do you need?
As a part of Shaastra 2022, RL Games is a team competition in which participants compete to find the best policy for a given environment using reinforcement learning methods. Do check out our website for other exciting opportunities!
- Link to notebook:
- Environment
The environment consists of two snakes (agents) and 4 food locations at any instant. The snakes (agents) can move in three directions: left, right, or straight ahead. The objective of the game is to achieve a greater score than the opponent, either by consuming food or by colliding with the opponent.
A brief description of the environment is given below:
- The state space is characterised by an `8 x 8` grid (continuous space).
- At any instant of time, `4` random coordinates out of `8` fixed coordinates possess food.
- The agent may choose one of three possible moves at any instant: left, right, or forward.
- Depending on the position of the agent, the move may or may not be executed.
- For instance, if the agent lies on the first row facing North and decides to move left, the move is deemed illegal and the agent is not displaced. Although the move does not take place, the agent is turned to face West.
- That is, the agent will first turn to left and then try to move. Since the move is illegal, the agent stays put.
- The agent must eat the food to grow.
- If the agent collides with the opponent:
  - Let `s1` and `s2` be the scores of the two agents, and `r1` and `r2` their respective rewards.
  - If `s1 > s2`, `r1 = 5 s2/(s1-s2)` and `r2 = -3 s2/(s1-s2)`
  - If `s1 < s2`, `r1 = -3 s1/(s2-s1)` and `r2 = 5 s1/(s2-s1)`
- After a collision, the agent with the lesser score is randomly respawned.
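
The turn-then-move rule described above can be sketched roughly as follows. This is only an illustration, not the environment's actual code: the `step` function, the heading names, and the coordinate convention (row 0 at the top, North meaning a decreasing row index) are all assumptions, and the real environment may orient its axes differently.

```python
from typing import Tuple

GRID_SIZE = 8  # 8 x 8 grid, as described above

# Headings in clockwise order. The (d_row, d_col) offsets assume row 0 is the
# top row, which is an assumption and not taken from the environment itself.
HEADINGS = ["N", "E", "S", "W"]
DELTAS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}
TURNS = {"left": -1, "forward": 0, "right": 1}


def step(pos: Tuple[int, int], heading: str, action: str):
    """Turn first, then try to move one cell.

    Returns (new_pos, new_heading, legal).
    """
    new_heading = HEADINGS[(HEADINGS.index(heading) + TURNS[action]) % 4]
    d_row, d_col = DELTAS[new_heading]
    row, col = pos[0] + d_row, pos[1] + d_col
    legal = 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE
    # An illegal move leaves the position unchanged but keeps the new heading.
    return ((row, col) if legal else pos), new_heading, legal
```

Under these assumptions, `step((0, 0), "N", "left")` returns `((0, 0), "W", False)`: the agent turns to face West, the move is rejected, and the agent stays put.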
- `-1` for legal moves
- `-2` for illegal moves
- `+4` for consuming food
- Collision
  - If `s1 > s2`, `r1 = 5 s2/(s1-s2)` and `r2 = -3 s2/(s1-s2)`
  - If `s1 < s2`, `r1 = -3 s1/(s2-s1)` and `r2 = 5 s1/(s2-s1)`
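
As a rough illustration of the reward values listed above, here is a minimal sketch in Python. The constant names and the `collision_rewards` helper are hypothetical and not part of the competition code. The case `s1 == s2` is not defined in the text, so the sketch raises an error for it rather than guessing.

```python
from typing import Tuple

# Per-step rewards from the list above (constant names are hypothetical).
LEGAL_MOVE_REWARD = -1
ILLEGAL_MOVE_REWARD = -2
FOOD_REWARD = 4


def collision_rewards(s1: float, s2: float) -> Tuple[float, float]:
    """Return (r1, r2) for a collision between agents with scores s1 and s2."""
    if s1 > s2:
        diff = s1 - s2
        return 5 * s2 / diff, -3 * s2 / diff
    if s1 < s2:
        diff = s2 - s1
        return -3 * s1 / diff, 5 * s1 / diff
    # Equal scores are not covered by the rules above.
    raise ValueError("collision reward is undefined for equal scores")
```

For example, with `s1 = 6` and `s2 = 2` the difference is 4, so `r1 = 5 * 2 / 4 = 2.5` and `r2 = -3 * 2 / 4 = -1.5`.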
- Every game lasts for a maximum of `game_length = 100` iterations.
- The agent with the greater score wins the game.
- Play `runs = 1000` games against the opponent.
- The agent with the higher number of victories wins the bracket.
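
The bracket rules above could be tallied along these lines. This is a sketch under stated assumptions: `play_game` is a hypothetical callable that plays one game of at most `game_length` iterations and returns the two final scores, and games with equal scores are counted for neither agent, which the rules above do not specify.

```python
from typing import Callable, Tuple

GAME_LENGTH = 100  # maximum iterations per game
RUNS = 1000        # games played against the opponent


def run_bracket(
    play_game: Callable[[int], Tuple[float, float]],
    runs: int = RUNS,
    game_length: int = GAME_LENGTH,
) -> Tuple[int, int]:
    """Play `runs` games and return the victory counts (wins_1, wins_2)."""
    wins_1, wins_2 = 0, 0
    for _ in range(runs):
        s1, s2 = play_game(game_length)
        if s1 > s2:
            wins_1 += 1
        elif s2 > s1:
            wins_2 += 1
        # Equal scores: counted for neither agent (assumption).
    return wins_1, wins_2
```

The agent with the larger count in the returned pair would win the bracket.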