This is an attempt to solve OpenAI Gym's LunarLander-v2 using deep reinforcement learning.
Searching for hyperparameter values is challenging because the hyperparameter space is large. We therefore use the hyperparameter values from Deep Q-Learning with Keras and Gym [1], which solves CartPole-v1, as a starting point. The file has the training code for the lunar lander model. For a long time, the rewards hovered between zero and negative values. The breakthrough came when we replaced the epsilon-greedy exploration strategy with a Boltzmann exploration strategy.
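To illustrate the difference between the two exploration strategies, here is a minimal sketch using only the Python standard library. It is not the training code from this repository: the function names, the `temperature` parameter, and the sample Q-values are assumptions made for illustration. Epsilon-greedy explores by picking uniformly at random, while Boltzmann exploration samples each action with probability proportional to exp(Q / temperature), so actions with higher estimated value are tried more often.

```python
import math
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probabilities(q_values, temperature=1.0):
    """Softmax over Q-values: p(a) is proportional to exp(Q(a) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical Q-values for the four discrete LunarLander-v2 actions.
    q = [1.0, 2.0, 0.5, -1.0]
    print("Boltzmann probabilities:", boltzmann_probabilities(q, temperature=1.0))
    print("Boltzmann sample:", boltzmann_action(q, temperature=1.0))
    print("Epsilon-greedy sample:", epsilon_greedy_action(q, epsilon=0.1))
```

A high temperature makes the distribution close to uniform (more exploration), and a low temperature makes it close to greedy. Unlike epsilon-greedy, the exploratory choices are weighted by the current Q-estimates, so clearly bad actions are sampled less often.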
Credits are given in the source code and in the References.
[1] Deep Q-Learning with Keras and Gym. URL:
[2] Playing Atari with Deep Reinforcement Learning. URL:
[3] Artificial Intelligence: Representation and Problem. URL:
[4] Human-level control through deep reinforcement learning. URL:
[5] Reinforcement Learning w/ Keras + OpenAI: DQNs. URL:
[6] Keras RL. URL: