DQN in CartPole-v0 doesn't learn #135
Comments
OK, thanks for the update and clarifications! The one thing I can think of that's missing from your configuration has to do with the epsilon-greedy schedule. Look in the EpsilonGreedyAgent (a base class for DqnAgent): … You'll probably also want to increase … Another thing would be to use … Let us know if either of those helps. This is interesting: I haven't actually run CartPole myself, so it would be good to see what settings work.
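For readers unfamiliar with the epsilon-greedy schedule referred to here, this is a minimal, library-independent sketch of linear epsilon annealing plus epsilon-greedy action selection. The names `linear_epsilon`, `epsilon_greedy`, `eps_start`, `eps_end`, and `decay_steps` are hypothetical and chosen for illustration; they are not rlpyt's parameter names, and the default values are placeholders, not recommended settings.

```python
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.01, decay_steps=50_000):
    """Anneal epsilon linearly from eps_start to eps_end over decay_steps.

    All names and defaults here are illustrative assumptions.
    """
    if decay_steps <= 0:
        return eps_end
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)
```

If exploration decays too quickly or never decays, a DQN agent on a small task like CartPole can plateau, which is why the schedule is worth checking first.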
FWIW, from my experience with CartPole, I'm not actually sure DQN does well on it. DQN strangely seems to be more reliable on Pong than on CartPole, but I might not have settled on ideal hyperparameters. I usually verify DQN code by running it on Pong.
I tried several configurations for CartPole, but it didn't learn. Finally, I decided to test Pong using a custom agent and model, not the ones from rlpyt, to see whether my code is wrong. I used the resized RGB image as input and the same configuration for the sampler, algo, and runner, but again there was no sign of learning. It seems my code has a problem that I cannot find. Here is my code. I would be grateful if you could take a look at it.
Finally, the problem is solved. The replay buffer setting was not correct for DQN with non-frame environments: it was set to the frame-based versions, but when I set it to UniformReplayBuffer, it works perfectly. I will clean up the code, add an example, and make a pull request.
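To illustrate the distinction, here is a minimal sketch of a uniform replay buffer for vector observations such as CartPole's state. It stores each transition whole and samples uniformly, with no frame stacking or frame de-duplication. This is not rlpyt's UniformReplayBuffer; the class `SimpleUniformReplay` and the `Transition` tuple are hypothetical names used only for this example.

```python
import random
from collections import deque, namedtuple

# One stored step; field names are illustrative assumptions.
Transition = namedtuple("Transition", "obs action reward next_obs done")


class SimpleUniformReplay:
    """Fixed-capacity buffer that samples stored transitions uniformly."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def append(self, obs, action, reward, next_obs, done):
        self._storage.append(Transition(obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one batch.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)
```

Because each observation is stored in full, there is no assumption that consecutive observations share image frames, which is the assumption a frame-based buffer relies on.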
I'm trying to use rlpyt with my custom env, which has a non-image input state. As a first step, I want to test it on a simple env like CartPole-v0, using DQN and DqnAgent. But I get this error:
The code is:
ModelCls is None in DqnAgent, and I think that's the reason for the error. So I wrote an agent and a model like this and used them instead of DqnAgent:
It runs fine, but it doesn't learn.
The plot is similar even after 20,000,000 steps. I checked the code several times and tested different configs for several days.
Do you have any idea how to solve this problem?
Update:
CartPole-v0 has a discrete action space: +1 or -1 for the action (two actions). DDPG and SAC work fine for my custom env with a continuous action space, so I tried discretizing the action space. I trained it using DQN from Stable Baselines and with my pure PyTorch implementation, and both work. But I couldn't train it using rlpyt, so I decided to try CartPole-v0 first.
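As a rough illustration of what discretizing a continuous action space can look like, here is a minimal sketch for a one-dimensional action. It is not the code from this issue; `make_action_table`, `to_continuous`, and the bounds and bin count in the usage lines are hypothetical choices.

```python
def make_action_table(low, high, n_bins):
    """Return n_bins evenly spaced continuous actions in [low, high]."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def to_continuous(index, table):
    """Map the discrete action index chosen by DQN to its continuous value."""
    return table[index]


# Example: five discrete actions covering an assumed range of [-1, 1].
table = make_action_table(-1.0, 1.0, 5)
action_value = to_continuous(3, table)
```

The agent then only sees `n_bins` discrete actions, and the env wrapper applies `to_continuous` before stepping the underlying continuous env.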
Do you see any problem in my code for an env with a discrete action space?