Mean reward first rises and then falls during the training process #8
Comments
I encountered the same situation before. By inspecting the simulation visualization, I found that the decline in mean_reward is mainly caused by a decline in mean_episode_length. After training for several iterations, many environments terminate shortly after their episodes start. As short episodes become more frequent, mean_episode_length drops, and since mean_reward is calculated by dividing reward_sum by max_episode_length, the shorter episodes yield a smaller reward_sum and therefore a lower mean_reward.
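A minimal sketch of the effect described above, assuming (as the comment states) that the logged metric divides each episode's reward sum by the fixed max_episode_length; the function and values below are illustrative, not the repository's actual logging code:

```python
def mean_reward(episode_reward_sums, max_episode_length):
    """Average per-episode reward, normalized by the fixed maximum episode length."""
    per_episode = [s / max_episode_length for s in episode_reward_sums]
    return sum(per_episode) / len(per_episode)

max_len = 1000  # hypothetical episode-length cap

# Healthy policy: episodes run the full length at ~1.0 reward per step.
long_episodes = [1.0 * max_len] * 4   # reward_sum per episode

# Degraded policy: episodes terminate after ~200 steps at the same reward rate.
short_episodes = [1.0 * 200] * 4

print(mean_reward(long_episodes, max_len))   # 1.0
print(mean_reward(short_episodes, max_len))  # 0.2
```

So even with an unchanged per-step reward, early terminations alone drag the reported mean reward down, which is why the curve can fall after rising.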
Thanks for your kind advice. I retried with a different random seed and got a normal result this time.
Can you provide an example training log of Go2?
It's weird that the mean reward first rises and then falls during the training process.
All settings follow the repository defaults.