
Mean reward first rises and then falls during the training process #8

Closed
meng-zha opened this issue Jul 17, 2024 · 2 comments

@meng-zha

Can you provide an example training log for Go2?
It's odd that the mean reward first rises and then falls during training.
All settings follow the repository defaults.
[Screenshots attached: 2024-07-17 14-48-54 and 2024-07-17 14-49-07]

@lupinjia commented Jul 17, 2024

I encountered the same situation before. By inspecting the simulation visualization, I found that the decline in mean_reward is mainly due to the decline in mean_episode_length. After training for several iterations, many environments terminate shortly after the episode starts. As short episodes become more frequent, mean_episode_length declines; since mean_reward is calculated by dividing reward_sum by max_episode_length, shorter episodes yield a smaller reward_sum and therefore a lower mean_reward.
You can check and refine your termination condition to possibly avoid this; a sketch of such a check follows. You could also increase the information the agent receives, since the original version is, in my view, a POMDP setting.

@meng-zha (Author)

Thanks for your kind advice. I retried with a different random seed and got a normal result this time.
I will also look into the termination condition, as you advised.
