
[question] EvalCallback using MPI #1069

Open
davidADSP opened this issue Jan 15, 2021 · 5 comments
Labels
question Further information is requested

Comments


davidADSP commented Jan 15, 2021

When running a training loop using MPI, the EvalCallback doesn't seem to make use of the parallelisation:

for example, in this train function:

https://github.com/hardmaru/slimevolleygym/blob/master/training_scripts/train_ppo_mpi.py

it seems that the EvalCallback will be called once per worker instance, after a combined total of eval_freq timesteps across all of the instances. This appears to be problematic if you want to use the callback to decide whether to save a new best model: each instance computes its own average reward, so the best_model file may be overwritten several times on the same update. The best score will also be inflated the more instances you have, since some of the reward estimates will come out higher than average purely through favourable random fluctuations.

Is this correct?

If so, is there a way to instead split the n_eval_episodes across the workers and aggregate into a single score?

davidADSP changed the title from EvalCallback using MPI to [question] EvalCallback using MPI on Jan 15, 2021
araffin (Collaborator) commented Jan 18, 2021

Hello,

When running a training loop using MPI, the EvalCallback doesn't seem to make use of the parallelisation:

Yes, the EvalCallback does not support MPI parallelization.

I would recommend switching to the VecEnv version of PPO (PPO2) if that is possible.
Or even switching to Stable-Baselines3 ;) : https://github.com/DLR-RM/stable-baselines3

If so, is there a way to instead split the n_eval_episodes across the workers and aggregate into a single score?

This is possible but non-trivial, and would require you to define a custom callback (cf. the documentation).
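A minimal sketch of the aggregation logic such a custom callback could use, assuming mpi4py is available. The helper names `split_episodes` and `aggregate_mean_reward` are hypothetical (not part of stable-baselines); `comm` stands for an mpi4py communicator such as `MPI.COMM_WORLD`, whose `allreduce` defaults to a sum reduction.

```python
# Sketch: split n_eval_episodes across MPI workers and combine the
# per-worker results into a single mean reward, so only one "best
# model" decision is made per evaluation.

def split_episodes(n_eval_episodes, n_workers, rank):
    """Number of evaluation episodes this worker should run.
    The remainder goes to the lowest ranks, so the counts across
    all ranks sum exactly to n_eval_episodes."""
    base, extra = divmod(n_eval_episodes, n_workers)
    return base + (1 if rank < extra else 0)

def aggregate_mean_reward(local_reward_sum, local_episode_count, comm):
    """Global mean episode reward across all workers.
    Every rank receives the same value, but only rank 0 should
    compare it to the current best and save the model, to avoid
    the overwriting problem described above."""
    total_reward = comm.allreduce(local_reward_sum)      # op defaults to SUM
    total_episodes = comm.allreduce(local_episode_count)
    return total_reward / total_episodes
```

Inside a custom callback's `_on_step`, each worker would run its `split_episodes(...)` share of the episodes, call `aggregate_mean_reward(...)`, and only rank 0 would write `best_model.zip`.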

araffin added the question (Further information is requested) label on Jan 18, 2021
davidADSP (Author) commented

Hi @araffin - thanks for the reply!

Does VecEnv parallelise the gradient computation, or just the env part?

https://twitter.com/hardmaru/status/1260852988475658242

I've got PPO + MPI working really well on a multicore machine with a custom callback to handle the parallelisation of the evaluation. I'm also hesitant to switch to SB3 as it doesn't support Tensorflow, which is a shame.

Thanks for your help!

araffin (Collaborator) commented Jan 18, 2021

Does VecEnv parallelise the gradient computation, or just the env part?

Just the env part.

I've got PPO + MPI working really well on a multicore machine with a custom callback to handle the parallelisation of the evaluation.

SB3 does not support MPI by default, but we would be happy to have an implementation of PPO MPI in our contrib repo ;)

See Stable-Baselines-Team/stable-baselines3-contrib#11

I'm also hesitant to switch to SB3 as it doesn't support Tensorflow, which is a shame.

The decision to move to PyTorch and drop MPI (for the default install) was not arbitrary, see #733 and #366 ;)

danielstankw commented

@araffin has anything changed with regard to SB3 supporting MPI, or is it still not supported?

araffin (Collaborator) commented Oct 20, 2021

@araffin has anything changed with regard to SB3 supporting MPI, or is it still not supported?

It is not (Stable-Baselines-Team/stable-baselines3-contrib#11, Stable-Baselines-Team/stable-baselines3-contrib#45), but contributions are welcome ;)

But with SB3, you can use multiple envs for evaluation, which provides a great speedup.
