SWA with distributed training #22

Open
milliema opened this issue Feb 17, 2021 · 1 comment

milliema commented Feb 17, 2021

In distributed training (e.g. DDP), each GPU processes only its own mini-batch, so the BN statistics computed on each GPU are different.
When SWA is adopted, we need to run one extra epoch for bn_update. In this epoch, should we use SyncBatchNorm to average the BN statistics across all GPUs?
And are there any other modifications we need to make for DDP training?
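
For reference, here is a minimal sketch of one possible setup, using torch.optim.swa_utils rather than this repo's own bn_update, and with SyncBatchNorm so the extra BN-update pass accumulates statistics over the global batch. The toy network and `train_loader` are placeholders (assume a DataLoader built with a DistributedSampler and a torchrun launch); this is one way to do it, not necessarily what the authors recommend:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.optim.swa_utils import AveragedModel, update_bn

# Assumes a torchrun launch, which sets LOCAL_RANK / RANK / WORLD_SIZE.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
device = torch.device("cuda", local_rank)

# Toy network standing in for the real model.
base_model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.BatchNorm2d(16),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
).to(device)

# SyncBatchNorm computes mean/var over the global batch, so the statistics
# accumulated during the extra BN-update pass are identical on every rank.
base_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(base_model)
model = DDP(base_model, device_ids=[local_rank])

swa_model = AveragedModel(base_model)
# ... training loop: optimizer steps on `model`, and
#     swa_model.update_parameters(base_model) on the SWA schedule ...

# BN-update epoch: load the averaged weights back into the DDP/SyncBatchNorm
# model and make one forward-only pass over the training data.
# `train_loader` is a placeholder for your DataLoader with DistributedSampler.
base_model.load_state_dict(swa_model.module.state_dict())
update_bn(train_loader, model, device=device)
```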

@izmailovpavel
Collaborator

Hi @milliema, I'd say you should do the same thing that is normally done with the batchnorm statistics at the end of parallel training; I imagine you are syncing the statistics between the copies of the model? I personally have not looked into distributed SWA much, but here is a potentially useful reference: https://openreview.net/forum?id=rygFWAEFwS.
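
One concrete reading of "syncing the statistics between the copies" is to run the BN-update pass independently on each rank and then all-reduce the BatchNorm buffers, as in the sketch below. Note this is only an approximation compared to recomputing the statistics with SyncBatchNorm, since averaging per-rank running variances ignores differences between the per-rank means; `average_bn_stats` is a hypothetical helper, and it assumes the model is on the GPU with an NCCL process group initialized:

```python
import torch
import torch.distributed as dist


def average_bn_stats(model: torch.nn.Module) -> None:
    """All-reduce BatchNorm running statistics so that every rank ends up
    with the same (averaged) values."""
    world_size = dist.get_world_size()
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            for buf in (module.running_mean, module.running_var):
                if buf is not None:
                    dist.all_reduce(buf, op=dist.ReduceOp.SUM)
                    buf.div_(world_size)


# Usage, after the per-rank BN-update pass over each rank's shard of the data:
#   bn_update(train_loader, swa_model)   # per-rank statistics
#   average_bn_stats(swa_model)          # make them consistent across GPUs
```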
