In the case of distributed training, e.g. DDP, each GPU only processes a minibatch, so the BatchNorm statistics computed on each GPU are different.
When SWA is adopted, we need to run one more epoch for bn_update. In this epoch, should we use SyncBatchNorm to average the BN statistics across all GPUs? A sketch of what I mean is below.
Are there any other modifications we need to make for DDP training?
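For context, in the non-distributed case the extra bn_update epoch is just a forward-only sweep over the training data that recomputes the BN running statistics for the averaged weights. A minimal sketch using PyTorch's `torch.optim.swa_utils` (this repo ships its own `bn_update` utility, so this is only illustrative; `swa_model`, `train_loader`, and `device` are placeholders):

```python
import torch
from torch.optim.swa_utils import update_bn

def recompute_bn_stats(swa_model, train_loader, device):
    """One extra forward-only pass over the training data.

    The averaged SWA weights invalidate the BN running mean/var accumulated
    during training, so update_bn resets the BN buffers and recomputes them
    before evaluation.
    """
    update_bn(train_loader, swa_model, device=device)
```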
Hi @milliema, I'd say you should do the same thing that is normally done with the BatchNorm statistics at the end of parallel training; I imagine you are syncing the statistics between the copies of the model? I personally haven't looked into distributed SWA much, but here is a potentially useful reference: https://openreview.net/forum?id=rygFWAEFwS.
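One possible way to sync the statistics during the bn_update pass is sketched below. This is not something from this repo, just an illustration: it assumes `torch.nn.SyncBatchNorm` and PyTorch's `torch.optim.swa_utils.update_bn`, an already-initialized process group, and placeholder names `swa_model`, `train_loader`, and `local_rank`.

```python
import torch
from torch.nn import SyncBatchNorm
from torch.optim.swa_utils import update_bn

def recompute_bn_stats_ddp(swa_model, train_loader, local_rank):
    """Recompute BN statistics after SWA weight averaging under DDP.

    SyncBatchNorm averages the per-batch mean/var across all ranks during the
    forward pass, so every rank accumulates identical running statistics even
    though each rank only sees its own shard of the data (e.g. via
    DistributedSampler).
    """
    device = torch.device(f"cuda:{local_rank}")
    # Convert BatchNorm layers to SyncBatchNorm; affine parameters and running
    # buffers are copied, and the default process group is used for the sync.
    sync_model = SyncBatchNorm.convert_sync_batchnorm(swa_model).to(device)
    # Forward-only pass over this rank's shard; no backward is needed, so the
    # model does not have to be wrapped in DistributedDataParallel here.
    update_bn(train_loader, sync_model, device=device)
    return sync_model
```

Whether converting to SyncBatchNorm only for this final pass (versus training with it from the start) is the right choice probably depends on the per-GPU batch size; with reasonably large per-GPU batches, plain per-rank BN followed by a synced bn_update may be sufficient.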