Hi @iN1k1, this is a Wandb issue. See the wandb docs to learn how to use wandb on multiple GPUs. You can pass the group parameter when you initialize wandb to define a shared experiment and group the logged values together in the W&B App UI, like: vis_backends = [dict(type='WandbVisBackend', init_kwargs=dict(group='xxx'))]
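A minimal sketch of what such a config could look like in an MMEngine-style config file. The project name and group name here are placeholders, not values from this thread; only the group key matters for merging the per-rank runs in the W&B UI:

```python
# MMEngine-style config sketch: give every DDP process the same wandb group
# so the W&B App UI groups their runs under one experiment.
vis_backends = [
    dict(
        type='WandbVisBackend',
        init_kwargs=dict(
            project='my_project',   # hypothetical project name
            group='my_experiment',  # shared group name; pick one per training run
        ),
    ),
]
visualizer = dict(type='Visualizer', vis_backends=vis_backends)
```

Note this does not prevent the extra runs from being created; it only groups them together so they appear as one experiment in the UI.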
Hi all,
I am running a training job with the dist_train.sh script and logging the training details to wandb with the help of the LoggerHook and WandbGenVisBackend. Everything is fine during training, and all the params and losses are correctly logged to wandb under a single run. However, when the MultiValLoop is executed, there are multiple wandb initializations (one for each process with rank > 0). Each of these processes logs nothing; validation results are saved only under the wandb run for rank = 0. So the problem is that there are multiple useless wandb inits that just add noise to the wandb UI. Is there any way to avoid multiple wandb initializations during the validation loop? Thanks