Dear Lu Lu:
I have a problem. I am working on a school server on which I failed to install Horovod. However, my model and data are too large to fit on a single GPU, and I don't know how DeepXDE loads data onto the GPU. Wrapping the model with torch.nn.parallel.DistributedDataParallel(model) raises an "unexpected keyword argument 'lr'" error, among other problems. Is there any other way to use multiple GPUs without Horovod? Thanks!
Hi, data-parallel acceleration is currently supported only with Horovod + TensorFlow 1.x and random sampling of collocation points. "Horovod also supports PyTorch, TensorFlow 2.x, paving the way for multiple backend acceleration" (link).
You could either implement Horovod for the PyTorch backend or directly use PyTorch's DistributedDataParallel, depending on your preferences.
Since Horovod for TF 1.x is already implemented, it might be easier to port it to Horovod + PyTorch. However, the principles behind DistributedDataParallel seem very similar to those of Horovod.
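Since Horovod cannot be installed on your server, here is a minimal sketch of the plain PyTorch DistributedDataParallel pattern, assuming the standard torchrun launcher. The network and layer sizes are placeholders, and DeepXDE does not currently provide this integration, so you would need to adapt its PyTorch training loop yourself. The key point for your error: DDP must wrap an nn.Module (the network itself), and it only accepts DDP-specific keyword arguments; the learning rate belongs to the optimizer, which is why passing lr-related arguments into the DDP wrapper fails.

```python
# Minimal DDP sketch (assumes launch via torchrun, which sets RANK/LOCAL_RANK/WORLD_SIZE).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # One process per GPU; NCCL is the usual backend for multi-GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder network; replace with your own nn.Module.
    net = torch.nn.Sequential(
        torch.nn.Linear(2, 50), torch.nn.Tanh(), torch.nn.Linear(50, 1)
    ).cuda(local_rank)

    # Wrap the network (an nn.Module), not a training wrapper whose compile()
    # takes lr -- DDP itself has no lr argument.
    ddp_net = DDP(net, device_ids=[local_rank])

    # The learning rate goes to the optimizer, not to DDP.
    optimizer = torch.optim.Adam(ddp_net.parameters(), lr=1e-3)

    # ... sample collocation points on each rank, build the PDE loss, and train
    # as usual; DDP averages gradients across GPUs during backward().

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

You would then launch it with, e.g., `torchrun --nproc_per_node=4 train_ddp.py`. Conceptually this is the same data-parallel scheme as the Horovod + TF 1.x implementation: each process holds a full copy of the network, works on its own random collocation points, and gradients are averaged across GPUs at every step.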