Dear Lu Lu:
I have a problem. I am working on a school server on which I failed to install Horovod. However, my model and data are too large to fit on a single GPU, and I don't know how DeepXDE loads data onto the GPU. Wrapping the model with torch.nn.parallel.DistributedDataParallel(model) raises an "unexpected keyword argument 'lr'" error, among other problems. Is there any other way to use multiple GPUs without Horovod? Thanks!
Hi, data-parallel acceleration is currently supported only with Horovod + TensorFlow 1.x and random sampling of collocation points. "Horovod also supports PyTorch, TensorFlow 2.x, paving the way for multiple backend acceleration" (link).
You could either implement Horovod for the PyTorch backend or directly use PyTorch's DistributedDataParallel, depending on your preferences.
Since Horovod for TF 1.x is already implemented, it might be easier to port it to Horovod + PyTorch. However, the principles behind DistributedDataParallel seem very similar to those of Horovod.
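Since Horovod cannot be installed on your server, here is a minimal sketch of the plain PyTorch DistributedDataParallel pattern, assuming the standard torchrun launcher. The network and layer sizes are placeholders, and DeepXDE does not currently provide this integration, so you would need to adapt its PyTorch training loop yourself. The key point for your error: DDP must wrap an nn.Module (the network itself), and it only accepts DDP-specific keyword arguments; the learning rate belongs to the optimizer, which is why passing lr-related arguments into the DDP wrapper fails.

```python
# Minimal DDP sketch (assumes launch via torchrun, which sets RANK/LOCAL_RANK/WORLD_SIZE).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # One process per GPU; NCCL is the usual backend for multi-GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder network; replace with your own nn.Module.
    net = torch.nn.Sequential(
        torch.nn.Linear(2, 50), torch.nn.Tanh(), torch.nn.Linear(50, 1)
    ).cuda(local_rank)

    # Wrap the network (an nn.Module), not a training wrapper whose compile()
    # takes lr -- DDP itself has no lr argument.
    ddp_net = DDP(net, device_ids=[local_rank])

    # The learning rate goes to the optimizer, not to DDP.
    optimizer = torch.optim.Adam(ddp_net.parameters(), lr=1e-3)

    # ... sample collocation points on each rank, build the PDE loss, and train
    # as usual; DDP averages gradients across GPUs during backward().

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

You would then launch it with, e.g., `torchrun --nproc_per_node=4 train_ddp.py`. Conceptually this is the same data-parallel scheme as the Horovod + TF 1.x implementation: each process holds a full copy of the network, works on its own random collocation points, and gradients are averaged across GPUs at every step.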