
Multi-gpu support #5

Open
R00Kie-Liu opened this issue May 8, 2019 · 10 comments

Comments

@R00Kie-Liu

How can I use multiple GPUs for the search?

@198808xc
Collaborator

198808xc commented May 8, 2019

Thanks for this question!

I think multi-GPU works just like single-GPU. Since our search on CIFAR takes only a few hours, we did not consider multi-GPU training. However, in our recent work that generalizes P-DARTS to search directly on ImageNet, we did use 8 GPUs for acceleration.

@chenxin061 more experiences to share?

@chenxin061
Owner

chenxin061 commented May 8, 2019

To search with multiple GPUs, you need to change a few lines in train_search.py.

  1. Delete all lines related to GPU ID setting. Instead, you can set GPU ids with CUDA_VISIBLE_DEVICES.
  2. Add model = nn.DataParallel(model) before model = model.cuda() and model = model.module after it.
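A minimal sketch of those two steps, assuming a DARTS-style search network that exposes an arch_parameters() method for the architecture optimizer (the Network class and its members below are illustrative stand-ins, not the exact code in train_search.py):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the search network in train_search.py; the real
# class exposes arch_parameters() so the architecture optimizer can find the
# architecture weights (alphas).
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Linear(8, 4)
        self.alphas = nn.Parameter(1e-3 * torch.randn(4))

    def arch_parameters(self):
        return [self.alphas]

    def forward(self, x):
        return self.stem(x) * torch.softmax(self.alphas, dim=-1)

model = Network()
model = nn.DataParallel(model)   # step 2: wrap before .cuda()
if torch.cuda.is_available():
    model = model.cuda()
model = model.module             # step 2: unwrap afterwards, so custom
                                 # methods like arch_parameters() still resolve
optimizer_a = torch.optim.Adam(model.arch_parameters(), lr=3e-4)
```

Note that after unwrapping with model.module, the forward pass runs on the plain module again; the point of this recipe is mainly to keep methods such as arch_parameters() directly accessible.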

@zihaozhang9


To search with multiple GPUs, you need to change a few lines in train_search.py.

  1. Delete all lines related to GPU ID setting. Instead, you can set GPU ids with CUDA_VISIBLE_DEVICES.
  2. Add model = nn.DataParallel(model) before model = model.cuda() and model = model.module after it.

I added model = nn.DataParallel(model) to train_search.py and got this error:

Traceback (most recent call last):
  File "train_search.py", line 469, in <module>
    main()
  File "train_search.py", line 142, in main
    optimizer_a = torch.optim.Adam(model.arch_parameters(),
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 518, in __getattr__
    type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'arch_parameters'

@anhcda-study

@zihaozhang9 To fix that, change model.arch_parameters() to model.module.arch_parameters().
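The cause and the fix can be reproduced with a small toy (the Net class here is an illustrative stand-in for the search network): nn.DataParallel forwards tensors and submodules but not custom methods, so those have to be reached through the .module attribute.

```python
import torch
import torch.nn as nn

# Illustrative model with a custom method, mirroring arch_parameters()
# in train_search.py (names here are stand-ins).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.alphas = nn.Parameter(torch.zeros(3))

    def arch_parameters(self):
        return [self.alphas]

    def forward(self, x):
        return x

wrapped = nn.DataParallel(Net())

# DataParallel does not forward custom methods of the wrapped module,
# which is exactly the AttributeError reported above:
try:
    wrapped.arch_parameters()
except AttributeError as e:
    print(e)

# The fix: reach through the wrapper via .module
params = wrapped.module.arch_parameters()
```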

@JarveeLee

JarveeLee commented Sep 3, 2019

I did several things.
I commented out the device setting:

#torch.cuda.set_device(args.gpu)

and then added

model = nn.DataParallel(model)
model = model.cuda()
model = model.module

and then set

os.environ["CUDA_VISIBLE_DEVICES"] = '0,1,2,3,4,5,6,7'
parser.add_argument('--batch_size', type=int, default=192, help='batch size')

but I still cannot run train_search.py on multiple GPUs; it still tries to cram everything onto a single GPU and runs out of memory. What is wrong here?

I am using PyTorch 1.0.0 with Python 3.6, and print(torch.cuda.device_count()) returns 4.

If I instead use

model = nn.DataParallel(model)
model = model.cuda()
#model = model.module

together with

model.module.arch_parameters()

I get this error:
[screenshot of the error message]

@chenxin061
Owner

The new version of our code now supports multi-GPU search!
@JarveeLee You can try it.
Use CUDA_VISIBLE_DEVICES to assign GPU ids.
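One detail worth noting when assigning GPU IDs this way: CUDA_VISIBLE_DEVICES is read when CUDA is first initialized, so setting it inside a script only works if it happens before the first CUDA call. A sketch (the IDs below are just examples):

```python
import os

# Set the variable at the very top of the script, before importing torch,
# so it takes effect before CUDA is initialized (ids are examples).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch  # imported after setting the variable, on purpose

# Only the listed devices are now visible to PyTorch
# (device_count() is 0 on a CPU-only machine).
print(torch.cuda.device_count())
```

Equivalently, set it from the shell when launching: CUDA_VISIBLE_DEVICES=0,1,2,3 python train_search.py.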

@JarveeLee

I saw your modification; I did the same to support multiple GPUs. What is more, in

class MixedOp(nn.Module):
    def forward(self, x, weights):
        return sum(w * op(x) for w, op in zip(weights, self.m_ops))

the forward should change to

class MixedOp(nn.Module):
    def forward(self, x, weights):
        return sum(w.cuda() * op(x.cuda()) for w, op in zip(weights, self.m_ops))

otherwise the error I encountered will still happen.
I am working on a complicated GPU server where the environment is hard to control; that is my experience.
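The MixedOp weighted sum discussed above can be sketched as a runnable toy. The candidate operations below are illustrative stand-ins for the repository's actual operation set, and this CPU sketch omits the .cuda() calls, which only apply on a GPU machine:

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Weighted sum of candidate operations, as in the forward() above."""
    def __init__(self, channels):
        super().__init__()
        # Stand-ins for the candidate ops searched over in DARTS/P-DARTS;
        # all preserve the spatial size so the sum is well defined.
        self.m_ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.AvgPool2d(3, stride=1, padding=1),
        ])

    def forward(self, x, weights):
        # On multi-GPU the comment above suggests w.cuda() * op(x.cuda());
        # on CPU the weighted sum is the same computation.
        return sum(w * op(x) for w, op in zip(weights, self.m_ops))

op = MixedOp(channels=8)
x = torch.randn(2, 8, 16, 16)
weights = torch.softmax(torch.zeros(3), dim=-1)  # one weight per candidate op
y = op(x, weights)
print(y.shape)  # torch.Size([2, 8, 16, 16])
```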

@davidrpugh

@chenxin061 Thanks for sharing your code! Can you confirm whether you used 8 V100 GPUs with 16 GB of memory per card or 8 V100 GPUs with 32 GB memory per card? Thanks!

@chenxin061
Owner

@davidrpugh The search code is tested on two P100 GPUs, and the evaluation code is tested on 8 V100 GPUs with 16 GB memory each.

@davidrpugh

@chenxin061 Thanks! I suspected as much for the V100s; I didn't realize that you used 2 P100s. I was able to complete the search on CIFAR-10 or CIFAR-100 using a single P100 with 16 GB in 7-8 hours (as advertised in the paper and README).
