I have an nn.Sequential model, and it runs fine without pipeline parallelism.
When I try to run it with pipeline=2, I get the following error:
Traceback (most recent call last):
File "test_pp.py", line 197, in <module>
pipelinable.policy = "uniform"
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 321, in fit
self._train_epoch(
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/trainer/_trainer.py", line 181, in _train_epoch
logits, label, loss = self.engine.execute_schedule(
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/engine/_base_engine.py", line 201, in execute_schedule
output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 393, in forward_backward_step
output_obj = self._forward_step(engine,
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 251, in _forward_step
output_obj = self._call_engine(engine.model, data)
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 178, in _call_engine
return model(data)
File "/mnt/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/amp/naive_amp/naive_amp.py", line 152, in forward
out = self.model(*args, **kwargs)
File "/mnt/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/pipeline/pipelinable.py", line 249, in forward
input_tensor = call_module(module, args=(input_tensor,), kwargs=module_kwargs)
File "/mnt/user/.local/lib/python3.8/site-packages/colossalai/pipeline/utils.py", line 254, in call_module
return module(*args_needed, **kwargs)
File "/mnt/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/lustre/yanghuaan1/sdx_tmp/clip2/core/models/syncbn_helper.py", line 95, in forward
return super(OnlySyncStatsBN, self).forward(input)
File "/mnt/user/.local/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
self._check_input_dim(input)
File "/mnt/user/.local/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 407, in _check_input_dim
raise ValueError("expected 4D input (got {}D input)".format(input.dim()))
ValueError: expected 4D input (got 2D input)
code:
with pipelinable:
    model = create_model()
pipelinable.policy = "balanced"
pipelinable.to_layer_list()
model = pipelinable.partition(NUM_CHUNKS, gpc.pipeline_parallel_size, gpc.get_local_rank(ParallelMode.PIPELINE))
Is it possible that the error is caused by the pipeline partition?
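To narrow this down, here is a minimal sketch that uses only plain PyTorch (no ColossalAI calls). It records the input shape each child of an nn.Sequential receives, so the same trace can be compared between the unpartitioned model and the partitioned one. The helper name `trace_input_shapes` and the toy model are hypothetical stand-ins, not part of my real code; the toy model would be replaced by the result of `create_model()`.

```python
import torch
import torch.nn as nn


def trace_input_shapes(model: nn.Sequential, sample: torch.Tensor):
    """Run one forward pass and record the input shape each child receives.

    Returns (shapes, error). If a module raises, the last entry in `shapes`
    is the module that was about to run when the error occurred.
    """
    shapes = []
    handles = []

    def make_hook(name):
        def hook(module, inputs):
            # `inputs` is the tuple of positional args passed to forward().
            shapes.append((name, type(module).__name__, tuple(inputs[0].shape)))
        return hook

    for name, child in model.named_children():
        handles.append(child.register_forward_pre_hook(make_hook(name)))

    error = None
    try:
        with torch.no_grad():
            model(sample)
    except Exception as exc:  # keep the partial trace instead of losing it
        error = exc
    finally:
        for handle in handles:
            handle.remove()
    return shapes, error


# Hypothetical stand-in model; swap in create_model() for the real check.
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)
toy.eval()

shapes, error = trace_input_shapes(toy, torch.randn(2, 3, 16, 16))
for name, kind, shape in shapes:
    print(name, kind, shape)
print("error:", error)
```

If the BatchNorm layer sees a 4D input in the unpartitioned run but a 2D input after partitioning, that would suggest the layers are being executed in a different order (or with a different input) once the model is split.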