RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation #201

Open
Bananavision opened this issue Nov 3, 2021 · 5 comments · May be fixed by #204

Comments

@Bananavision

Training on a custom dataset with torch v1.5 gives the following error:

Traceback (most recent call last):
  File "/home/usr/project/solov2/SOLO/tools/train.py", line 125, in <module>
    main()
  File "/home/usr/project/solov2/SOLO/tools/train.py", line 115, in main
    train_detector(
  File "/home/usr/project/solov2/SOLO/mmdet/apis/train.py", line 107, in train_detector
    _non_dist_train(
  File "/home/usr/project/solov2/SOLO/mmdet/apis/train.py", line 299, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/usr/project/solov2/lib/python3.9/site-packages/mmcv-0.2.16-py3.9-linux-x86_64.egg/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/usr/project/solov2/lib/python3.9/site-packages/mmcv-0.2.16-py3.9-linux-x86_64.egg/mmcv/runner/runner.py", line 275, in train
    self.call_hook('after_train_iter')
  File "/home/usr/project/solov2/lib/python3.9/site-packages/mmcv-0.2.16-py3.9-linux-x86_64.egg/mmcv/runner/runner.py", line 231, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/usr/project/solov2/lib/python3.9/site-packages/mmcv-0.2.16-py3.9-linux-x86_64.egg/mmcv/runner/hooks/optimizer.py", line 19, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/usr/project/solov2/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/usr/project/solov2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 128, 336, 200]], which is output 0 of ReluBackward0, is at version 3; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Is this error specific to torch >= 1.5? What is the current workaround?

Thanks
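
For anyone trying to understand the message itself, here is a minimal sketch that triggers the same class of error on a tiny CPU tensor. It is not taken from the SOLO code; the function name `reproduce_inplace_error` is made up for illustration. The idea is that the ReLU backward pass needs its own output, so editing that output in place before calling `backward()` makes autograd's version check fail.

```python
import torch


def reproduce_inplace_error():
    """Trigger autograd's version-counter check (illustrative helper, not from SOLO)."""
    x = torch.ones(3, requires_grad=True)
    y = torch.relu(x)              # ReLU's backward pass needs this output
    version_before = y._version    # version counter of a freshly created tensor
    y.add_(1.0)                    # in-place edit bumps the version counter
    version_after = y._version
    try:
        y.sum().backward()         # autograd notices the saved output changed
    except RuntimeError as err:
        return version_before, version_after, str(err)
    return version_before, version_after, ""


before, after, message = reproduce_inplace_error()
print(before, after)
print(message)
```

In a real model the offending operation is usually harder to spot. Calling `torch.autograd.set_detect_anomaly(True)` before the training step should make PyTorch also report the forward operation whose gradient failed, at the cost of slower training.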

@haqishen haqishen linked a pull request Nov 11, 2021 that will close this issue
@haqishen

Hi, I've run into the same issue.

My env:
cuda 11.3
pytorch 1.10

I fixed the bug with this modification, please give it a try ;)
#204

@IremYoldas

I think this error is specific to torch >= 1.5. I got it too. I was already using CUDA 10.1, so I downgraded PyTorch to 1.4, and that fixed it.
PS: Before downgrading PyTorch I tried many things (basically everything I could find). For instance, I changed nn.ReLU(inplace=True) to inplace=False, added .step after the ReLU backward, etc.
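
For reference, the `inplace=False` change mentioned above can be applied to a whole model without editing every layer definition. The sketch below is only an illustration: `disable_inplace_activations` is a hypothetical helper, and the small `nn.Sequential` model stands in for the real network. The comment above suggests this change alone was not enough in that setup, so it may only help when the conflicting in-place operation is the activation itself.

```python
import torch
from torch import nn


def disable_inplace_activations(model: nn.Module) -> int:
    """Set inplace=False on every submodule that has inplace=True (hypothetical helper)."""
    changed = 0
    for module in model.modules():
        if getattr(module, "inplace", False) is True:
            module.inplace = False
            changed += 1
    return changed


# Toy model standing in for the real network.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(inplace=True), nn.Linear(4, 1))
num_changed = disable_inplace_activations(model)

# A forward/backward pass still works after the flag is flipped.
loss = model(torch.randn(2, 4)).sum()
loss.backward()
print(num_changed)
```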

@abhiagwl4262

How can I handle this if I don't want to downgrade my PyTorch version?

@Sue-Tang-Up

I have the same question: how can I handle this if I don't want to downgrade my PyTorch version?

@yd7am

yd7am commented Mar 28, 2024

Regarding "how to handle this if I don't want to downgrade my pytorch version?": change
x = self.activate(x) -> x = self.activate(x).clone()
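
A rough sketch of why the `.clone()` change can help, under stated assumptions: `Block` is a hypothetical stand-in for a conv block with an `activate` attribute, and the `x += 1.0` line stands in for whatever later in-place operation touches the activation output in the real model. With the clone, that later operation modifies a copy, so the tensor saved for the ReLU backward pass stays untouched.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical stand-in for a conv block with an `activate` attribute."""

    def __init__(self, use_clone: bool):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size=1)
        self.activate = nn.ReLU(inplace=True)
        self.use_clone = use_clone

    def forward(self, x):
        x = self.conv(x)
        if self.use_clone:
            x = self.activate(x).clone()  # later edits hit the copy
        else:
            x = self.activate(x)
        x += 1.0  # stands in for a later in-place op elsewhere in the model
        return x


def backward_succeeds(block: nn.Module) -> bool:
    """Run one forward/backward pass and report whether autograd accepted it."""
    out = block(torch.randn(1, 1, 2, 2))
    try:
        out.sum().backward()
    except RuntimeError:
        return False
    return True


without_clone = backward_succeeds(Block(use_clone=False))
with_clone = backward_succeeds(Block(use_clone=True))
print(without_clone, with_clone)
```

The clone costs an extra copy of the activation tensor, so memory use goes up slightly for each block where it is applied.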
