This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

NMS not compiled with GPU support #477

Closed
sanjmohan opened this issue Feb 21, 2019 · 8 comments

@sanjmohan

sanjmohan commented Feb 21, 2019

❓ Questions and Help

I am trying to build and run the repo, and on running I get this runtime error:

2019-02-21 02:14:18,430 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/train_net.py", line 174, in <module>
    main()
  File "tools/train_net.py", line 167, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
    loss_dict = model(images, targets)
  File "/raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 159, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 175, in _forward_train  
    anchors, objectness, rpn_box_regression, targets
  File "/raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward   
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 118, in forward_for_single_feature_map
    score_field="objectness",
  File "/raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
    keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: Not compiled with GPU support (nms at /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/csrc/nms.h:22)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fd5399b58b5 in /raid/sanjay/.conda/envs/nightly/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: nms(at::Tensor const&, at::Tensor const&, float) + 0xd4 (0x7fd52d3313a4 in /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x14ebf (0x7fd52d33debf in /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x11d55 (0x7fd52d33ad55 in /raid/sanjay/maskrcnn-benchmark/maskrcnn_benchmark/_C.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>

I am using the PyTorch nightly build (installed with conda install -c pytorch pytorch-nightly cuda92). I downloaded both the PyTorch nightly build and the maskrcnn repo today (2/21).

Current versions are:

$ python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
1.0.0.dev20190201
9.0.176
$ conda list | grep torch
cuda92                    1.0                           0    pytorch
libtorch                  0.1.12                  nomkl_0  
pytorch-ignite            0.1.2                     <pip>
pytorch-nightly           1.0.0.dev20190201 py3.6_cuda9.0.176_cudnn7.4.1_0    pytorch
torch                     1.0.1.post2               <pip>
torchvision-nightly       0.2.1                     <pip>

maskrcnn version:

$ git log -1
commit b23eee0cb72af70f4e4a72e73537f0884cfd1cff
Author: Stzpz <[email protected]>
Date:   Wed Feb 20 07:47:10 2019 -0800

    Supported FBNet architecture. (#463)

I have seen other closed issues re: this problem and I have tried to follow the solutions in those issues but am still experiencing this error. I would appreciate any help on this. Thanks!

@LeviViana
Contributor

Could you please tell me what the output of python -c "import torch;from torch.utils.cpp_extension import CUDA_HOME;print(CUDA_HOME);print(torch.cuda.is_available())" is?

It should be something like:
/usr/local/cuda
True

@sanjmohan
Author

sanjmohan commented Feb 22, 2019

Ah! Your question helped me realize my mistake. I was not running python setup.py with the CUDA_VISIBLE_DEVICES flag, so the CUDA code was not being compiled for any GPU. Problem fixed. Thanks for your help!
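In other words, the extension has to be recompiled with a GPU visible. A minimal sketch of the rebuild sequence, assuming the repo root and the build develop options from INSTALL.md (the GPU index for CUDA_VISIBLE_DEVICES comes from nvidia-smi):

```shell
# From the maskrcnn-benchmark checkout:
cd maskrcnn-benchmark

# Drop the stale CPU-only build artifacts so the CUDA kernels get recompiled.
rm -rf build/

# Rebuild with a GPU visible to the process.
CUDA_VISIBLE_DEVICES=0 python setup.py build develop
```

Deleting build/ matters because setup.py will typically reuse the existing CPU-only object files instead of recompiling the CUDA sources.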

@fmassa
Contributor

fmassa commented Feb 22, 2019

Thanks @LeviViana for helping figure out the reason!

@xixixijie

Can you tell me how to run python setup.py with the CUDA_VISIBLE_DEVICES flag?

@sanjmohan
Author

sanjmohan commented Mar 4, 2019

On the command line I just ran

$ CUDA_VISIBLE_DEVICES=0 python setup.py <options>

with 0 because when I ran $ nvidia-smi it showed the GPU's ID as 0 (and with the options from the INSTALL.md instructions). This fixed the issue for me because running

$ python -c "import torch; print(torch.cuda.is_available())"

printed False but running

$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.is_available())"

printed True.
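For context, CUDA_VISIBLE_DEVICES is an ordinary environment variable that the CUDA runtime reads at startup to decide which GPUs the process may see; prefixing a command with it sets it for that one process only. A small stdlib-only sketch of that mechanism (no CUDA needed to run it):

```python
import os
import subprocess
import sys

# Prefixing a command with `CUDA_VISIBLE_DEVICES=0` is equivalent to running
# the child process with that variable set in its environment. The child
# (here, a trivial Python one-liner) sees the value; the parent's own
# environment is left untouched.
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', 'unset'))"],
    env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"},
    capture_output=True, text=True,
)
print(child.stdout.strip())  # → 0
```

This is why the flag must be present on the setup.py invocation itself: the compile step checks for visible GPUs in its own process environment, not in your shell profile.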

@shgnag

shgnag commented Oct 10, 2019

Thanks @sanjmohan
I'm getting the same error even after setting the CUDA_VISIBLE_DEVICES flag, and $ python -c "import torch; print(torch.cuda.is_available())" returns True for me.
Can you please help me with this?

@ox1d0

ox1d0 commented Jan 23, 2020

Hey, not sure, but I have the same issue here:
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ python -c "import torch; print(torch.__version__); print(torch.version.cuda)"
1.3.0
10.1.243
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ python -c "import torch; from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME); print(torch.cuda.is_available())"
/usr/local/cuda-10.0
True
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ nvidia-smi
Thu Jan 23 13:04:01 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:07:00.0  On |                  N/A |
|  0%   41C    P8    N/A / 120W |    110MiB /  1999MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1019      G   /usr/lib/xorg/Xorg                           107MiB |
+-----------------------------------------------------------------------------+
(maskrcnn_benchmark) ox1d0@Nexus001:~/maskrcnn-benchmark/demo$ CUDA_VISIBLE_DEVICES=0 python -c "import torch; print(torch.cuda.is_available())"
True


How can I recompile maskrcnn-benchmark with CUDA enabled?

@zzy0222

zzy0222 commented Mar 18, 2021

ah! your question helped me realize my mistake. I was not running python setup.py with the CUDA_VISIBLE_DEVICES flag so the cuda code was not being compiled on any gpu. Problem fixed. Thanks for your help!

That's quite helpful! I also couldn't compile before because CUDA_HOME was not found on my machine, so I reinstalled my NVIDIA driver. It works!!


7 participants