Different result of rangenet_lib #9
Hi @TT22TY, we are currently investigating the issue. So far we only know that it might occur on Nvidia RTX graphics cards. Do you also have an RTX model? (Sorry for the late reply.)
Thanks for your reply. :) @jbehley
Okay, this really seems to be an issue with the GeForce RTX line. I experienced similar problems with my RTX 2070, so we have to investigate the reason. The non-TensorRT part works under PyTorch with an RTX 2080 Ti, since this is the card we used for our experiments.
Thank you very much, @jbehley! In fact, I use TensorRT 5.1.5.0, so I wonder which version you installed. Thank you!
Hi @TT22TY, I've tested on three different setups.
I hope it helps.
Hi @TT22TY, I run it under Ubuntu 18.04 using the following combination:
For TensorRT I downloaded: nv-tensorrt-repo-ubuntu1804-cuda10.1-trt5.1.5.0-ga-20190427_1-1_amd64.deb
@jbehley @Chen-Xieyuanli, thank you very much. I am not sure which part leads to the wrong result; for TensorRT I downloaded TensorRT-5.1.5.0.Ubuntu-16.04.5.x86_64-gnu.cuda-10.0.cudnn7.5.tar.gz. Do you have any suggestions about which one I should update? I will try it again. Thank you very much! :)
Hi @TT22TY, since you have an RTX graphics card, I would suggest following @jbehley's setup. Please make sure that you build the system against the expected setup, because different versions of CUDA, cuDNN, or TensorRT can coexist on the same machine; when compiling the system, it might still link against the wrong versions. A quick way to check which versions the dynamic linker can actually see is sketched below.
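For reference, here is a minimal sketch (Linux only, assuming Python 3 is available) that lists which CUDA, cuDNN, and TensorRT shared libraries the dynamic linker currently resolves, which helps to spot coexisting installations. The library name patterns are just the usual SONAMEs, nothing specific to this repo:

```python
# Hedged sketch: list the CUDA / cuDNN / TensorRT libraries visible to the
# dynamic linker, to spot multiple coexisting installs (Linux only).
import subprocess

def visible_libs(pattern: str) -> list:
    """Return the ldconfig cache entries whose name contains `pattern`."""
    out = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True)
    return [line.strip() for line in out.stdout.splitlines() if pattern in line]

for name in ("libcudart", "libcudnn", "libnvinfer"):
    entries = visible_libs(name)
    print(f"{name}: {len(entries)} entries found")
    for entry in entries:
        print("  ", entry)
```

If more than one version of a library shows up, the build may be picking up a different one than you expect.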
@Chen-Xieyuanli Thank you very much, I will try. :)
In reference to issue #6, below is my hardware setup. I just wanted to know whether this setup is compatible, or whether I should switch to another setup.
We have currently only experienced problems with RTX models. However, I would suggest using TensorRT 5.1, since this is the version we have tested the most and have running on many other systems. If you go for TensorRT 6.0, we cannot guarantee that everything works as expected.
Thanks for the update. I tried running with TensorRT 5.1 but am still getting the same result. I also tried the .bin file provided in the example folder and had no issues getting the expected result. It would be great if you could try this pcd file and see whether you get the same results as mine.
Hi, besides that, I am wondering when you will release the ROS interface for lidar-bonnetal. Thank you very much!
Hi,
I have also tested on driver 418.87 (everything else the same as above), but I run into the runtime error from #15 again... If I train a new model from scratch on SemanticKITTI in the current environment, will this issue be solved? Thanks a lot!
You can always run the model without TensorRT (see lidar-bonnetal for the PyTorch models), since this worked reliably on all our systems with different GPUs. We currently cannot do anything to solve this or give advice, since it seems to be a problem with the RTX cards and specific versions. I don't know whether you can turn off some optimizations (fp16, etc.) and whether this hurts the result. @TT22TY: Good to hear that it works partially. You can also try to open an issue with Nvidia and TensorRT (https://github.com/NVIDIA/TensorRT/issues). They might have some suggestions.
Hi @TT22TY, I'm glad that it works for you now, and we now also know that it's quite sensitive to the GPU and TensorRT versions. Andres @tano297 may later release a better version of the C++ and ROS interface for LiDAR-bonnetal. Hi @LongruiDong, I'm sorry that this repo doesn't work properly for you. A more complete development version with different model formats may be released later by Andres @tano297. Since we are now fairly sure that the problems are caused by the GPUs and TensorRT versions, which we cannot do anything about, I will close this issue. If there are other problems relating to this, please feel free to ask me to reopen it.
Hi. We tested this configuration on a fresh machine, exactly following your config. There is output, but it is not a proper segmentation result. Is there anything else I am missing? Thanks for the help.
Hi, @Claud1234
Hey @kuzen, thank you very much for your feedback. We now have a new solution to the incompatibility problem! :-)
Hi @kuzen, thank you for sharing your solution. Could you please also share your hardware setup, for example the CUDA and TensorRT versions? Thank you very much!
Hi, this is my software version
Thanks.
@kuzen Thank you very much. But it does not work for me, and I wonder whether it works for you, @Claud1234 @LongruiDong.
@kuzen @TT22TY I have tried the approach of converting the opset version and optimizing the ONNX model. But a very weird thing is that I only succeeded in getting the correct result ONE time! After I delete the '.trt' file and retry, the results are always the same and wrong! I did not change anything in the dependencies at all. @kuzen, would you please retry or explain more about the whole procedure? Do you get the correct result every time after deleting the '.trt' file? I have unified the versions of libcublas and CUDA as you recommended, but I am not sure whether this is necessary.
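For readers who want to try the same approach, here is a minimal sketch of what "convert the opset version and optimize the ONNX model" could look like. It assumes the onnx and onnx-simplifier (onnxsim) Python packages; the file names and the target opset are placeholders, not values confirmed in this thread:

```python
# Hedged sketch of "convert the opset version and optimize the ONNX model".
# File names and the target opset (11) are placeholders.
import onnx
from onnx import version_converter
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("model.onnx")

# Re-emit the graph at a different opset version.
converted = version_converter.convert_version(model, 11)

# Constant-fold and simplify the graph before the TensorRT ONNX parser sees it.
simplified, ok = simplify(converted)
if not ok:
    raise RuntimeError("onnx-simplifier could not validate the simplified model")

onnx.checker.check_model(simplified)
onnx.save(simplified, "model_opset11_simplified.onnx")
```

The simplified model is then what you point rangenet_lib at before it builds the '.trt' engine.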
@balajiravichandiran Thank you very much for the feedback!
@balajiravichandiran thank you very much, it works! Amazing!
@balajiravichandiran @Chen-Xieyuanli @kuzen Hello, I can use the pre-trained model to get correct prediction results, but when I test with my own model, the problems above still occur. I tried the two methods you provided, but they didn't work for me. Do you have any suggestions? Thank you very much.
Hey @balajiravichandiran, thanks for using our code. Are the range image results also from your method, or from RangeNet++? They look good. This issue seems to be visited very frequently, so I will keep it open for others to join the discussion.
The left side of Figure 2 is the result of the rangenet_lib test, and the right side is the prediction result output by lidar-bonnetal during training. Figure 3 is the rangenet_lib result with the model I trained on KITTI myself; it looks terrible. So I don't know why the pre-trained model gives good results while the model I trained myself performs so badly.
Okay, I got it. The problem is that the model trained by you is not as good as the one trained by us, so it is not a problem of rangenet_lib. You may check your setup again and raise an issue in the RangeNet++ repo.
@Chen-Xieyuanli Hi, I met the same problem (different result of rangenet_lib). This is my strategy: because FP16 has an insufficient dynamic range, some intermediate layer outputs overflow or underflow when represented in FP16 precision. I found the affected layer and forced it to FP32, which works around the problem. (PS: my code is refactored for Ubuntu 20.04 and TensorRT 8.2, but I think the TensorRT version is not a big deal.) A rough sketch of this kind of per-layer precision pinning is shown below.
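The following is a minimal sketch of that workaround using the TensorRT 8.x Python API (the C++ builder API has equivalent calls); the ONNX path and the layer name are placeholders, and the problematic layer still has to be identified by inspecting intermediate outputs, as described above:

```python
# Hedged sketch: keep FP16 enabled but pin one overflowing layer to FP32.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder honour the per-layer precision set below (TensorRT >= 8.2).
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name == "problematic_layer":  # placeholder: the layer you identified
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

plan = builder.build_serialized_network(network, config)
with open("model_fp16_pinned.trt", "wb") as f:
    f.write(plan)
```

Dropping the FP16 flag entirely (building the whole engine in FP32) is the blunter variant of the same idea, at the cost of inference speed.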
Hey @Natsu-Akatsuki, thanks a lot for your feedback!
Hello, I have tested rangenet_lib and the result is as follows, which is different from the demo on the website (https://github.com/PRBonn/rangenet_lib). I used the pre-trained model provided on the website, and I wonder why the result is different and wrong. Could you please give some suggestions to help find out the reason? Besides, I also opened an issue under SuMa++ (https://github.com/PRBonn/semantic_suma/issues/6#issue-525720509).
Thank you very much.