TensorRT 6.0.1 performs worse than TensorRT 5.0.1

Description

I modified sampleUffSSD to run inference on four videos using OpenCV.
I had previously modified the same example (sampleUffSSD) in TensorRT 5.0.1 to do the same thing, and the TensorRT 5.0.1 version performs better (both on a Jetson Xavier).

I only modified the input preprocessing and the output handling (to copy a cv::Mat into the input buffer and to use the output to draw boxes on the images).

With TensorRT 5.0.1 I didn't need to call waitKey() when displaying images with imshow(), but with 6.0.1 I do; otherwise there is a segmentation fault. (As far as I know, imshow() only actually renders when waitKey() processes HighGUI events, so a short waitKey(1) is normally expected in a display loop anyway.) Could the crash be caused by running out of GPU resources?

So I would like to know whether this is caused by TensorRT or by OpenCV (OpenCV 3.4.3).

Environment

TensorRT Version: 6.0.1
GPU Type:
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Moving to the Jetson Xavier forum so that the Jetson team can take a look.

Hi,

Here are three initial suggestions for you:

1. Device performance:
Please remember to maximize the device performance before benchmarking:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. DLA or GPU?
Please first check whether your TensorRT engine is running on the GPU or on the DLA.
DLA support keeps adding layers, so you may find that more layers can run on the DLA in TensorRT 6.0.

3. OpenCV
The new pre-installed OpenCV 4.2.1 doesn’t enable GPU support.

Thanks.

Thank you for your reply.

I set the device power mode to maximum performance and also specified both DLA cores when running the program. The performance still cannot match TensorRT 5.0.1.

If I comment out waitKey and display the results asynchronously, the program freezes. This did not happen with TensorRT 5.0.1 and the same OpenCV version (3.4.3).

Hi,

To figure out whether the issue comes from TensorRT or from OpenCV, could you try running inference on your model with trtexec first?

$ /usr/src/tensorrt/bin/trtexec [your/model/info]

Thanks.

Thank you for your reply. This is what I got:

[03/27/2020-10:47:44] [I] Average over 10 runs is 16.1945 ms (host walltime is 16.2876 ms, 99% percentile time is 17.9824).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.1862 ms (host walltime is 16.2646 ms, 99% percentile time is 17.8004).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.2935 ms (host walltime is 16.3688 ms, 99% percentile time is 17.9064).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.1357 ms (host walltime is 16.2134 ms, 99% percentile time is 17.4815).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.0828 ms (host walltime is 16.1567 ms, 99% percentile time is 17.966).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.095 ms (host walltime is 16.1709 ms, 99% percentile time is 17.9229).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.2908 ms (host walltime is 16.3666 ms, 99% percentile time is 17.9405).
[03/27/2020-10:47:46] [I] Average over 10 runs is 16.078 ms (host walltime is 16.1491 ms, 99% percentile time is 18.2194).
[03/27/2020-10:47:46] [I] Average over 10 runs is 16.077 ms (host walltime is 16.1466 ms, 99% percentile time is 17.77).
[03/27/2020-10:47:46] [I] Average over 10 runs is 16.0435 ms (host walltime is 16.1205 ms, 99% percentile time is 18.2718).
&&&& PASSED TensorRT.trtexec # ./trtexec --uff=/usr/src/tensorrt/data/ssd/sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300
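For context, those averages imply roughly the following throughput (simple arithmetic, not part of the trtexec output):

```cpp
// ~16.1 ms average per inference from the log above gives about
// 1000 / 16.1 ≈ 62 inferences/s for the engine alone; shared across
// 4 video streams, that is roughly 15 inferences/s per stream,
// before any OpenCV decode/display overhead.
double throughputPerSecond(double latencyMs)
{
    return 1000.0 / latencyMs; // inferences per second
}
```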

Hi,

Would you mind comparing the performance between TensorRT v5.0 and v6.0?
Can the performance drop be reproduced with trtexec?

If yes, please share the UFF model with us.
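If trtexec does not reproduce the drop, it may also help to time just the inference step inside the modified sample, so the measurement excludes OpenCV capture and display. A minimal sketch with std::chrono (here `fn` is a placeholder for your enqueue + stream-synchronize call, not an API from the sample):

```cpp
#include <chrono>

// Time a single call and return the elapsed wall time in milliseconds.
// Pass a lambda wrapping the enqueue + cudaStreamSynchronize step so that
// only the TensorRT inference is measured.
template <typename F>
double timeMs(F&& fn)
{
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Comparing this number between the v5.0.1 and v6.0.1 builds would show whether the slowdown is in TensorRT itself or in the surrounding OpenCV code.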
Thanks.