TensorRT 6.0.1 performs worse than TensorRT 5.0.1

Description

I modified sampleUffSSD to run inference on four videos using OpenCV.
I had previously modified the same example (sampleUffSSD) in TensorRT 5.0.1 to do the same thing, and the TensorRT 5.0.1 version performs better (both on a Jetson Xavier).

I only modified the input preprocessing and the output handling (to copy a cv::Mat into the input buffer and to use the output to draw boxes on the images).

With TensorRT 5.0.1 I didn't need to call waitKey() when displaying images with imshow(), but with 6.0.1 I do; otherwise there is a segmentation fault. (As far as I know, imshow() only actually renders when waitKey() processes HighGUI events, so a short waitKey(1) is normally expected in a display loop anyway.) Could the crash be caused by running out of GPU resources?

So I would like to know whether this is caused by TensorRT or by OpenCV (OpenCV 3.4.3).

Environment

TensorRT Version: 6.0.1
GPU Type:
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Moving to the Jetson Xavier forum so that the Jetson team can take a look.

Hi,

Here are three initial suggestions for you:

1. Device performance:
Please remember to maximize the device performance before benchmarking:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. DLA or GPU?
Please first check whether your TensorRT engine is running on the GPU or on the DLA.
DLA support keeps adding layers, so you may find that more layers can run on the DLA in TensorRT 6.0.

3. OpenCV
The new pre-installed OpenCV 4.2.1 doesn’t enable GPU support.

Thanks.

Thank you for your reply.

I set the device power mode to maximum performance and also specified both DLA cores when running the program. The performance still cannot match TensorRT 5.0.1.

If I comment out waitKey and display the results asynchronously, the program freezes. This did not happen with TensorRT 5.0.1 and the same OpenCV version (3.4.3).

Hi,

To figure out whether the issue comes from TensorRT or from OpenCV, could you try running inference on your model with trtexec first?

$ /usr/src/tensorrt/bin/trtexec [your/model/info]

Thanks.

Thank you for your reply. This is what I got:

[03/27/2020-10:47:44] [I] Average over 10 runs is 16.1945 ms (host walltime is 16.2876 ms, 99% percentile time is 17.9824).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.1862 ms (host walltime is 16.2646 ms, 99% percentile time is 17.8004).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.2935 ms (host walltime is 16.3688 ms, 99% percentile time is 17.9064).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.1357 ms (host walltime is 16.2134 ms, 99% percentile time is 17.4815).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.0828 ms (host walltime is 16.1567 ms, 99% percentile time is 17.966).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.095 ms (host walltime is 16.1709 ms, 99% percentile time is 17.9229).
[03/27/2020-10:47:45] [I] Average over 10 runs is 16.2908 ms (host walltime is 16.3666 ms, 99% percentile time is 17.9405).
[03/27/2020-10:47:46] [I] Average over 10 runs is 16.078 ms (host walltime is 16.1491 ms, 99% percentile time is 18.2194).
[03/27/2020-10:47:46] [I] Average over 10 runs is 16.077 ms (host walltime is 16.1466 ms, 99% percentile time is 17.77).
[03/27/2020-10:47:46] [I] Average over 10 runs is 16.0435 ms (host walltime is 16.1205 ms, 99% percentile time is 18.2718).
&&&& PASSED TensorRT.trtexec # ./trtexec --uff=/usr/src/tensorrt/data/ssd/sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300
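For context, those averages imply roughly the following throughput (simple arithmetic, not part of the trtexec output):

```cpp
// ~16.1 ms average per inference from the log above gives about
// 1000 / 16.1 ≈ 62 inferences/s for the engine alone; shared across
// 4 video streams, that is roughly 15 inferences/s per stream,
// before any OpenCV decode/display overhead.
double throughputPerSecond(double latencyMs)
{
    return 1000.0 / latencyMs; // inferences per second
}
```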

Hi,

Would you mind comparing the performance between TensorRT v5.0 and v6.0?
Can the performance drop be reproduced with trtexec?

If yes, please share the UFF model with us.
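If trtexec does not reproduce the drop, it may also help to time just the inference step inside the modified sample, so the measurement excludes OpenCV capture and display. A minimal sketch with std::chrono (here `fn` is a placeholder for your enqueue + stream-synchronize call, not an API from the sample):

```cpp
#include <chrono>

// Time a single call and return the elapsed wall time in milliseconds.
// Pass a lambda wrapping the enqueue + cudaStreamSynchronize step so that
// only the TensorRT inference is measured.
template <typename F>
double timeMs(F&& fn)
{
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Comparing this number between the v5.0.1 and v6.0.1 builds would show whether the slowdown is in TensorRT itself or in the surrounding OpenCV code.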
Thanks.