Inference time of TensorRT 6.3 is slower than TensorRT 6.0

Description

Hi,
We measured inference time with our own YOLOv3 model, but the inference time of TensorRT 6.3 is slower than that of TensorRT 6.0.
TensorRT 6.0: average 11 ms
TensorRT 6.3: average 30 ms

We can also see the same problem with the sample program (sampleOnnxMNIST) included in the TensorRT package.
Please refer to the attached profile results.
What should we do to make TensorRT 6.3 as fast as TensorRT 6.0?

Profile result of TensorRT 6.0
profileresult_trt_6_0.qdrep (827.2 KB)

Profile result of TensorRT 6.3
profileresult_trt_6_3.qdrep (925.3 KB)

Environment

Environment for TensorRT 6.0
TensorRT Version: 6.0.1.8 (official release)
GPU Type: GTX 1080
Nvidia Driver Version: 440.118.02
CUDA Version: 10.2
CUDNN Version: 7.6.5.32
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Baremetal or Container (if container which image + tag): 19.12-tf1-py3

Environment for TensorRT 6.3
TensorRT Version: 6.3.1 (included in DriveOS 5.2)
GPU Type: GTX 1080
Nvidia Driver Version: 440.118.02
CUDA Version: 10.2
CUDNN Version: 7.6.6.184-1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9

Relevant Files

profileresult_trt_6_0.qdrep (827.2 KB)
profileresult_trt_6_3.qdrep (925.3 KB)

Steps To Reproduce

Just execute sampleOnnxMNIST, which is included in each TensorRT package.

Hi,
Please share the model, script, profiler output, and performance numbers, if not already shared, so that we can help you better.
Alternatively, you can try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
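
As a rough illustration of that advice, here is a minimal timing sketch, assuming you already have an nvinfer1::IExecutionContext* and a bindings array of device pointers; the names below are placeholders, not code from the sample:

#include <chrono>
#include "NvInfer.h"

// Measure only the inference call, excluding data pre- and post-processing.
double averageLatencyMs(nvinfer1::IExecutionContext* context, void** bindings)
{
    // Warm-up runs so one-time initialization does not skew the numbers.
    for (int i = 0; i < 10; ++i)
        context->execute(1, bindings); // use executeV2(bindings) for explicit-batch engines

    const int iterations = 100;
    const auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i)
        context->execute(1, bindings); // synchronous call; returns after the GPU work finishes
    const auto end = std::chrono::high_resolution_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}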

Thanks!

Hi @NVES

We used your sample code and model so that you can reproduce the issue.

We modified sampleOnnxMNIST.cpp to measure inference time.
The modified files are attached below.

sampleOnnxMNIST_trt60.cpp (13.3 KB)
sampleOnnxMNIST_trt63.cpp (13.7 KB)
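
The timing change is roughly of this form: a sketch using CUDA events so that only the GPU execution is measured (context, bindings, and stream are placeholders, not the exact contents of the attached .cpp files):

#include <cuda_runtime_api.h>
#include "NvInfer.h"

// Time a single inference with CUDA events, measuring only the GPU work on the stream.
float inferenceTimeMs(nvinfer1::IExecutionContext* context, void** bindings, cudaStream_t stream)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    context->enqueue(1, bindings, stream, nullptr); // enqueueV2(bindings, stream, nullptr) for explicit-batch engines
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}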

And I’ve already attached the profiler output to the first post (profileresult_trt_6_0.qdrep and profileresult_trt_6_3.qdrep).
Please refer to them.

We’ll try running trtexec while waiting for your investigation.

Thanks

Hi @eri.kasamatsu,

We recommend you try the latest TensorRT version (official release). Please let us know if you still face this issue.

Thank you.

Hi @spolisetty

We’ll port our model to Drive AGX (DriveOS 5.2) soon.
That’s why we use TensorRT 6.3, not TensorRT 7.

Thanks

Hi

I’ve run the sampleOnnxMNIST model with trtexec and attached the results.
You can see that the total layer runtime of TensorRT 6.3 is much slower than that of TensorRT 6.0.

trtexec_trt6_0.txt (5.1 KB)

trtexec_trt6_3.txt (6.2 KB)

Furthermore, the layers are quite different between TensorRT 6.0 and TensorRT 6.3.
Does this cause the performance degradation?

Thanks

Hi @eri.kasamatsu,

Sorry for the delayed response. TensorRT 6.0 uses implicit batch, while TensorRT 6.3 uses explicit batch for the ONNX parser. There were some issues in the parser that have been resolved in later versions.
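
For reference, the API difference being described looks roughly like this; a minimal sketch of network creation, not code taken from either sample:

#include "NvInfer.h"

// Implicit-batch vs. explicit-batch network creation.
nvinfer1::INetworkDefinition* createNetworkFor(nvinfer1::IBuilder* builder, bool explicitBatch)
{
    if (!explicitBatch)
    {
        // TensorRT 6.0 path: implicit batch dimension, batch size given at execute()/enqueue() time.
        return builder->createNetwork();
    }
    // TensorRT 6.3 ONNX parser path: the batch dimension is part of the tensor shapes.
    const auto flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    return builder->createNetworkV2(flags);
}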

Thank you.