Inconsistent TensorRT Inference Performance with the Python API and trtexec

Description

I converted a pretrained PyTorch model to a TensorRT engine via ONNX. When I run inference on this engine with the Python API and with the trtexec CLI, the performance results I get are inconsistent. The script and commands I used are provided below.
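
Schematically, the conversion path looked like this (shown only as an illustration; the opset number is inferred from the engine file name and the input shape is a placeholder, not the model's actual values):

# PyTorch -> ONNX export (sketch; model loading and input shape are illustrative)
import torch

model = ...  # the pretrained model, loaded the way the project normally loads it
model.eval()
dummy = torch.randn(1, 3, 512, 512)  # placeholder shape - use the model's real input shape
torch.onnx.export(model, dummy, "uiu-net-11.onnx", opset_version=11)  # opset 11 assumed

# ONNX -> TensorRT engine (trtexec build, FP16)
# trtexec --onnx=uiu-net-11.onnx --saveEngine=uiu-net-11-fp16.engine --fp16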

trtexec yields the result below, indicating that the engine should run at around 58 FPS, but with the Python API I measure only about 13 FPS, nowhere near that figure.

I’m having trouble building a high-performance inference pipeline. When I use trtexec I’m unable to recover the resultant image from the exported JSON file (see my other post here), and when I use the Python API I’m unable to achieve high throughput. Can you help me with this?

[I] === Performance summary ===
[I] Throughput: 57.9849 qps
[I] Latency: min = 17.447 ms, max = 22.3547 ms, mean = 17.6472 ms, median = 17.4985 ms, percentile(90%) = 17.572 ms, percentile(95%) = 18.8232 ms, percentile(99%) = 21.3503 ms
[I] Enqueue Time: min = 3.90723 ms, max = 6.87085 ms, mean = 5.02085 ms, median = 5.04135 ms, percentile(90%) = 5.14331 ms, percentile(95%) = 5.22873 ms, percentile(99%) = 6.62476 ms
[I] H2D Latency: min = 0.10498 ms, max = 0.183746 ms, mean = 0.12631 ms, median = 0.118652 ms, percentile(90%) = 0.150391 ms, percentile(95%) = 0.165649 ms, percentile(99%) = 0.175781 ms
[I] GPU Compute Time: min = 16.967 ms, max = 21.9099 ms, mean = 17.1498 ms, median = 17.0004 ms, percentile(90%) = 17.0549 ms, percentile(95%) = 18.2924 ms, percentile(99%) = 20.8427 ms
[I] D2H Latency: min = 0.223145 ms, max = 0.38623 ms, mean = 0.371093 ms, median = 0.372025 ms, percentile(90%) = 0.374756 ms, percentile(95%) = 0.375122 ms, percentile(99%) = 0.378174 ms
[I] Total Host Walltime: 3.05252 s
[I] Total GPU Compute Time: 3.03552 s
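
For context, the Python inference path in infer-trt.py follows the usual TensorRT 8.5 Python pattern, roughly as sketched below (buffer handling via PyCUDA; names, shapes, and the assumption that binding 0 is the input are illustrative — the actual code is in infer-trt.zip):

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates and manages a CUDA context
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("uiu-net-11-fp16.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Pinned host buffers and device buffers for every binding
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(engine.get_binding_shape(i))
    h = cuda.pagelocked_empty(size, dtype)
    d = cuda.mem_alloc(h.nbytes)
    host_bufs.append(h); dev_bufs.append(d); bindings.append(int(d))

stream = cuda.Stream()

def infer(x):
    # Assumes binding 0 is the input and all remaining bindings are outputs
    np.copyto(host_bufs[0], x.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for i in range(1, engine.num_bindings):
        cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
    stream.synchronize()
    return [h.copy() for h in host_bufs[1:]]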

Environment

TensorRT Version: 8.5.2-1+cuda11.4
GPU Type: Jetson AGX Orin
Nvidia Driver Version: NVIDIA UNIX Open Kernel Module for aarch64 35.4.1
CUDA Version: 11.4
CUDNN Version: 8.6
Operating System + Version: 5.10.120-tegra - Ubuntu 20.04
Python Version (if applicable): 3.8.2
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.0a0+41361538.nv23.06
Baremetal or Container (if container which image + tag):

Relevant Files

uiu-net-11-fp16.zip (61.0 MB)
infer-trt.zip (1.9 KB)

Steps To Reproduce

  • Run “infer-trt.py” and observe the reported FPS (the measurement pattern is sketched after this list)
  • Run “trtexec --loadEngine=uiu-net-11-fp16.engine” and observe the reported throughput
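
The FPS figure from infer-trt.py comes from a wall-clock timing loop roughly of this form (schematic; the exact measurement is in the attached script, and the input array here is a dummy). Note that a wall-clock loop like this also counts per-frame Python and copy overhead, unlike trtexec's reported GPU Compute Time:

import time
import numpy as np

dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)  # placeholder input shape

for _ in range(20):          # warm-up iterations, excluded from timing
    infer(dummy)

n = 200
t0 = time.perf_counter()
for _ in range(n):
    infer(dummy)
elapsed = time.perf_counter() - t0
print(f"{n / elapsed:.1f} FPS")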