Inconsistent TensorRT Inference Performance with the Python API and trtexec

Description

I converted a pretrained PyTorch model to a TensorRT engine via ONNX. When I run inference on this engine with the Python API and with the trtexec CLI, the performance results I get are inconsistent. The script and commands I used are provided below.
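
Schematically, the conversion path looked like this (shown only as an illustration; the opset number is inferred from the engine file name and the input shape is a placeholder, not the model's actual values):

# PyTorch -> ONNX export (sketch; model loading and input shape are illustrative)
import torch

model = ...  # the pretrained model, loaded the way the project normally loads it
model.eval()
dummy = torch.randn(1, 3, 512, 512)  # placeholder shape - use the model's real input shape
torch.onnx.export(model, dummy, "uiu-net-11.onnx", opset_version=11)  # opset 11 assumed

# ONNX -> TensorRT engine (trtexec build, FP16)
# trtexec --onnx=uiu-net-11.onnx --saveEngine=uiu-net-11-fp16.engine --fp16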

trtexec yields the result below, indicating that the engine should run at around 58 FPS, but with the Python API I measure only about 13 FPS, nowhere near that figure.

I’m having trouble building a high-performance inference pipeline. When I use trtexec I’m unable to recover the resultant image from the exported JSON file (see my other post here), and when I use the Python API I’m unable to achieve high throughput. Can you help me with this?

[I] === Performance summary ===
[I] Throughput: 57.9849 qps
[I] Latency: min = 17.447 ms, max = 22.3547 ms, mean = 17.6472 ms, median = 17.4985 ms, percentile(90%) = 17.572 ms, percentile(95%) = 18.8232 ms, percentile(99%) = 21.3503 ms
[I] Enqueue Time: min = 3.90723 ms, max = 6.87085 ms, mean = 5.02085 ms, median = 5.04135 ms, percentile(90%) = 5.14331 ms, percentile(95%) = 5.22873 ms, percentile(99%) = 6.62476 ms
[I] H2D Latency: min = 0.10498 ms, max = 0.183746 ms, mean = 0.12631 ms, median = 0.118652 ms, percentile(90%) = 0.150391 ms, percentile(95%) = 0.165649 ms, percentile(99%) = 0.175781 ms
[I] GPU Compute Time: min = 16.967 ms, max = 21.9099 ms, mean = 17.1498 ms, median = 17.0004 ms, percentile(90%) = 17.0549 ms, percentile(95%) = 18.2924 ms, percentile(99%) = 20.8427 ms
[I] D2H Latency: min = 0.223145 ms, max = 0.38623 ms, mean = 0.371093 ms, median = 0.372025 ms, percentile(90%) = 0.374756 ms, percentile(95%) = 0.375122 ms, percentile(99%) = 0.378174 ms
[I] Total Host Walltime: 3.05252 s
[I] Total GPU Compute Time: 3.03552 s
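
For context, the Python inference path in infer-trt.py follows the usual TensorRT 8.5 Python pattern, roughly as sketched below (buffer handling via PyCUDA; names, shapes, and the assumption that binding 0 is the input are illustrative — the actual code is in infer-trt.zip):

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates and manages a CUDA context
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("uiu-net-11-fp16.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Pinned host buffers and device buffers for every binding
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(engine.get_binding_shape(i))
    h = cuda.pagelocked_empty(size, dtype)
    d = cuda.mem_alloc(h.nbytes)
    host_bufs.append(h); dev_bufs.append(d); bindings.append(int(d))

stream = cuda.Stream()

def infer(x):
    # Assumes binding 0 is the input and all remaining bindings are outputs
    np.copyto(host_bufs[0], x.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for i in range(1, engine.num_bindings):
        cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
    stream.synchronize()
    return [h.copy() for h in host_bufs[1:]]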

Environment

TensorRT Version: 8.5.2-1+cuda11.4
GPU Type: Jetson AGX Orin
Nvidia Driver Version: NVIDIA UNIX Open Kernel Module for aarch64 35.4.1
CUDA Version: 11.4
CUDNN Version: 8.6
Operating System + Version: 5.10.120-tegra - Ubuntu 20.04
Python Version (if applicable): 3.8.2
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.1.0a0+41361538.nv23.06
Baremetal or Container (if container which image + tag):

Relevant Files

uiu-net-11-fp16.zip (61.0 MB)
infer-trt.zip (1.9 KB)

Steps To Reproduce

  • Run “infer-trt.py” and observe the reported FPS (the measurement pattern is sketched after this list)
  • Run “trtexec --loadEngine=uiu-net-11-fp16.engine” and observe the reported throughput
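
The FPS figure from infer-trt.py comes from a wall-clock timing loop roughly of this form (schematic; the exact measurement is in the attached script, and the input array here is a dummy). Note that a wall-clock loop like this also counts per-frame Python and copy overhead, unlike trtexec's reported GPU Compute Time:

import time
import numpy as np

dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)  # placeholder input shape

for _ in range(20):          # warm-up iterations, excluded from timing
    infer(dummy)

n = 200
t0 = time.perf_counter()
for _ in range(n):
    infer(dummy)
elapsed = time.perf_counter() - t0
print(f"{n / elapsed:.1f} FPS")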