Execution time much slower with TensorRT

Description

I’m trying to speed up a KDE calculation, not a neural-network model. I wrap the function in a torch nn.Module, export it to ONNX, and then build a TensorRT engine with trtexec.

On my laptop (RTX 4070, TensorRT 10.9) I get roughly a 7x speedup (28 ms → 4 ms), and building the engine takes less than a minute.

On the Orin, however, building the engine takes over 30 minutes. In the verbose log I can see it trying many tactics, each of which takes a very long time to execute. Worse, even after the long build, execution is about 2x slower than the original PyTorch function (38 ms → 65 ms).

Environment

TensorRT Version: 8.6.2.3
GPU Type: Jetson Orin AGX 64
Nvidia Driver Version:
CUDA Version: 12.2.140
CUDNN Version: 8.9.4
Operating System + Version: Jetpack 6.0, L4T 36.3
Python Version (if applicable): 3.10.12
PyTorch Version (if applicable): 2.3
Baremetal or Container (if container which image + tag):

Relevant Files

I’m attaching the “model” code and the ONNX export code in kde_example.py.
I also uploaded the resulting .onnx and .trt files.

Steps To Reproduce

python3 kde_example.py
trtexec --onnx=kde.onnx --fp16 --saveEngine=kde.trt --verbose --builderOptimizationLevel=5