Two TensorRT engines generated from the same ONNX model show different average inference times

Description

I have an ONNX model (see the attached model.zip below).

I generated two TRT engines for this model using two different methods: one with trtexec, and the second with my own Python script adapted from the TensorRT SDK sample “onnx_resnet50.py”.

Let’s call the engines:

  1. TRT_256.trt.engine

  2. TRT_256_Own.trt.engine

When running inference on both generated engines with trtexec, I get different timing results.

Environment

TensorRT Version: 8.6.1.6
GPU Type: RTX 4090 mobile
Nvidia Driver Version: 546.24
CUDA Version: 12.3, V12.3.107
CUDNN Version: 8.9.7
Operating System + Version: Ubuntu 22.04.3 LTS (GNU/Linux 5.15.133.1-microsoft-standard-WSL2 x86_64)
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 2.2.1+cu121
Baremetal or Container (if container which image + tag): Container - nvcr.io/nvidia/tensorrt:24.01-py3

Relevant Files

model.zip (60.3 KB)

Engines.zip (188.2 KB)

Exportprofiles.zip (1.1 KB)

Engines_Layers_Info.zip (2.8 KB)

Steps To Reproduce

TRT engine creation using trtexec:

  • Create TRT_256:
    trtexec --onnx=./localRegistrationTm_256.onnx --fp16 --saveEngine=TRT_256.trt.engine --verbose

TRT engines execution:

  • TRT_256:
    trtexec --loadEngine=./TRT_256.trt.engine --warmUp=3000 --iterations=3000 --verbose --exportProfile=TRT_256_profile.txt

  • TRT_256_Own:
    trtexec --loadEngine=./TRT_256_Own.trt.engine --warmUp=3000 --iterations=3000 --verbose --exportProfile=TRT_256_Own_profile.txt

The attached export profiles show that only one Conv layer's average time increased significantly with TRT_256_Own.trt.engine; all other layers are roughly the same.
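For reference, the two exported profiles can be diffed layer by layer with a small script. This is a sketch that assumes each profile (as written by trtexec's --exportProfile) is a JSON list of per-layer records carrying "name" and "averageMs" fields; entries without a "name" (such as a leading count record) are skipped.

```python
import json

def layer_deltas(profile_a, profile_b):
    """Pair layers by name and return (name, avg_a, avg_b) tuples,
    sorted so the layer with the largest timing difference comes first.

    Assumes each profile is a list of dicts with 'name' and 'averageMs'
    keys, as in trtexec's --exportProfile output.
    """
    times_a = {e["name"]: e["averageMs"] for e in profile_a if "name" in e}
    times_b = {e["name"]: e["averageMs"] for e in profile_b if "name" in e}
    deltas = [(name, times_a[name], times_b[name])
              for name in times_a.keys() & times_b.keys()]
    # Largest absolute difference first, so the outlier Conv layer tops the list.
    deltas.sort(key=lambda t: abs(t[1] - t[2]), reverse=True)
    return deltas

# Tiny synthetic example; for real use, json.load the two exported profiles
# (e.g. TRT_256_profile.txt and TRT_256_Own_profile.txt) and pass them in.
sample_a = [{"count": 2}, {"name": "Conv_0", "averageMs": 0.12}]
sample_b = [{"count": 2}, {"name": "Conv_0", "averageMs": 0.45}]
print(layer_deltas(sample_a, sample_b))  # prints [('Conv_0', 0.12, 0.45)]
```

Sorting by absolute delta makes the single slow Conv layer immediately visible without reading through the whole profile.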

Please help me analyze the root cause of this difference.

Regards,

Hi @orong13,
Are the differences drastic?
If you are using the same engine with the same input, TensorRT should be deterministic.
However, I don’t think engine building is supposed to be deterministic, as tactics are chosen based on observed runtimes. If you output your build log at info level, you should be able to compare tactic selection between the two engines. Since different tactics/kernels can change the order of operations, you would expect floating-point differences.

Thank you very much.
The issue is no longer relevant; I found the root cause.