Description
I have an ONNX model (see model.zip attached below).
I generated two TRT engines for this model using two different methods: one with trtexec, and the other with my own Python script, adapted from the TensorRT SDK sample “onnx_resnet50.py” (a simplified sketch is included under Steps To Reproduce).
Let’s call the engines:
- TRT_256.trt.engine
- TRT_256_Own.trt.engine
Running inference on both engines with trtexec, I get noticeably different timing results.
Environment
TensorRT Version: 8.6.1.6
GPU Type: RTX 4090 mobile
Nvidia Driver Version: 546.24
CUDA Version: 12.3, V12.3.107
CUDNN Version: 8.9.7
Operating System + Version: Ubuntu 22.04.3 LTS (GNU/Linux 5.15.133.1-microsoft-standard-WSL2 x86_64)
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 2.2.1+cu121
Baremetal or Container (if container which image + tag): Container - nvcr.io/nvidia/tensorrt:24.01-py3
Relevant Files
model.zip (60.3 KB)
Engines.zip (188.2 KB)
Exportprofiles.zip (1.1 KB)
Engines_Layers_Info.zip (2.8 KB)
Steps To Reproduce
TRT engine creation:
- Create TRT_256:
trtexec --onnx=./localRegistrationTm_256.onnx --fp16 --saveEngine=TRT_256.trt.engine --verbose
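- Create TRT_256_Own: built with my Python script. A simplified sketch of its build path, in the spirit of onnx_resnet50.py, follows; the paths, workspace size, and flags here are illustrative rather than the exact script:

import tensorrt as trt

ONNX_PATH = "./localRegistrationTm_256.onnx"
ENGINE_PATH = "TRT_256_Own.trt.engine"

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
# Explicit-batch network, as required for ONNX parsing in TRT 8.x
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # mirrors trtexec's --fp16
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB, illustrative

serialized = builder.build_serialized_network(network, config)
with open(ENGINE_PATH, "wb") as f:
    f.write(serialized)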
TRT engine execution using trtexec:
- TRT_256:
trtexec --loadEngine=./TRT_256.trt.engine --warmUp=3000 --iterations=3000 --verbose --exportProfile=TRT_256_profile.txt
- TRT_256_Own:
trtexec --loadEngine=./TRT_256_Own.trt.engine --warmUp=3000 --iterations=3000 --verbose --exportProfile=TRT_256_Own_profile.txt
The attached export profiles show that only one Conv layer's average time increased significantly with TRT_256_Own.trt.engine; all other layers take roughly the same time in both engines. The comparison sketch below is how I quantify this per layer.
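A small script like this can diff the two exported profiles layer by layer (it assumes the trtexec --exportProfile JSON layout with per-layer "name" and "averageMs" fields; adjust the keys if your TensorRT version emits different ones):

import json

def load_profile(path):
    # trtexec writes a JSON list; skip entries without a layer name
    with open(path) as f:
        data = json.load(f)
    return {e["name"]: float(e.get("averageMs", 0.0)) for e in data if "name" in e}

a = load_profile("TRT_256_profile.txt")
b = load_profile("TRT_256_Own_profile.txt")

# Print layers sorted by the largest absolute difference in average time
for name in sorted(set(a) | set(b),
                   key=lambda n: abs(b.get(n, 0.0) - a.get(n, 0.0)),
                   reverse=True):
    print(f"{name}: {a.get(name, 0.0):.4f} ms vs {b.get(name, 0.0):.4f} ms")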
Please help me analyze the root cause of this difference.
Regards,