Converting an ONNX model to a TensorRT engine takes days

Description

Hi, I have an ONNX model and currently get 30 FPS of inference on an RTX 4060 Mobile. I am trying to gain some performance by using TensorRT. This is a summary of my ONNX model:

nodes = 2049, initializers = 444, inputs = 2, outputs = 1
top ops: [('Constant', 592), ('Unsqueeze', 227), ('Add', 163), ('Transpose', 133), ('MatMul', 132), ('Concat', 113), ('Shape', 107), ('Gather', 103), ('Reshape', 95), ('Mul', 94), ('Div', 53), ('LayerNormalization', 48), ('Conv', 45), ('Erf', 26), ('Cast', 26), ('Slice', 20), ('Softmax', 14), ('BatchNormalization', 9), ('ReduceMean', 8), ('Relu', 7)]

Here is a link to the ONNX file. Now, this is where the problems start. Using this command:

trtexec --onnx=asymformer_160.onnx ^
--saveEngine=test.engine ^
--fp16 --noTF32 ^
--minShapes=img:1x3x160x160,dep:1x1x160x160 ^
--optShapes=img:1x3x160x160,dep:1x1x160x160 ^
--maxShapes=img:1x3x160x160,dep:1x1x160x160 ^
--precisionConstraints=prefer ^
--memPoolSize=workspace:2048 ^
--tacticSources=+CUBLAS,+CUBLAS_LT

I could not get it to complete, and gave up after 10 hours of waiting.

TensorRT Version: TensorRT-10.13.2.6.Windows.win10.cuda-12.9
GPU Type: RTX4060
Nvidia Driver Version: 576.88
CUDA Version: 12.8
CUDNN Version: cudnn-windows-x86_64-8.9.7.29_cuda12-archive
Operating System + Version: Windows 10
Python Version (if applicable): 3.11.9
TensorFlow Version (if applicable): -
PyTorch Version (if applicable): pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu128

Steps To Reproduce

  1. Download the ONNX file I shared.
  2. Run the trtexec command.

I activated the verbose flag to see if I could catch any errors. Seemingly everything works, but the build steps are just very slow. I don't know whether this is normal, since I am not familiar with TensorRT.

As a follow-up: I can generate an engine with the same setup using Polygraphy, but I cannot get it done with trtexec.

Hey,
I think your 10+ hour conversion is caused by the 132 MatMul operations with asymmetric shapes, which TensorRT struggles to optimize.
Have you tried:

  • Pre-optimizing the transformer attention matrices
  • Handling the multi-scale stereo matching operation
  • Reducing TensorRT's optimization search space

best wishes