I ran into an issue while building a TensorRT engine from an ONNX PTQ model on the NVIDIA Orin platform. The model was quantized with NVIDIA's onnx_ptq tool.
During engine creation, trtexec reported the following error:
[02/27/2025-14:24:37] [V] [TRT] Removing /stages/stages.3/downsample/attn/q/Add_output_0_QuantizeLinear
[02/27/2025-14:24:37] [V] [TRT] Removing /stages/stages.2/blocks/blocks.14/Add_1_output_0_DequantizeLinear_2
[02/27/2025-14:24:37] [V] [TRT] Removing stages.3.downsample.attn.q.local.weight_DequantizeLinear
[02/27/2025-14:24:37] [V] [TRT] ConstWeightsFusion: Fusing stages.3.downsample.attn.q.local.weight + stages.3.downsample.attn.q.local.weight_QuantizeLinear with /stages/stages.3/downsample/attn/q/local/Conv
[02/27/2025-14:24:37] [E] Error[2]: [graphOptimizer.cpp::fusePattern::1909] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(context, first) failed. )
[02/27/2025-14:24:37] [E] Engine could not be created from network
[02/27/2025-14:24:37] [E] Building engine failed
[02/27/2025-14:24:37] [E] Failed to create engine from model or file.
[02/27/2025-14:24:37] [E] Engine set up failed
This issue looks similar to this post, but I am running a much newer version of TensorRT.
Steps to Reproduce
efficientformerv2_l.ptq.zip (93.4 MB)
/usr/src/tensorrt/bin/trtexec --verbose --onnx=efficientformerv2_l.ptq.onnx --saveEngine=efficientformerv2_l.engine --timingCacheFile=timing.cache --fp16 --int8
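To narrow the failure down, I also listed the QuantizeLinear/DequantizeLinear nodes around the layer named in the fusion error. This is a minimal sketch, assuming the `onnx` Python package is available for loading the model; the helper itself only walks the graph's node list.

```python
def find_qdq_nodes(graph, name_fragment):
    """Return Quantize/DequantizeLinear nodes whose name contains name_fragment.

    `graph` is any object with a `.node` sequence whose elements expose
    `.op_type` and `.name` (e.g. an onnx GraphProto).
    """
    return [
        n for n in graph.node
        if n.op_type in ("QuantizeLinear", "DequantizeLinear")
        and name_fragment in n.name
    ]

# Usage with the onnx package on the repro model above, inspecting the
# subgraph that the fusion error points at (stages.3.downsample.attn.q):
#   import onnx
#   model = onnx.load("efficientformerv2_l.ptq.onnx")
#   for node in find_qdq_nodes(model.graph, "stages.3/downsample/attn/q"):
#       print(node.op_type, node.name)
```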
Thank you in advance for your support!
Environment
Platform : Orin
JetPack Version : 6.2+b77
TensorRT Version : 10.7
CUDA Version : 12.6.85
CUDNN Version : 9.3
Operating System + Version : Ubuntu 22.04 Jammy Jellyfish
Baremetal or Container (if container which image + tag): baremetal