Assertion Error while Creating TensorRT Engine from ONNX PTQ Model

I encountered an issue when attempting to create a TensorRT engine from an ONNX PTQ model on the NVIDIA Orin platform, using NVIDIA's onnx_ptq tool.

During engine creation, I received the following error:

[02/27/2025-14:24:37] [V] [TRT] Removing /stages/stages.3/downsample/attn/q/Add_output_0_QuantizeLinear
[02/27/2025-14:24:37] [V] [TRT] Removing /stages/stages.2/blocks/blocks.14/Add_1_output_0_DequantizeLinear_2
[02/27/2025-14:24:37] [V] [TRT] Removing stages.3.downsample.attn.q.local.weight_DequantizeLinear
[02/27/2025-14:24:37] [V] [TRT] ConstWeightsFusion: Fusing stages.3.downsample.attn.q.local.weight + stages.3.downsample.attn.q.local.weight_QuantizeLinear with /stages/stages.3/downsample/attn/q/local/Conv
[02/27/2025-14:24:37] [E] Error[2]: [graphOptimizer.cpp::fusePattern::1909] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(context, first) failed. )
[02/27/2025-14:24:37] [E] Engine could not be created from network
[02/27/2025-14:24:37] [E] Building engine failed
[02/27/2025-14:24:37] [E] Failed to create engine from model or file.
[02/27/2025-14:24:37] [E] Engine set up failed

This issue seems similar to this post, but I’m using a much newer version of TensorRT.

Steps to Reproduce

efficientformerv2_l.ptq.zip (93.4 MB)

/usr/src/tensorrt/bin/trtexec --verbose --onnx=efficientformerv2_l.ptq.onnx --saveEngine=efficientformerv2_l.engine --timingCacheFile=timing.cache --fp16 --int8

Thank you in advance for your support!

Environment

Platform : Orin
Jetpack Version : 6.2+b77
TensorRT Version : 10.7
CUDA Version : 12.6.85
CUDNN Version : 9.3
Operating System + Version : Ubuntu 22.04 Jammy Jellyfish
Baremetal or Container (if container which image + tag): baremetal

Hi,

Could you first try running the ONNX model with ONNX Runtime, to check whether the model itself has any issues?
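A minimal sketch of such a sanity check might look like the following (assuming `onnxruntime` and `numpy` are installed; `check_onnx` is a hypothetical helper, not part of any NVIDIA tool, and it feeds random data rather than a real calibration input):

```python
import numpy as np

def check_onnx(path):
    """Load an ONNX model and run one inference with random inputs.

    A hypothetical sanity-check helper; onnxruntime is imported lazily
    so the sketch can be read without it installed.
    """
    import onnxruntime as ort

    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    feeds = {}
    for inp in sess.get_inputs():
        # Replace any dynamic (symbolic/None) dimensions with 1.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feeds[inp.name] = np.random.rand(*shape).astype(np.float32)
    outputs = sess.run(None, feeds)
    for meta, out in zip(sess.get_outputs(), outputs):
        print(meta.name, out.shape)
    return outputs

# e.g. check_onnx("efficientformerv2_l.ptq.onnx")
```

If this runs without errors, the model graph itself is valid and the failure is likely in TensorRT's graph optimizer rather than in the ONNX file.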
Thanks.

I resolved the issue by increasing the builderOptimizationLevel to 4.

trtexec --verbose --onnx=efficientformerv2_l.ptq.onnx --saveEngine=efficientformerv2_l.engine --timingCacheFile=timing.cache --builderOptimizationLevel=4 --fp16 --int8

I’m not exactly sure why this works, but hopefully it can help someone facing a similar issue.

Hi,

Thanks a lot for sharing this info:

$ /usr/src/tensorrt/bin/trtexec -h
  --builderOptimizationLevel         Set the builder optimization level. (default is 3)
                                     Higher level allows TensorRT to spend more building time for more optimization options.
                                     Valid values include integers from 0 to the maximum optimization level, which is currently 5.

Setting builderOptimizationLevel higher than 3 allows TensorRT to spend more build time searching for applicable tactics and fusions.
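For reference, the same knob is exposed in the TensorRT Python API as `builder_optimization_level` on the builder config. A rough sketch equivalent in spirit to the trtexec command above (not the exact code trtexec runs; `tensorrt` is imported lazily so the function can be defined without it installed):

```python
def build_engine(onnx_path, engine_path, opt_level=4):
    """Build an FP16+INT8 TensorRT engine with a raised optimization level."""
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)
    # Default is 3; raising it to 4 resolved the fusion assertion in this thread.
    config.builder_optimization_level = opt_level

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)

# e.g. build_engine("efficientformerv2_l.ptq.onnx", "efficientformerv2_l.engine")
```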

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.