Failed to create TensorRT engine from QAT ONNX model

Description

I am debugging QAT with pytorch_quantization (TensorRT/tools/pytorch-quantization at main · NVIDIA/TensorRT · GitHub).
The ONNX model with Q/DQ nodes can be exported from the .pth checkpoint successfully, but converting this simple ONNX model to a TensorRT engine fails.
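For reference, the export follows the standard pytorch_quantization flow, roughly the sketch below; the model class MyQATModel, the checkpoint path, and the input shape are placeholders, not the exact network from the attached files.

import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Patch torch.nn layers with quantized counterparts before building the model,
# so the network carries TensorQuantizer (fake-quant) modules for QAT.
quant_modules.initialize()

model = MyQATModel()                          # placeholder for the real network
model.load_state_dict(torch.load("0.pth"))    # QAT-trained checkpoint
model.eval()

# Export the fake-quant ops as ONNX QuantizeLinear/DequantizeLinear (Q/DQ) pairs.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy_input = torch.randn(1, 3, 224, 224)     # placeholder input shape
torch.onnx.export(model, dummy_input, "0.pth.onnx", opset_version=13)

The trtexec build of the exported model then fails with: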

[01/13/2023-08:20:26] [V] [TRT] Removing QuantizeLinear_70
[01/13/2023-08:20:26] [V] [TRT] Removing DequantizeLinear_41
[01/13/2023-08:20:26] [V] [TRT] Removing DequantizeLinear_44
[01/13/2023-08:20:26] [V] [TRT] ConstWeightsFusion: Fusing conv23.weight + QuantizeLinear_43 with Conv_45
[01/13/2023-08:20:26] [E] Error[2]: [graphOptimizer.cpp::fusePattern::1777] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(first) failed. )
[01/13/2023-08:20:26] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[01/13/2023-08:20:26] [E] Engine could not be created from network
[01/13/2023-08:20:26] [E] Building engine failed
[01/13/2023-08:20:26] [E] Failed to create engine from model or file.
[01/13/2023-08:20:26] [E] Engine set up failed

Environment

I am testing TensorRT in Docker.

Platform : Orin
Jetpack Version : 5.0.2-b231
TensorRT Version : 8.4.1
CUDA Version : 11.4
CUDNN Version : 8.4.1
Operating System + Version : Ubuntu 20.04
Python Version (if applicable) : 3.8.10
PyTorch Version (if applicable): 1.10
Baremetal or Container (if container which image + tag): dustynv/ros:noetic-ros-base-l4t-r35.1.0 (GitHub - dusty-nv/jetson-containers: Machine Learning Containers for NVIDIA Jetson and JetPack-L4T)

Relevant Files

0.pth.onnx (173.1 KB)
0.pth.onnx.log (98.0 KB)

Steps To Reproduce

trtexec --verbose --nvtxMode=verbose --buildOnly --workspace=8192 --onnx=0.pth.onnx --saveEngine=0.pth.onnx.engine --timingCacheFile=./timing.cache --profilingVerbosity=detailed --fp16 --int8

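For scripting the same build without trtexec, a rough Python-API equivalent is sketched below; it mirrors only the ONNX parse and the FP16/INT8 flags, and omits the timing cache and profiling options. Since it runs the same builder underneath, it should hit the same assertion.

import tensorrt as trt

ONNX_PATH = "0.pth.onnx"
ENGINE_PATH = "0.pth.onnx.engine"

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the Q/DQ ONNX model.
with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse " + ONNX_PATH)

config = builder.create_builder_config()
config.max_workspace_size = 8 << 30           # rough equivalent of --workspace=8192 (MiB)
config.set_flag(trt.BuilderFlag.FP16)         # --fp16
config.set_flag(trt.BuilderFlag.INT8)         # --int8; scales come from the Q/DQ nodes

serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
    raise RuntimeError("Engine build failed")
with open(ENGINE_PATH, "wb") as f:
    f.write(serialized_engine)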
Another topic reporting the same error: https://forums.developer.nvidia.com/t/error-code-2-internal-error-assertion-matchpattern-context-first-matchbackend-first-failed/223247/7

Hi,

We could build the TensorRT engine successfully on Tesla V100 GPUs.

[01/13/2023-12:02:23] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[01/13/2023-12:02:23] [I] Engine deserialized in 0.0136184 sec.
[01/13/2023-12:02:23] [I] Skipped inference phase since --buildOnly is added.
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --verbose --nvtxMode=verbose --buildOnly --workspace=8192 --onnx=0.pth.onnx --saveEngine=0.pth.onnx.engine --timingCacheFile=./timing.cache --profilingVerbosity=detailed --fp16 --int8

Please try the latest TensorRT version, 8.5.2, and let us know if you still face this issue. We are moving this post to the Jetson AGX Orin forum in case you need further help.

Thank you.

@spolisetty
It works with TensorRT 8.5.1 (I built the TensorRT engine using the Docker image nvcr.io/nvidia/tensorrt:22.12-py3).

Can you explain more about this error?

Another question:
How do I upgrade TensorRT from 8.4.1 to 8.5.1 (or 8.5.2) on Orin?

Thanks!

