Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(first) failed. )

Description

I tried to use INT8 quantization with TensorRT. The workflow is PyTorch → ONNX → TensorRT. When running ./trtexec --onnx=model.onnx --int8 --verbose, I encounter the error:
[E] Error[2]: [graphOptimizer.cpp::fusePattern::1777] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(first) failed. )

Environment

TensorRT Version: 8.4.2.4
GPU Type: NVIDIA TITAN Xp
CUDA Version: 10.2
CUDNN Version: 8.4
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.10.0+cu102

Steps To Reproduce

./trtexec --onnx=model.onnx --int8 --verbose
model.onnx (89.9 MB)

Hi,

We were able to reproduce the same error.
Please allow us some time to check the cause of the issue and work on it.

Thank you.

Hi, please refer to the links below to perform inference in INT8.

Thanks!

Thank you very much.
One additional detail may help you track down this error: the ONNX model above was produced after doing QAT with pytorch-quantization. Without QAT, I can successfully build an INT8 engine with trtexec for the same model. I guess this is related to graph fusion? Perhaps I placed the QDQ operators in the wrong locations.
Another thing I tried was changing the DepthToSpace operator (nn.PixelShuffle in PyTorch) to reshape-transpose-reshape, but I encountered the same error.
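For reference, the PixelShuffle-to-reshape-transpose-reshape rewrite mentioned above can be sketched in plain NumPy. This is a hedged illustration of the standard decomposition (channel ordering matching torch.nn.PixelShuffle, i.e. CRD-style), not the exact code used in this thread:

```python
import numpy as np

def pixel_shuffle_via_reshape(x, r):
    """DepthToSpace as reshape -> transpose -> reshape.

    Matches torch.nn.PixelShuffle(r) on NCHW input:
    (N, C*r*r, H, W) -> (N, C, H*r, W*r).
    """
    n, c, h, w = x.shape
    assert c % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c // (r * r)
    x = x.reshape(n, c_out, r, r, h, w)        # split channels into (C', r, r)
    x = x.transpose(0, 1, 4, 2, 5, 3)          # -> (N, C', H, r, W, r)
    return x.reshape(n, c_out, h * r, w * r)   # interleave r into spatial dims

x = np.arange(1 * 4 * 2 * 2).reshape(1, 4, 2, 2).astype(np.float32)
y = pixel_shuffle_via_reshape(x, 2)
print(y.shape)  # (1, 1, 4, 4)
```

Note that ONNX DepthToSpace has two modes (DCR and CRD) that differ only in how the channel dimension is split; exporting with the wrong ordering, or with Q/DQ nodes inserted between the reshape/transpose steps, changes which fusion patterns TensorRT can match.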

Hi @user118950,

We are working on a fix for this issue.
I think the ONNX model link has been corrupted. Could you please share the ONNX model with us again, here or via DM?

Thank you.

Hi @spolisetty,
Any update?
I got the same error.

Description

...
[01/11/2023-02:22:33] [V] [TRT] Running: QuantizeConvWithResidualAdd on Conv_45
[01/11/2023-02:22:33] [V] [TRT] Swapping Add_65 + Relu_66 with QuantizeLinear_69
[01/11/2023-02:22:33] [V] [TRT] QuantizeDoubleInputNodes: fusing QuantizeLinear_69 into Conv_45
[01/11/2023-02:22:33] [V] [TRT] QuantizeDoubleInputNodes: fusing (DequantizeLinear_41 and DequantizeLinear_44) into Conv_45
[01/11/2023-02:22:33] [V] [TRT] Removing QuantizeLinear_69
[01/11/2023-02:22:33] [V] [TRT] Removing DequantizeLinear_41
[01/11/2023-02:22:33] [V] [TRT] Removing DequantizeLinear_44
[01/11/2023-02:22:33] [V] [TRT] ConstWeightsFusion: Fusing conv23.weight + QuantizeLinear_43 with Conv_45
[01/11/2023-02:22:33] [E] Error[2]: [graphOptimizer.cpp::fusePattern::1777] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(first) failed. )
[01/11/2023-02:22:33] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[01/11/2023-02:22:33] [E] Engine could not be created from network
[01/11/2023-02:22:33] [V] [TRT] Deleting timing cache: 3564 entries, served 0 hits since creation.
[01/11/2023-02:22:33] [E] Building engine failed
[01/11/2023-02:22:33] [E] Failed to create engine from model or file.
[01/11/2023-02:22:33] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8401] # trtexec --verbose --nvtxMode=verbose --buildOnly --workspace=8192 --onnx=pytorch_quantization_identity/0.pth.onnx --saveEngine=pytorch_quantization_identity/0.pth.onnx.engine --timingCacheFile=./timing.cache --profilingVerbosity=detailed --fp16 --int8

Environment

Platform: Orin
Jetpack Version: 5.0.2-b231
TensorRT Version: 8.4.1
CUDA Version: 11.4
CUDNN Version: 8.4.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10

Steps To Reproduce

trtexec --verbose --nvtxMode=verbose --buildOnly --workspace=8192 --onnx=pytorch_quantization_identity/0.pth.onnx --saveEngine=pytorch_quantization_identity/0.pth.onnx.engine --timingCacheFile=./timing.cache --profilingVerbosity=detailed --fp16 --int8

0.pth.onnx (173.1 KB)

Hi @jolly.ming2005,

Could you create a new post with the complete verbose logs and an ONNX model that reproduces the issue?

Thank you.

Hi @spolisetty
I have created a new post: Failed to create tensorrt engine from QAT onnx model