I tried to use INT8 quantization on TensorRT. The workflow is PyTorch → ONNX → TensorRT. When running ./trtexec --onnx=model.onnx --int8 --verbose, I encounter the error:
[E] Error: [graphOptimizer.cpp::fusePattern::1777] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(first) failed. )
TensorRT Version: 126.96.36.199
GPU Type: NVIDIA TITAN Xp
CUDA Version: 10.2
CUDNN Version: 8.4
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): ‘1.10.0+cu102’
Steps To Reproduce
./trtexec --onnx=model.onnx --int8 --verbose
model.onnx (89.9 MB)
We could reproduce the same error.
Please allow us some time to check the cause of the issue and work on it.
Hi, please refer to the links below on how to perform inference in INT8.
Thank you very much.
One additional detail that may help you track down this error: the ONNX model above was exported after doing QAT with pytorch-quantization. Without QAT, I can successfully build an INT8 engine with trtexec for the same model. So I suspect this is related to graph fusion — perhaps the Q/DQ operators are placed incorrectly somewhere.
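For context on what QAT inserts into the graph: each Q/DQ pair (QuantizeLinear followed by DequantizeLinear in ONNX) fake-quantizes a tensor to INT8 and back, and TensorRT fuses these pairs into the surrounding layers. A minimal NumPy sketch of what one symmetric per-tensor Q/DQ pair computes (the scale choice here is just an illustrative max-abs calibration, not the toolkit's exact procedure):

```python
import numpy as np

def qdq_int8(x, scale):
    """Fake-quantize x to symmetric INT8 and back, mimicking an
    ONNX QuantizeLinear/DequantizeLinear pair with zero_point=0."""
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)  # QuantizeLinear
    return q.astype(np.float32) * scale                          # DequantizeLinear

x = np.array([0.1, -0.5, 1.3], dtype=np.float32)
scale = np.abs(x).max() / 127.0  # illustrative max-abs per-tensor scale
y = qdq_int8(x, scale)           # each element is within scale/2 of the input
```

If a Q/DQ pair like this lands at a position the fusion patterns do not expect (e.g. around a DepthToSpace), the builder can fail instead of fusing it.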
Another thing I tried was replacing the DepthToSpace operator (nn.PixelShuffle in PyTorch) with a reshape-transpose-reshape sequence, but I encountered the same error.
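For reference, the reshape-transpose-reshape sequence mentioned above is the standard decomposition of nn.PixelShuffle. A NumPy sketch of that equivalence (axis order per PyTorch's pixel_shuffle semantics; variable names are mine):

```python
import numpy as np

def pixel_shuffle(x, r):
    """nn.PixelShuffle(r) expressed as reshape -> transpose -> reshape:
    (N, C*r*r, H, W) -> (N, C, H*r, W*r)."""
    n, c, h, w = x.shape
    oc = c // (r * r)
    x = x.reshape(n, oc, r, r, h, w)   # split the channel dim into (oc, r, r)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # interleave: (n, oc, h, r, w, r)
    return x.reshape(n, oc, h * r, w * r)

x = np.arange(4, dtype=np.float32).reshape(1, 4, 1, 1)
out = pixel_shuffle(x, 2)  # shape (1, 1, 2, 2)
```

Since both forms hit the same assertion, the failing fusion is presumably triggered by the Q/DQ placement around this subgraph rather than by the DepthToSpace op itself.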
We are working on a fix for this issue.
I think the ONNX model link has been corrupted; could you please share the ONNX model with us again, here or via DM?