I am trying to use INT8 quantization with TensorRT. The workflow is PyTorch -> ONNX -> TensorRT. Running ./trtexec --onnx=model.onnx --int8 --verbose fails with the following error:
[E] Error: [graphOptimizer.cpp::fusePattern::1777] Error Code 2: Internal Error (Assertion matchPattern(context, first) && matchBackend(first) failed. )
TensorRT Version: 126.96.36.199
GPU Type: NVIDIA TITAN Xp
CUDA Version: 10.2
CUDNN Version: 8.4
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.10.0+cu102
Thank you very much.
One additional detail that may help: the ONNX file above was exported after doing QAT with pytorch-quantization. Without QAT, I can successfully build an INT8 engine with trtexec for the same model. I suspect this is related to graph fusion; perhaps some Q/DQ operators are placed incorrectly.
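To check where the Q/DQ operators ended up, something like the sketch below could be used. It only counts the standard ONNX QuantizeLinear / DequantizeLinear nodes and lists which op types consume each DequantizeLinear output; it does not diagnose the assertion itself. The helper names summarize_qdq and load_nodes are made up for illustration, and load_nodes assumes the onnx Python package is installed.

```python
from collections import Counter, defaultdict

QDQ_OPS = ("QuantizeLinear", "DequantizeLinear")


def summarize_qdq(nodes):
    """Count Q/DQ nodes and list the consumers of every DequantizeLinear output.

    `nodes` is any iterable of objects exposing .op_type, .input and .output,
    for example `model.graph.node` from an ONNX model.
    """
    nodes = list(nodes)

    # Map each tensor name to the op types of the nodes that read it.
    consumers = defaultdict(list)
    for node in nodes:
        for name in node.input:
            consumers[name].append(node.op_type)

    op_counts = Counter(node.op_type for node in nodes)
    counts = {op: op_counts[op] for op in QDQ_OPS}

    # For each DequantizeLinear, record (output tensor name, consumer op types).
    report = []
    for node in nodes:
        if node.op_type == "DequantizeLinear":
            fed = sorted(op for out in node.output for op in consumers[out])
            report.append((node.output[0], fed))
    return counts, report


def load_nodes(path):
    """Hypothetical helper: return the node list of an ONNX file (needs `onnx`)."""
    import onnx  # imported lazily so summarize_qdq works without it

    return list(onnx.load(path).graph.node)
```

For the model in this post the call would be summarize_qdq(load_nodes("model.onnx")). A DequantizeLinear whose output feeds DepthToSpace, Reshape or Transpose directly would be the first place to look, though that is only a guess at the cause.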
I also tried replacing the DepthToSpace operators (nn.PixelShuffle in PyTorch) with reshape-transpose-reshape, but encountered the same error.
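For reference, the reshape-transpose-reshape replacement can be sketched as follows. This is a minimal NumPy illustration of the index mapping, not the exact code from the model; in PyTorch the same sequence would be view / permute / reshape. It assumes NCHW layout and the channel ordering used by nn.PixelShuffle (ONNX DepthToSpace in CRD mode), and the function names are made up for illustration.

```python
import numpy as np


def pixel_shuffle_reference(x, r):
    """Direct definition: out[n, c, h*r+i, w*r+j] = x[n, c*r*r + i*r + j, h, w]."""
    n, c_in, h, w = x.shape
    c = c_in // (r * r)
    out = np.empty((n, c, h * r, w * r), dtype=x.dtype)
    for i in range(r):
        for j in range(r):
            # Channels i*r+j, i*r+j + r*r, ... hold sub-pixel (i, j) of each output channel.
            out[:, :, i::r, j::r] = x[:, i * r + j :: r * r, :, :]
    return out


def pixel_shuffle_reshape(x, r):
    """Same result using only reshape -> transpose -> reshape."""
    n, c_in, h, w = x.shape
    c = c_in // (r * r)
    x = x.reshape(n, c, r, r, h, w)     # split channels into (c, i, j)
    x = x.transpose(0, 1, 4, 2, 5, 3)   # reorder to (n, c, h, i, w, j)
    return x.reshape(n, c, h * r, w * r)


x = np.arange(2 * 8 * 3 * 5, dtype=np.float32).reshape(2, 8, 3, 5)
same = np.array_equal(pixel_shuffle_reference(x, 2), pixel_shuffle_reshape(x, 2))
```

Since both forms compute the same tensor, getting the same error after the swap suggests the problem may not be specific to the DepthToSpace op itself.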