TensorRT Version: 8.2.0.6
GPU Type: RTX 2080
Nvidia Driver Version: 470.86
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.9
I’m having problems exporting an INT8-quantized model (quantized with NVIDIA’s pytorch-quantization toolkit) to ONNX.
I followed the Basic Functionalities guide in the pytorch-quantization documentation and exported the ONNX file successfully.
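Roughly, the export step follows the pattern from that guide; the sketch below is simplified, and the model constructor, checkpoint path, input shape, and file names are placeholders (the exact code is in the attached onnx_creation.txt):

```python
import torch
from pytorch_quantization import nn as quant_nn

# Make TensorQuantizer emit ONNX-exportable fake-quant (Q/DQ) ops
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model = DepthNet()                                        # placeholder: my quantized model
model.load_state_dict(torch.load("depthnet_int8.pth"))    # placeholder checkpoint path
model.eval()

dummy_input = torch.randn(1, 3, 256, 320)                 # placeholder input shape

torch.onnx.export(
    model,
    dummy_input,
    "depthnet_int8.onnx",
    opset_version=13,              # per-channel Q/DQ requires opset >= 13
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
)
```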
However, when I run inference on the quantized PyTorch model and compare it to the ONNX Runtime output for the same input, I get completely different results.
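The comparison itself is essentially the sketch below (again, the input shape, file names, and tolerances are placeholders for what is in the attached comparison code):

```python
import numpy as np
import onnxruntime as ort
import torch

model = DepthNet()                                        # placeholder: the same quantized model
model.load_state_dict(torch.load("depthnet_int8.pth"))    # placeholder checkpoint path
model.eval()

x = torch.randn(1, 3, 256, 320)                           # placeholder input shape

# Output of the (fake-)quantized PyTorch model
with torch.no_grad():
    torch_out = model(x).cpu().numpy()

# Output of the exported ONNX model via ONNX Runtime
sess = ort.InferenceSession("depthnet_int8.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
ort_out = sess.run(None, {input_name: x.numpy()})[0]

print("max abs diff:", np.abs(torch_out - ort_out).max())
print("allclose:", np.allclose(torch_out, ort_out, rtol=1e-3, atol=1e-3))
```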
I am attaching DepthNet code.txt, which describes the model’s forward function; onnx_creation.txt, which contains the code that creates the ONNX file; and the comparison between the PyTorch and ONNX Runtime results.
The original quantized PyTorch model and its exported ONNX file are also attached.