ONNX Runtime result differs from INT8-quantized PyTorch model

Description

Inference results from the exported ONNX model (run with onnxruntime) differ significantly from the results of the INT8-quantized PyTorch model it was exported from.

Environment

TensorRT Version: 8.2.06
GPU Type: RTX 2080
Nvidia Driver Version: 470.86
CUDA Version: 11.4
CUDNN Version: 8.2.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.9

DepthNet code.txt (7.2 KB)
onnx_creation.txt (3.2 KB)
finetuned_quantized_depthnet.pt (10.4 MB)
quantized_depthnet.onnx (7.3 MB)

Hello,

I’m having problems exporting INT8-quantized models (quantized with NVIDIA’s pytorch-quantization toolkit) to ONNX.

I followed the Basic Functionalities — pytorch-quantization master documentation and exported the ONNX file successfully.

However, when I run inference on the quantized PyTorch model and compare it with the result of onnxruntime, I get completely different results.

I attach DepthNet code.txt, which describes the model’s forward function, and onnx_creation.txt, which contains the code that creates the ONNX file and compares the results of PyTorch and onnxruntime.

The original quantized PyTorch model and the ONNX file exported from it are also attached.
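For reference, the export in onnx_creation.txt follows the pattern from that documentation. A minimal sketch of such an export is shown below; the input shape, opset version and output name are illustrative assumptions, not the exact attached code:

import torch
from pytorch_quantization import nn as quant_nn

# Emit Q/DQ (fake-quantization) nodes into the ONNX graph, per the
# pytorch-quantization "Basic Functionalities" documentation.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

# Assumes the checkpoint stores the whole module, not just a state_dict.
model = torch.load("finetuned_quantized_depthnet.pt")
model.eval()

# Dummy input; the shape is an assumption for illustration only.
dummy_input = torch.randn(1, 3, 256, 512)

torch.onnx.export(
    model,
    dummy_input,
    "quantized_depthnet.onnx",
    opset_version=13,        # opset 13 supports per-channel QuantizeLinear/DequantizeLinear
    input_names=["input"],
    output_names=["depth"],  # assumed name; the real model produces several outputs
    do_constant_folding=True,
)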

Thank you,
Alex.

Hi,
We request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

  1. Validating your model with the below snippet:

check_model.py

import sys

import onnx

# Path of the ONNX model to validate, e.g. "quantized_depthnet.onnx"
filename = sys.argv[1]
model = onnx.load(filename)
onnx.checker.check_model(model)
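Assuming the snippet is saved as check_model.py, it can be run against the attached model with, for example: python check_model.py quantized_depthnet.onnx. It prints nothing if the model passes the checker and raises a validation error otherwise.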
  2. Trying to run your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, we request you to share the trtexec "--verbose" log for further debugging.
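For example (the flags are only an illustration; adjust the path to your model), something like: trtexec --onnx=quantized_depthnet.onnx --int8 --verbose should attempt to build an engine and print a detailed log that can be shared.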
Thanks!

Hello, the ONNX file is attached (quantized_depthnet.onnx).

The issue is not in creating a TRT engine from the ONNX file, but that inference on the exported ONNX model differs from inference on the quantized PyTorch model.

Thank you,
Alex.

Hi,

Could you please share the steps to execute the above scripts (in the .txt files) so that we can reproduce the issue for better debugging?

Thank you.

Hi,

I composed a PyCharm project which loads the quantized model, exports it to ONNX, loads the ONNX file, and produces outputs from both torch and onnxruntime, comparing them (only the ‘depth’ output is compared).
The 4Nvda folder contains the depthnet_nvda.pt quantized model and the quantized_depthnet.onnx produced from it.
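For context, the comparison step in that project is along the following lines; this is only a minimal sketch, and the input shape, the output indexing and the ‘depth’ lookup are illustrative assumptions rather than the exact attached code:

import numpy as np
import onnxruntime as ort
import torch

# Load the quantized PyTorch model and the ONNX file exported from it.
model = torch.load("4Nvda/depthnet_nvda.pt")
model.eval()
session = ort.InferenceSession("4Nvda/quantized_depthnet.onnx")

# Feed the same random input to both runtimes; the shape is assumed for illustration.
x = torch.randn(1, 3, 256, 512)

with torch.no_grad():
    torch_depth = model(x)["depth"].numpy()  # assumes the model returns a dict containing 'depth'

input_name = session.get_inputs()[0].name
ort_outputs = session.run(None, {input_name: x.numpy()})
ort_depth = ort_outputs[0]  # assumes the first ONNX output corresponds to 'depth'

# Report the largest element-wise difference between the two 'depth' outputs.
print("max abs diff:", np.abs(torch_depth - ort_depth).max())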

Zipped folder attached.

Thank you.
4Nvda.zip (14.5 MB)

Hi,

This issue looks more related to the exported ONNX model. We recommend you post your concern on https://discuss.pytorch.org/ or Discussions · microsoft/onnxruntime · GitHub to get better help.

Thank you.