Fake quantization ONNX model parse ERROR using TensorRT 8

Description


Environment

TensorRT Version: 8.0.1.6
GPU Type: 2080
Nvidia Driver Version: 470.63.01
CUDA Version: 11.3
CUDNN Version: 8.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
PyTorch Version (if applicable): 1.9

Hello,
I have successfully per-channel quantized a model and exported the calibrated INT8 model using the PyTorch Quantization Toolkit and torch.onnx.export (opset 13).
The QuantizeLinear nodes attached to the quantized weight blocks each contain a scale tensor whose length matches the number of output channels of the layer they represent, and therefore also carry an axis attribute, set by the ONNX export of fake_quantize_per_channel_affine as defined in opset 13.
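For reference, per-channel affine fake quantization applies a separate scale and zero point to each output channel along axis 0. A minimal pure-Python sketch of that quantize/clamp/dequantize round trip (my own illustration of the math, not the toolkit's actual implementation, which is torch.fake_quantize_per_channel_affine):

```python
# Illustrative sketch of per-channel affine fake quantization (INT8 range).
# The real export path uses torch.fake_quantize_per_channel_affine.

def fake_quantize_per_channel(weights, scales, zero_points, qmin=-128, qmax=127):
    """weights: list of channels, each a list of floats.
    scales / zero_points: one entry per channel (quantization axis 0)."""
    out = []
    for channel, scale, zp in zip(weights, scales, zero_points):
        q_channel = []
        for w in channel:
            q = round(w / scale) + zp           # quantize to an integer grid
            q = max(qmin, min(qmax, q))         # clamp to the int8 range
            q_channel.append((q - zp) * scale)  # dequantize back to float
        out.append(q_channel)
    return out

w = [[0.5, -0.25], [1.0, 2.0]]
fq = fake_quantize_per_channel(w, scales=[0.25, 0.5], zero_points=[0, 0])
print(fq)  # [[0.5, -0.25], [1.0, 2.0]] — exact multiples of each channel's scale survive
```

Because each channel has its own scale, the exported QuantizeLinear node needs the axis attribute to say which dimension the scale vector runs along, which is exactly what opset 13 added.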
However, when I load the ONNX file, the checker fails with the following error:

Traceback (most recent call last):
  File "/vayaalgo/Work/Pruning/Convert2TRT/main.py", line 244, in <module>
    import_ONNX()
  File "/vayaalgo/Work/Pruning/Convert2TRT/utils/models_quant.py", line 15, in import_ONNX
    engine = backend.prepare(depthnet, device='CUDA:0')
  File "/usr/local/lib/python3.7/dist-packages/onnx_tensorrt-8.0.1-py3.7.egg/onnx_tensorrt/backend.py", line 235, in prepare
  File "/usr/local/lib/python3.7/dist-packages/onnx/backend/base.py", line 74, in prepare
    onnx.checker.check_model(model)
  File "/usr/local/lib/python3.7/dist-packages/onnx/checker.py", line 91, in check_model
    C.check_model(model.SerializeToString())
onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: axis for operator QuantizeLinear

==> Context: Bad node spec: input: "features_0_0.weight" input: "465" input: "1337" output: "468" name: "QuantizeLinear_7" op_type: "QuantizeLinear" attribute { name: "axis" i: 0 type: INT }

How to solve this issue?

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Meanwhile, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the below snippet:

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
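As a sketch of step 2, the trtexec invocation for an INT8 ONNX model could look like the following (the model filename is illustrative; adjust paths to your setup — trtexec normally ships with TensorRT under /usr/src/tensorrt/bin):

```shell
# Hypothetical model path; adjust to your own file.
MODEL=quantized_depthnet.onnx
# --int8 enables INT8 precision; --verbose produces the detailed parser/builder log.
CMD="trtexec --onnx=${MODEL} --int8 --verbose"
echo "Would run: ${CMD}"
```

Capturing the full --verbose output (e.g. by redirecting it to a file) is what makes the log useful for debugging parser failures like the one above.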
Thanks!

quantized_depthnet.onnx (7.7 MB)
Hello,
Please find the attached ONNX file describing my model.
The model = onnx.load(filename) command loads the model; however, onnx.checker.check_model(model) fails with the same error as above: onnx.onnx_cpp2py_export.checker.ValidationError: Unrecognized attribute: axis for operator QuantizeLinear

Hi,

It looks like there is something wrong with the PyTorch to ONNX conversion. Please refer to the following user guide to make sure you are converting correctly:
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/userguide.html#export-to-onnx

Thank you.