Failed to convert quantized ONNX model to TensorRT engine

Description

1. Converting the original ONNX model to an engine with trtexec works fine.
2. Converting the quantized ONNX model, which was verified with onnxruntime, fails with the following error:

[07/11/2024-06:35:43] [V] [TRT] Importing initializer: head.obj_preds.2.bias_quantized_zero_point
[07/11/2024-06:35:43] [V] [TRT] Parsing node: head.cls_preds.0.bias_DequantizeLinear [DequantizeLinear]
[07/11/2024-06:35:43] [V] [TRT] Searching for input: head.cls_preds.0.bias_quantized
[07/11/2024-06:35:43] [V] [TRT] Searching for input: head.cls_preds.0.bias_quantized_scale
[07/11/2024-06:35:43] [V] [TRT] Searching for input: head.cls_preds.0.bias_quantized_zero_point
[07/11/2024-06:35:43] [V] [TRT] head.cls_preds.0.bias_DequantizeLinear [DequantizeLinear] inputs: [head.cls_preds.0.bias_quantized -> (16)[INT32]], [head.cls_preds.0.bias_quantized_scale -> (1)[FLOAT]], [head.cls_preds.0.bias_quantized_zero_point -> (1)[INT32]],
[07/11/2024-06:35:43] [V] [TRT] Registering layer: head.cls_preds.0.bias_quantized for ONNX node: head.cls_preds.0.bias_quantized
[07/11/2024-06:35:43] [V] [TRT] Registering layer: head.cls_preds.0.bias_quantized_scale for ONNX node: head.cls_preds.0.bias_quantized_scale
[07/11/2024-06:35:43] [V] [TRT] Registering layer: head.cls_preds.0.bias_quantized_zero_point for ONNX node: head.cls_preds.0.bias_quantized_zero_point
[07/11/2024-06:35:43] [E] Error[3]: head.cls_preds.0.bias_DequantizeLinear: only activation types allowed as input to this layer.
[07/11/2024-06:35:43] [E] [TRT] ModelImporter.cpp:726: While parsing node number 0 [DequantizeLinear -> "head.cls_preds.0.bias"]:
[07/11/2024-06:35:43] [E] [TRT] ModelImporter.cpp:727: --- Begin node ---
[07/11/2024-06:35:43] [E] [TRT] ModelImporter.cpp:728: input: "head.cls_preds.0.bias_quantized"
input: "head.cls_preds.0.bias_quantized_scale"
input: "head.cls_preds.0.bias_quantized_zero_point"
output: "head.cls_preds.0.bias"
name: "head.cls_preds.0.bias_DequantizeLinear"
op_type: "DequantizeLinear"
[07/11/2024-06:35:43] [E] [TRT] ModelImporter.cpp:729: --- End node ---
[07/11/2024-06:35:43] [E] [TRT] ModelImporter.cpp:731: ERROR: ModelImporter.cpp:185 In function parseGraph:
[6] Invalid Node - head.cls_preds.0.bias_DequantizeLinear
head.cls_preds.0.bias_DequantizeLinear: only activation types allowed as input to this layer.
[07/11/2024-06:35:43] [E] Failed to parse onnx file
[07/11/2024-06:35:43] [I] Finish parsing network model
[07/11/2024-06:35:43] [E] Parsing model failed
[07/11/2024-06:35:43] [E] Failed to create engine from model or file.
[07/11/2024-06:35:43] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --verbose --onnx=yolox_s.onnx --saveEngine=yolox_s.engine --int8 --minShapes=input:1x3x640x640 --optShapes=input:2x3x640x640 --maxShapes=input:4x3x640x640 --workspace=4096
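For context, the node the parser rejects is a DequantizeLinear whose first input is an INT32 bias initializer, as the inputs line in the log shows. Below is a minimal sketch, not part of the original post, for locating all such nodes with the onnx Python package; the file name is taken from the trtexec command above:

```python
# Minimal sketch: list DequantizeLinear nodes that dequantize an INT32
# initializer -- the pattern trtexec rejects above with "only activation
# types allowed as input to this layer."
import onnx

model = onnx.load("yolox_s.onnx")  # path from the trtexec command
init_types = {init.name: init.data_type for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type != "DequantizeLinear":
        continue
    if init_types.get(node.input[0]) == onnx.TensorProto.INT32:
        print("INT32 bias DequantizeLinear:", node.name)
```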

Environment

TensorRT Version: 8.5.5.2
GPU Type: Jetson Orin Nano
CUDA Version: 11.4
CUDNN Version: 8.6
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): Python 3.8.10
Docker Container: nvcr.io/nvidia/deepstream-l4t:6.2-samples

Hi @zhang.ga ,
Can you help us with the model and repro steps?

Thanks

Repro steps:
1. Quantize the ONNX FP16 model to an ONNX INT8 model (a sketch follows the command below).
2. Run trtexec to convert the INT8 ONNX model to a TensorRT engine:
/usr/src/tensorrt/bin/trtexec --verbose --onnx=yolox_s.onnx --saveEngine=yolox_s.engine --int8 --minShapes=input:1x3x640x640 --optShapes=input:2x3x640x640 --maxShapes=input:4x3x640x640 --workspace=4096
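A minimal sketch of what step 1 might look like; the attached quantization.py.txt is the authoritative script, and the calibration reader and input path below are illustrative placeholders:

```python
# Hedged sketch of step 1 with onnxruntime's static QDQ quantizer, which
# inserts the DequantizeLinear nodes seen in the log (including the INT32
# bias DQ that the TensorRT parser then rejects).
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class DummyCalibrationReader(CalibrationDataReader):
    """Feeds random 1x3x640x640 tensors; replace with real calibration images."""

    def __init__(self, num_samples=8):
        self._batches = iter(
            {"input": np.random.rand(1, 3, 640, 640).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self._batches, None)

quantize_static(
    "yolox_s.onnx",                # illustrative input path
    "yolox_s_int8.onnx",           # matches the attached model name
    DummyCalibrationReader(),
    quant_format=QuantFormat.QDQ,  # explicit Q/DQ nodes, as in the log
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```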

quantization.py.txt (2.1 KB)
yolox_s_int8.onnx.txt (8.7 MB)
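The onnxruntime verification mentioned in the description is not posted; a hedged sketch of such a check, with the input name and shape taken from the trtexec command, could look like:

```python
# Hedged sketch: confirm the quantized model runs under onnxruntime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "yolox_s_int8.onnx", providers=["CPUExecutionProvider"]
)
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {"input": dummy})
print([out.shape for out in outputs])
```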

Hi, have you reproduced the issue?

Hello, anybody here?