The ONNX-TensorRT operator support list (https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md) shows that HardSwish exported from ONNX supports INT8 inference. However, when I tried the simplest network with INT8 in TensorRT, I noticed that HardSwish was not quantized to INT8.
Could you advise why HardSwish was not quantized to INT8 in my test case? Thank you!
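For reference, the test was along the lines of the sketch below: a single Conv + HardSwish block exported to ONNX and then built with INT8 enabled. The layer sizes, names, and export settings here are illustrative, not necessarily the exact script I used.

```python
# Minimal sketch (illustrative): a Conv + HardSwish block exported to ONNX,
# then built as an INT8 engine.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.act = nn.Hardswish()  # should export as an ONNX HardSwish node with opset >= 14

    def forward(self, x):
        return self.act(self.conv(x))

torch.onnx.export(TinyNet().eval(),
                  torch.randn(1, 3, 224, 224),
                  "hardswish_test.onnx",
                  opset_version=14)
# Engine built afterwards with, e.g.:
#   trtexec --onnx=hardswish_test.onnx --int8 --verbose
```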
Environment
TensorRT Version: 8.6.1
GPU Type: 3060
Nvidia Driver Version: 523
CUDA Version: 12.2
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.0
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Hi,
Can you try running your model with the trtexec command (for example, trtexec --onnx=model.onnx --int8 --verbose) and share the --verbose log in case the issue persists?
You can refer to the link below for the full list of supported operators; if an operator is not supported, you need to create a custom plugin for that operation.
Also, please share your model and script, if not already shared, so that we can help you better.
Meanwhile, for some common errors and queries, please refer to the link below:
I’m very sorry for the late reply. I have uploaded the code and models. I checked the supported operators list, and it does say HardSwish supports INT8 computation, but as you can see in ori_layer.json.svg, HardSwish ran in floating point. Aside from the question of how to get HardSwish to compute in INT8, I also did Q/DQ quantization while keeping the scales consistent, yet there was still a lot of floating-point data reformatting. Why is that? Thank you for your reply. demoqdq.zip (455.0 KB)
In the archive, demo.ipynb contains the network code and related operations, ori.onnx is the exported model, and ori_verbose.json is the verbose build log. A rough sketch of the Q/DQ setup is shown below.
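The Q/DQ insertion follows the usual pytorch-quantization pattern, roughly as in this sketch. The module names and the fixed amax values are illustrative only; in demo.ipynb the scales come from calibration.

```python
# Rough sketch of the Q/DQ setup with pytorch-quantization. The fixed amax values
# keep adjacent Q/DQ pairs on the same scale; they stand in for calibrated values.
import torch
import torch.nn as nn
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

act_desc = QuantDescriptor(num_bits=8, amax=4.0)  # one shared activation scale
wgt_desc = QuantDescriptor(num_bits=8, amax=1.0)  # per-tensor weight scale (sketch only)

class QConvHardSwish(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = quant_nn.QuantConv2d(c_in, c_out, 3, padding=1,
                                         quant_desc_input=act_desc,
                                         quant_desc_weight=wgt_desc)
        self.act = nn.Hardswish()

    def forward(self, x):
        return self.act(self.conv(x))

model = nn.Sequential(QConvHardSwish(3, 16), QConvHardSwish(16, 16)).eval()

# Export QuantizeLinear/DequantizeLinear nodes instead of the fake-quant custom op
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(model, torch.randn(1, 3, 224, 224), "qdq_test.onnx",
                  opset_version=13)
```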
Q1: I tried to add a residual_quantizer, but that node could not be found in the final ONNX, and residual_quantizer does not appear in the log either. Yet it does reduce the data flow, which is great but hard to understand.
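For clarity, by residual_quantizer I mean roughly the following: a TensorQuantizer placed on the skip branch so that both inputs of the Add carry Q/DQ. This is only a sketch with illustrative names and fixed amax values; the actual code is in demo.ipynb.

```python
# Sketch of a residual quantizer (illustrative): a TensorQuantizer on the
# identity branch so the Add sees a quantized skip input as well.
import torch.nn as nn
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv = quant_nn.QuantConv2d(
            c, c, 3, padding=1,
            quant_desc_input=QuantDescriptor(num_bits=8, amax=4.0),
            quant_desc_weight=QuantDescriptor(num_bits=8, amax=1.0))
        self.act = nn.Hardswish()
        # Q/DQ on the residual path; the fixed amax stands in for a calibrated value
        self.residual_quantizer = quant_nn.TensorQuantizer(
            QuantDescriptor(num_bits=8, amax=4.0))

    def forward(self, x):
        return self.act(self.conv(x)) + self.residual_quantizer(x)

block = ResBlock(16)
```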
Q2: Although the network runs pure INT8 inference, it is not as fast as the engine built without Q/DQ nodes. This can be clearly seen by comparing ori_layer.json.svg and qdq_layer.json.svg. The same convolution node in the QDQ engine:
[Convolution
0.0729557 ms
model.conv2.conv.weight
/model/conv2/conv/_weight_quantizer/QuantizeLinear
/model/conv2/conv/Conv]
In the engine without Q/DQ nodes:
[Convolution
0.0502896 ms
/model/conv2/conv/Conv]
The other convolutions show the same pattern.
Thank you for your reply.
Below are the new code, model, and logs: demoqdq.zip (461.7 KB)