Explicit quantization vs implicit quantization

Description

I am confused about why I cannot take the calibration table contained in the QAT ONNX model (explicit quantization) and then use TensorRT's internal quantization (implicit quantization). Can someone help me?

Environment

TensorRT Version: 7.0
GPU Type: V100

Hi,

Request you to share the model, script, profiler, and performance output if not shared already so that we can help you better.

Alternatively, you can try running your model with trtexec command.

When measuring model performance, make sure you consider the latency and throughput of the network inference alone, excluding data pre- and post-processing overhead.
Please refer to the below links for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy
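As an illustration, a minimal sketch of that measurement principle (warm up first, synchronize before stopping the clock, and time only the forward pass) could look like the following. The ResNet18 model and random input below are just placeholders for your own network and data.

import time
import torch
import torchvision

# Placeholder model and input; substitute your own network and real batches.
model = torchvision.models.resnet18().eval().cuda()
input_batch = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    # Warm-up so one-time costs (allocations, autotuning) are excluded.
    for _ in range(50):
        model(input_batch)
    torch.cuda.synchronize()

    # Time only the network inference, not data loading or pre/post-processing.
    iters = 1000
    start = time.perf_counter()
    for _ in range(iters):
        model(input_batch)
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

print(f"mean latency: {elapsed / iters * 1e3:.2f} ms")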

Thanks!

Thanks for your reply!
This is the ResNet18 ONNX model (implicit quantization):
resnet18.onnx (42.6 MB)
This is the quantized ResNet18 ONNX model exported with the pytorch_quantization package (explicit quantization):
resnet18_quant.onnx (42.7 MB)
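Roughly, the export flow for a model like this with pytorch_quantization looks as follows. This is only a sketch, not my exact script: the calibration/QAT fine-tuning step is omitted and the input shape is assumed to be 1x3x224x224.

import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Monkey-patch torch.nn layers with quantized counterparts before building the model.
quant_modules.initialize()
model = torchvision.models.resnet18(pretrained=True).eval().cuda()

# ... run calibration / QAT fine-tuning here so the quantizers have valid scales ...

# Export the fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear (Q/DQ) pairs.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy_input, "resnet18_quant.onnx", opset_version=13)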

This is the command I used:

trtexec --onnx=xxx.onnx --saveEngine=tmp.trt --iterations=10000 --int8

The results show that explicit quantization (mean GPU time 2.2 ms) is much slower than implicit quantization (0.9 ms).

My question is: why can't TensorRT use the calibration info in the explicitly quantized model to perform like implicit quantization, instead of having to use the Q/DQ nodes, which make it slower than implicit quantization?

In other words, why can't the PTQ model exported from pytorch_quantization perform like TensorRT's internal PTQ (plain TensorRT INT8 processing)?

And why can't we remove the Q/DQ layers from the explicitly quantized model and then use TensorRT's internal PTQ?
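To make "TensorRT's internal PTQ" concrete, this is roughly what I mean: build the engine from the plain (non-Q/DQ) ONNX model and let TensorRT calibrate it. The following is only a sketch against the TensorRT 7.x Python API; the calibrator is stubbed out and would need real calibration data and device buffers.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

class StubCalibrator(trt.IInt8EntropyCalibrator2):
    # Stub: a real calibrator must feed device pointers to calibration batches.
    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        return None  # returning None tells TensorRT there are no more batches

    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        pass

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("resnet18.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.INT8)            # enable implicit INT8 quantization
config.int8_calibrator = StubCalibrator()        # TensorRT runs its own calibration
engine = builder.build_engine(network, config)   # TensorRT 7.x builder API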

Hi,

Hope the following doc helps. It has clear details on Explicit vs Implicit Quantization.

Thank you.