Torch-Quantization Examples for Manual Q/DQ Control

m.haritsah · July 31, 2023, 3:17am

Description

I want to learn how to manually add Q/DQ layers between operations, as described in the TensorRT developer guide. Do you have any examples showing how to reproduce the layers in Figure 1 through Figure 10?

My TensorRT engine with Q/DQ layers is slower than the default FP16 engine without them, because there is a lot of excessive input/output scaling between operations. Your documentation recommends being conservative about adding Q/DQ operations, but neither the TensorRT examples nor the quantization source code provides a specific function to add or remove Q/DQ operations on a layer's input or output. By default, every operation quantizes its input. I want to know how to quantize only at the first layer, keep running in INT8 until the last layer, and finally dequantize the output to get float32 data, as in Figure 8 or Figure 9.

spolisetty · August 29, 2023, 2:15pm

Hi,

We hope the following examples are helpful.
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/tutorials/quant_resnet50.html

Thank you.
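
As a rough sketch of the per-layer control asked about above: in pytorch_quantization, each quantized layer holds its quantizers as submodules (named `_input_quantizer` and `_weight_quantizer`), and a `TensorQuantizer` can be switched off with `disable()`. The helper name `disable_input_quantizers` and the `keep` parameter are hypothetical, and the `FakeQuantizer` / `FakeModel` classes are stand-ins that only mimic `named_modules()` and `disable()` so the snippet runs without torch installed. Whether the resulting engine is actually faster depends on how TensorRT fuses the remaining Q/DQ nodes, and a layer with no Q/DQ on its input may run in higher precision rather than INT8.

```python
def disable_input_quantizers(model, keep=()):
    """Disable input quantizers except those under the layer names in `keep`.

    Assumes `model.named_modules()` yields (name, module) pairs and that
    quantizer modules expose `disable()`, as TensorQuantizer does.
    Returns the names of the quantizers that were disabled.
    """
    disabled = []
    for name, module in model.named_modules():
        if not name.endswith("_input_quantizer"):
            continue  # leave weight quantizers and ordinary layers alone
        if any(name.startswith(prefix + ".") for prefix in keep):
            continue  # keep Q/DQ on the inputs of these layers
        module.disable()
        disabled.append(name)
    return disabled


# Stand-ins (hypothetical) so the sketch runs without torch or the toolkit.
class FakeQuantizer:
    def __init__(self):
        self.enabled = True

    def disable(self):
        self.enabled = False


class FakeModel:
    def __init__(self, names):
        self.modules = {n: FakeQuantizer() for n in names}

    def named_modules(self):
        return list(self.modules.items())


model = FakeModel([
    "conv1._input_quantizer",
    "conv1._weight_quantizer",
    "layer1.conv._input_quantizer",
    "fc._input_quantizer",
])
disabled = disable_input_quantizers(model, keep=("conv1",))
print(disabled)
```

With a real model, the same function would be called on the `nn.Module` after the quantized layers are created and before ONNX export, and the list passed to `keep` is where you decide which layers retain Q/DQ on their inputs.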
