How does TensorRT implement the Add operation in INT8 mode?

Description

If I give TensorRT my own calibration table, in which each layer's scale is different, how does TensorRT implement the Add operation in INT8 mode when the two inputs of the Add node have different scales? Thanks!

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

Hi, please refer to the links below to perform inference in INT8.

Thanks!

Thanks, but that answer is not what I am looking for. Can you give me some guidance on how TensorRT implements the Add operation in INT8 mode when the two inputs of the Add node have different scales?

Hi,

It looks like you are asking about computing Q(Q(x1, s1) + Q(x2, s2), s3), where s1, s2, and s3 are different quantization scales, x1 and x2 are the two input operands, and Q is the quantization operation.

TRT performs this by dequantizing the operands Q(x1, s1) and Q(x2, s2) to floats and then adding them.

Q(x1, s1) + Q(x2, s2) is converted to a float operation: y' = DQ(Q(x1, s1), s1) + DQ(Q(x2, s2), s2), where DQ is the dequantization operation. We then quantize the result: result = Q(y', s3). Of course, if no s3 is defined, we simply emit y' (a float result).

QAT networks operate exactly the same way. To be clear, Q(x1, s1) means that the "original" tensor data x1 is quantized using scale s1 by whatever layer generated the tensor.
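The dequantize-add-requantize scheme described above can be sketched numerically. This is a minimal NumPy simulation of the arithmetic, not TensorRT's actual kernel; the function names (quantize, dequantize, int8_add) and the example values are illustrative assumptions, and symmetric per-tensor INT8 quantization with round-to-nearest is assumed.

```python
import numpy as np

def quantize(x, scale):
    # Q(x, s): divide by scale, round, and clamp to the INT8 range
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # DQ(q, s): convert back to float by multiplying with the scale
    return q.astype(np.float32) * scale

def int8_add(q1, s1, q2, s2, s3=None):
    # Add in float: y' = DQ(q1, s1) + DQ(q2, s2)
    y = dequantize(q1, s1) + dequantize(q2, s2)
    # Requantize with the output scale s3 if one is defined,
    # otherwise emit the float result y'
    return quantize(y, s3) if s3 is not None else y

# Two inputs with different scales (illustrative values)
x1 = np.array([0.5, -1.0, 2.0], dtype=np.float32)
x2 = np.array([0.25, 0.75, -0.5], dtype=np.float32)
s1, s2, s3 = 0.02, 0.01, 0.03

q1, q2 = quantize(x1, s1), quantize(x2, s2)
result = int8_add(q1, s1, q2, s2, s3)
```

Note that because the add happens in float after dequantization, the two inputs never need to share a scale; only the output requantization step introduces s3.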

Thank you.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.