How does TensorRT implement the Add operation in INT8 mode?

Description

If I give TensorRT my own calibration table, in which each layer's scale is different, how does TensorRT implement the Add operation in INT8 mode when the two inputs of the Add node have different scales? Thanks!

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

Hi, please refer to the links below to perform inference in INT8.

Thanks!

Thanks, but that answer is not what I am looking for. Can you give me some guidance on how TensorRT implements the Add operation in INT8 mode when the two inputs of the Add node have different scales?

Hi,

It looks like you are asking about computing Q(Q(x1, s1) + Q(x2, s2), s3), where s1, s2, and s3 are different quantization scales, x1 and x2 are the two input operands, and Q is the quantization operation.

TRT performs this by dequantizing the operands Q(x1, s1) and Q(x2, s2) to floats and then adding them.

Q(x1, s1) + Q(x2, s2) is converted to a float operation: y' = DQ(Q(x1, s1), s1) + DQ(Q(x2, s2), s2), where DQ is the dequantization operation. We then quantize the result: result = Q(y', s3). Of course, if no s3 is defined, we simply emit y' (a float result).

QAT networks operate exactly the same way. To be clear, Q(x1, s1) means that the "original" tensor data x1 is quantized using scale s1 by whatever layer generated the tensor.
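The dequantize-add-requantize scheme described above can be sketched numerically. This is a minimal NumPy simulation of the arithmetic, not TensorRT's actual kernel; the function names (quantize, dequantize, int8_add) and the example values are illustrative assumptions, and symmetric per-tensor INT8 quantization with round-to-nearest is assumed.

```python
import numpy as np

def quantize(x, scale):
    # Q(x, s): divide by scale, round, and clamp to the INT8 range
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # DQ(q, s): convert back to float by multiplying with the scale
    return q.astype(np.float32) * scale

def int8_add(q1, s1, q2, s2, s3=None):
    # Add in float: y' = DQ(q1, s1) + DQ(q2, s2)
    y = dequantize(q1, s1) + dequantize(q2, s2)
    # Requantize with the output scale s3 if one is defined,
    # otherwise emit the float result y'
    return quantize(y, s3) if s3 is not None else y

# Two inputs with different scales (illustrative values)
x1 = np.array([0.5, -1.0, 2.0], dtype=np.float32)
x2 = np.array([0.25, 0.75, -0.5], dtype=np.float32)
s1, s2, s3 = 0.02, 0.01, 0.03

q1, q2 = quantize(x1, s1), quantize(x2, s2)
result = int8_add(q1, s1, q2, s2, s3)
```

Note that because the add happens in float after dequantization, the two inputs never need to share a scale; only the output requantization step introduces s3.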

Thank you.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.