If I give TensorRT my own calibration table, in which each layer's scale is different, I want to know how TensorRT implements the Add in INT8 mode when the two inputs of the Add node have different scales. Thanks!
Hi, please refer to the links below on performing inference in INT8.
Thanks, but that answer is not what I am looking for. Can you give me some guidance on how TensorRT implements the Add in INT8 mode when the two inputs of the Add node have different scales?
It looks like you are asking about computing Q(Q(x1, s1) + Q(x2, s2), s3), where s1, s2, and s3 are different quantization scales, x1 and x2 are the two input operands, and Q is the quantization operation.
TRT performs this by converting the operands Q(x1, s1) and Q(x2, s2) to floats and then adding them.
Q(x1, s1) + Q(x2, s2) is converted to a float operation: y' = DQ(Q(x1, s1), s1) + DQ(Q(x2, s2), s2), where DQ is the dequantization operation. The result is then requantized: result = Q(y', s3). Of course, if s3 is not defined, we just emit y' (the float result).
QAT operates exactly the same way. To be clear, Q(x1, s1) means that the "original" tensor data x1 is quantized using scale s1 by whatever layer generated the tensor.
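The dequantize-add-requantize flow described above can be sketched in NumPy. This is a minimal illustration of the arithmetic, not TensorRT's actual kernel code; the function names and the symmetric-INT8 assumption (round-to-nearest, clamp to [-128, 127]) are my own for demonstration.

```python
import numpy as np

def quantize(x, scale):
    # Symmetric INT8 quantization: round(x / scale), clamped to [-128, 127].
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # DQ(q, s) = q * s, back to float.
    return q.astype(np.float32) * scale

def int8_add(q1, s1, q2, s2, s3=None):
    # y' = DQ(q1, s1) + DQ(q2, s2): dequantize both operands and add in float.
    y = dequantize(q1, s1) + dequantize(q2, s2)
    # If an output scale s3 exists, requantize; otherwise emit the float result.
    return quantize(y, s3) if s3 is not None else y

# Two inputs with different scales feeding one Add node.
x1 = np.array([0.5, -1.0], dtype=np.float32)
x2 = np.array([0.25, 0.75], dtype=np.float32)
s1, s2, s3 = 0.01, 0.02, 0.02

result = int8_add(quantize(x1, s1), s1, quantize(x2, s2), s2, s3)
```

Note that the intermediate sum lives in float, so no extra rescaling of one operand into the other's scale is needed; s3 alone determines the output representation.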