If I give TensorRT my own calibration table, in which each layer's scale is different, I want to know how TensorRT implements the Add in INT8 mode when the two inputs of the Add node have different scales. Thanks!
Hi, please refer to the links below on performing inference in INT8.
Thanks, but that answer is not what I am looking for. Can you give me some guidance on how TensorRT implements the Add in INT8 mode when the two inputs of the Add node have different scales?
It looks like you are asking about computing Q(Q(x1, s1) + Q(x2, s2), s3), where s1, s2, and s3 are different quantization scales, x1 and x2 are the two input operands, and Q is the quantization operation.
TRT performs this by converting the operands Q(x1, s1) and Q(x2, s2) to floats and then adding them.
Q(x1, s1) + Q(x2, s2) is converted to a float operation: y' = DQ(Q(x1, s1), s1) + DQ(Q(x2, s2), s2), where DQ is the dequantization operation. The result is then requantized: result = Q(y', s3). Of course, if s3 is not defined, we just emit y' (the float result).
QAT operates exactly the same way. To be clear, Q(x1, s1) means that the "original" tensor data x1 is quantized using scale s1 by whatever layer generated the tensor.
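The dequantize-add-requantize flow described above can be sketched in NumPy. This is a minimal illustration of the arithmetic, not TensorRT's actual kernel code; the function names and the symmetric-INT8 assumption (round-to-nearest, clamp to [-128, 127]) are my own for demonstration.

```python
import numpy as np

def quantize(x, scale):
    # Symmetric INT8 quantization: round(x / scale), clamped to [-128, 127].
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # DQ(q, s) = q * s, back to float.
    return q.astype(np.float32) * scale

def int8_add(q1, s1, q2, s2, s3=None):
    # y' = DQ(q1, s1) + DQ(q2, s2): dequantize both operands and add in float.
    y = dequantize(q1, s1) + dequantize(q2, s2)
    # If an output scale s3 exists, requantize; otherwise emit the float result.
    return quantize(y, s3) if s3 is not None else y

# Two inputs with different scales feeding one Add node.
x1 = np.array([0.5, -1.0], dtype=np.float32)
x2 = np.array([0.25, 0.75], dtype=np.float32)
s1, s2, s3 = 0.01, 0.02, 0.02

result = int8_add(quantize(x1, s1), s1, quantize(x2, s2), s2, s3)
```

Note that the intermediate sum lives in float, so no extra rescaling of one operand into the other's scale is needed; s3 alone determines the output representation.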