I am running INT8 quantization using TRT5 on top of TensorFlow.
In the presentation of INT8 quantization, they mention that the activations are quantized using the Entropy Calibrator, while the weights are quantized using min-max quantization.
Question: are the weights of the whole graph (all trainable parameters: batch norm parameters + biases + kernel weights) taken into consideration, and then the max is simply mapped to 127 and the min to -127?
If yes, can you please explain how this is possible if we have huge values for the biases or batch norm parameters?
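To make the question concrete, here is my understanding of symmetric min-max weight quantization (a sketch of what I assume happens, not TensorRT's actual implementation; `symmetric_minmax_quantize` is a name I made up for illustration):

```python
import numpy as np

def symmetric_minmax_quantize(w):
    """Symmetric min-max quantization: map the largest-magnitude
    value to 127 (zero-point fixed at 0)."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# If this is done per layer, each kernel gets its own scale:
w = np.array([0.5, -1.0, 2.0, -0.25], dtype=np.float32)
q, scale = symmetric_minmax_quantize(w)  # q = [32, -64, 127, -16]
w_hat = q.astype(np.float32) * scale     # dequantized approximation
```

If instead all trainable parameters shared one scale, a single huge bias or batch norm value would blow up `scale` and crush the resolution of every kernel weight, which is exactly what I don't understand.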
I am using TF-TRT, i.e. TensorRT on top of TensorFlow. Just for testing, not at the C++ level.
My question is: how are the weights quantized for INT8? Are all the weights quantized together, or is it done layer by layer?
I am asking only about the weights, not the activations.