Can you please answer these questions for me regarding the entropy calibration algorithm used in this presentation: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
For i in range( 128 , 2048 ):
    reference_distribution_P = [ bin[ 0 ] , ... , bin[ i-1 ] ]
    outliers_count = sum( bin[ i ] , bin[ i+1 ] , ... , bin[ 2047 ] )
    reference_distribution_P[ i-1 ] += outliers_count
    P /= sum(P)
    candidate_distribution_Q = quantize [ bin[ 0 ], ... , bin[ i-1 ] ] into 128 levels
    expand candidate_distribution_Q to 'i' bins
    Q /= sum(Q)
    divergence[ i ] = KL_divergence( reference_distribution_P , candidate_distribution_Q )
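To make sure I understand the pseudocode correctly, here is a runnable Python sketch of what I think it does (function and variable names are my own, not TensorRT's API; the "expand Q back to i bins" step follows the slide, distributing each merged chunk's mass over its originally nonzero bins):

```python
import numpy as np

def entropy_calibrate(bins, num_quant_levels=128):
    """Sweep a threshold index i over the activation histogram and return
    the i whose 128-level quantization minimizes KL(P || Q).

    bins: 1-D integer histogram of (absolute) activation values.
    """
    best_i, best_div = None, np.inf
    for i in range(num_quant_levels, len(bins)):
        # Reference distribution P: first i bins, with all outlier
        # counts clamped into the last kept bin, then normalized.
        p = bins[:i].astype(np.float64)
        p[-1] += bins[i:].sum()
        p /= p.sum()

        # Candidate Q: merge the i bins into num_quant_levels chunks,
        # then expand back to i bins by spreading each chunk's total
        # over its nonzero source bins, so P and Q share a support.
        chunks = np.array_split(bins[:i].astype(np.float64), num_quant_levels)
        q = np.concatenate([
            (c > 0) * (c.sum() / max((c > 0).sum(), 1))
            for c in chunks
        ])
        q /= q.sum()

        # KL(P || Q), summed only where P has mass; a small floor on Q
        # guards the degenerate last-bin case created by the clamp above.
        mask = p > 0
        div = np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], 1e-12)))
        if div < best_div:
            best_div, best_i = div, i
    return best_i, best_div
```

My understanding is that the returned index maps to a saturation threshold T = (best_i + 0.5) * bin_width, but please correct me if this sketch misrepresents the algorithm.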
1- Why are we quantizing into 128 bins and not 256, since this is INT8 quantization (2^8 = 256)?
2- Once the threshold T is found, how is the calibration range set? Is it [-|T|, |T|]?
If so, we will have a huge accuracy loss whenever the activation values are not symmetric with respect to 0!