Hello everyone,

could you please answer these questions about the entropy calibration algorithm used in this presentation: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

For i in range( 128 , 2048 ):
    reference_distribution_P = [ bin[ 0 ] , … , bin[ i-1 ] ]
    outliers_count = sum( bin[ i ] , bin[ i+1 ] , … , bin[ 2047 ] )
    reference_distribution_P[ i-1 ] += outliers_count
    P /= sum(P)
    candidate_distribution_Q = quantize [ bin[ 0 ] , … , bin[ i-1 ] ] into 128 levels
    Q /= sum(Q)
    divergence[ i ] = KL_divergence( reference_distribution_P , candidate_distribution_Q )
End For
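For context, here is how I read that loop as runnable Python. This is only a sketch of my understanding: the `quantize_to_128_levels` helper is my own guess at what "quantize into 128 levels" means (merge the i bins into 128 groups and spread each group's mass back over its originally nonzero bins), and the epsilon smoothing is mine, not from the slides.

```python
import numpy as np

def quantize_to_128_levels(p):
    # Hypothetical helper (my reading of "quantize ... into 128 levels"):
    # merge the i bins into 128 groups, then spread each group's total mass
    # uniformly back over the bins that were originally nonzero.
    q = np.zeros_like(p)
    for idx in np.array_split(np.arange(len(p)), 128):
        chunk = p[idx]
        nonzero = chunk > 0
        if nonzero.any():
            q[idx[nonzero]] = chunk.sum() / nonzero.sum()
    return q

def kl_divergence(p, q):
    # KL(P || Q) = sum_j p_j * log(p_j / q_j), skipping bins where p_j == 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def entropy_calibrate(bins, eps=1e-12):
    # bins: 2048-bin histogram of absolute activation values
    divergences = {}
    for i in range(128, 2048):
        p = bins[:i].astype(np.float64)
        p[i - 1] += bins[i:].sum()        # fold outliers into the last kept bin
        q = quantize_to_128_levels(bins[:i].astype(np.float64))
        p = (p + eps) / (p + eps).sum()   # eps avoids log(0) / division by zero;
        q = (q + eps) / (q + eps).sum()   # an assumption, not from the slides
        divergences[i] = kl_divergence(p, q)
    # the bin index minimizing KL divergence gives the threshold T
    return min(divergences, key=divergences.get)
```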

1- Why are we quantizing into 128 bins and not 256, since this is INT8 quantization (2^8 = 256)?

2- Once the threshold T is found, how is the calibration range set? Is it [-|T|, |T|]?

If so, we will incur a huge loss whenever the activation values are not symmetric with respect to 0!
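To make the second question concrete, here is the symmetric mapping I am assuming in it (a sketch, not confirmed by the slides): the range [-|T|, |T|] is mapped linearly onto [-127, 127] with scale = |T| / 127, and values beyond the threshold saturate.

```python
import numpy as np

def quantize_symmetric_int8(x, T):
    # Symmetric INT8 quantization with threshold T (assumed scheme):
    # [-|T|, |T|] maps onto [-127, 127]; values outside the range saturate.
    scale = abs(T) / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)
```

With T = 2.0, for example, an activation of 10.0 saturates to 127, which is exactly the kind of clipping loss I am worried about for distributions that are not symmetric around 0.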