Hello everyone,

Can you please explain how TensorRT calculates the KL divergence between the FP32 model and the INT8 model in order to minimize the loss of information?

Fares

Hi Fares,

I'm not sure about the specifics of the entropy calculation, but perhaps this thread can help: https://devtalk.nvidia.com/default/topic/1065472/tensorrt/tensorrt-4-0-1-int8-precision-vs-fp32-precision-objects-detections-inference-results/post/5405592/#5405592

And also this presentation: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

Towards the end of the above presentation is a slide titled, “Entropy Calibration - pseudocode”

Hello NVES_R,

Thank you for your response.

I would like to know whether TensorRT supports only symmetric activation histograms.

I ask because the “Entropy Calibration - pseudocode” slide contains this:

For i in range( 128 , 2048 ):
    reference_distribution_P = [ bin[ 0 ] , …, bin[ i-1 ] ]
    outliers_count = sum( bin[ i ] , bin[ i+1 ] , … , bin[ 2047 ] )
    reference_distribution_P[ i-1 ] += outliers_count
    P /= sum(P)
    candidate_distribution_Q = quantize [ bin[ 0 ], …, bin[ i-1 ] ] into 128 levels
    Q /= sum(Q)
    divergence[ i ] = KL_divergence( reference_distribution_P, candidate_distribution_Q)
End For

We can see that it only considers the right (positive) side of the histogram and quantizes it into 128 bins.

So I suppose TensorRT assumes here that the range of activations is symmetric.
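
For readers who want to experiment, the slide's pseudocode can be sketched in plain Python roughly as follows. This is only an illustration under assumptions, not TensorRT's actual implementation: the helper names (`kl_divergence`, `expand_quantized`, `find_threshold_bin`) are made up, the histogram is built from absolute values, the way Q is expanded back to the length of P is one common interpretation, and candidates where Q is zero but P is not are simply skipped. The demo uses 512 bins and 64 levels instead of 2048 and 128 only to keep it fast.

```python
import math
import random


def kl_divergence(p, q):
    """KL(P || Q) for two normalised distributions of equal length."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # Q has zero mass where P does not
            total += pi * math.log(pi / qi)
    return total


def expand_quantized(bins, support, num_levels):
    """Merge `bins` into `num_levels` groups, then expand back to len(bins).

    Each group's mass is spread evenly over the bins of that group that are
    non-empty in `support` (assumption: this mirrors "quantize into N levels").
    """
    n = len(bins)
    q = [0.0] * n
    for level in range(num_levels):
        start = level * n // num_levels
        end = (level + 1) * n // num_levels
        live = [j for j in range(start, end) if support[j] > 0]
        if live:
            share = sum(bins[start:end]) / len(live)
            for j in live:
                q[j] = share
    return q


def find_threshold_bin(hist, num_levels=128):
    """Return (i, KL) for the number of kept bins i that minimises KL(P || Q)."""
    best_i, best_kl = None, math.inf
    # Unlike the slide, the no-clipping case i == len(hist) is also tried.
    for i in range(num_levels, len(hist) + 1):
        p = [float(c) for c in hist[:i]]
        p[i - 1] += sum(hist[i:])  # fold clipped outliers into the last bin
        q = expand_quantized(hist[:i], p, num_levels)
        p_sum, q_sum = sum(p), sum(q)
        if p_sum == 0.0 or q_sum == 0.0:
            continue
        kl = kl_divergence([x / p_sum for x in p], [x / q_sum for x in q])
        if kl < best_kl:
            best_i, best_kl = i, kl
    return best_i, best_kl


# Demo on synthetic data: histogram of |x| for Gaussian samples.
random.seed(0)
samples = [abs(random.gauss(0.0, 1.0)) for _ in range(20000)]
num_bins, num_levels = 512, 64
max_abs = max(samples)
bin_width = max_abs / num_bins
hist = [0] * num_bins
for s in samples:
    hist[min(int(s / bin_width), num_bins - 1)] += 1

best_i, best_kl = find_threshold_bin(hist, num_levels)
threshold = best_i * bin_width  # upper edge of the last kept bin (assumption)
print(best_i, best_kl, threshold)
```

Because the histogram only covers |x|, the single threshold it yields describes the range [-threshold, threshold], which matches the observation that only the right side is examined.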

Hi Fares,

Yes, I believe the range from calibration is symmetric. You can see this in calibration caches: they contain only one value for each layer, which is the absolute max of the range.
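
To make the "one value per layer" point concrete, here is a minimal sketch of symmetric quantization driven by a single threshold. It assumes the linear mapping to [-127, 127] with saturation that the linked presentation describes; the function names are hypothetical and this is not TensorRT code.

```python
def quantize_symmetric(x, threshold):
    """Map x to an INT8 code in [-127, 127] using one scale for both signs."""
    scale = threshold / 127.0
    code = round(x / scale)
    return max(-127, min(127, code))  # saturate values beyond +/- threshold


def dequantize(code, threshold):
    """Recover the approximate real value represented by an INT8 code."""
    return code * threshold / 127.0


threshold = 40.0
for value in (0.0, 40.0, -40.0, 100.0):
    code = quantize_symmetric(value, threshold)
    print(value, code, dequantize(code, threshold))
```

With a single threshold T there is no way to express a range such as [0, T]; the representable interval is always [-T, T].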

There are two workarounds in this scenario. One is to manually set the min/max range if you know the expected values (https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvinfer1_1_1_i_tensor.html#a956f662b1d2ebe7ba3aba3391aedddf5), though I believe this will still create a symmetric range based on the min/max values you provide. The other is to use quantization-aware training (QAT) when training your model, and then convert your model to TensorRT with the EXPLICIT_PRECISION flag: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#work-with-qat-networks

Hello,

Thanks for your answer.

Since the calibration range is symmetric, [-|T|, |T|], can you please explain the slide “Results From Calibration #2” from this presentation? http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

The slide says that using KL-divergence algorithm the threshold is found at ~= 40.

The calibration range will therefore be [-40, 40].

However, we don’t have any negative activation values, so the negative half of the range would be a huge loss!
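
The size of that loss can be estimated with a small sketch. It assumes the symmetric mapping to [-127, 127] discussed earlier in the thread and activations spread evenly over [0, 40]; the asymmetric step size is a purely hypothetical comparison, not something TensorRT is stated to do here.

```python
def quantize_symmetric(x, threshold):
    """Symmetric INT8 quantization with saturation at +/- threshold."""
    code = round(x * 127.0 / threshold)
    return max(-127, min(127, code))


threshold = 40.0
# Synthetic non-negative activations covering [0, 40] evenly.
activations = [i * threshold / 1000 for i in range(1001)]
codes = {quantize_symmetric(a, threshold) for a in activations}

symmetric_step = threshold / 127   # resolution actually available on [0, 40]
asymmetric_step = threshold / 255  # hypothetical: all 255 steps spent on [0, 40]

print(len(codes), min(codes), max(codes))
print(symmetric_step, asymmetric_step)
```

Only the codes 0 to 127 are ever produced, so about half of the INT8 codes go unused and the step size is roughly twice what a range of [0, 40] would allow. In other words, the cost is about one bit of resolution.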