TensorRT - Entropy Calibration - pseudocode

Hello everyone,

can you please answer these questions for me regarding the entropy calibration algorithm used in this presentation: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

For i in range( 128 , 2048 ):
    reference_distribution_P = [ bin[ 0 ] , … , bin[ i-1 ] ]
    outliers_count = sum( bin[ i ] , bin[ i+1 ] , … , bin[ 2047 ] )
    reference_distribution_P[ i-1 ] += outliers_count
    P /= sum(P)
    candidate_distribution_Q = quantize [ bin[ 0 ], … , bin[ i-1 ] ] into 128 levels
    Q /= sum(Q)
    divergence[ i ] = KL_divergence( reference_distribution_P, candidate_distribution_Q )
End For
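
For reference, here is a minimal runnable NumPy/SciPy sketch of the loop above as I read it from the slides (not the actual TensorRT source). The quoted pseudocode omits the step that expands the 128-level candidate back to i bins before computing the KL divergence; the way that expansion is done here (spreading each level's mass uniformly over its non-empty bins) is my own assumption:

import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

def entropy_calibration(hist, num_bins=2048, num_quant_levels=128):
    # hist: histogram of absolute activation values with `num_bins` bins
    divergences = {}
    for i in range(num_quant_levels, num_bins):
        # Reference distribution P: clip the histogram at bin i and
        # fold the outlier mass into the last kept bin.
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()
        p /= p.sum()

        # Candidate distribution Q: merge the i bins into 128 levels, then
        # expand back to i bins by spreading each level's mass uniformly
        # over the non-empty bins it covers.
        q = np.zeros(i, dtype=np.float64)
        start = 0
        for chunk in np.array_split(hist[:i].astype(np.float64), num_quant_levels):
            nonzero = chunk > 0
            if nonzero.any():
                q[start:start + len(chunk)][nonzero] = chunk.sum() / nonzero.sum()
            start += len(chunk)
        if q.sum() == 0:
            continue
        q /= q.sum()

        # KL divergence needs q > 0 wherever p > 0, hence the tiny epsilon.
        divergences[i] = entropy(p, q + 1e-12)

    best_i = min(divergences, key=divergences.get)
    return best_i  # threshold T is roughly best_i * bin_width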

1- Why are we quantizing into 128 bins and not 256, since it is INT8 quantization (2^8 = 256)?
2- Once we have found the threshold T, how is the calibration range set? Is it [-|T|, |T|]?
If yes, we will have a huge loss when the activation values are not symmetric with respect to 0!

Hi,

  1. INT8 quantization is symmetric, so we only need to quantize the absolute values, hence half of the (-128 to 127) range, i.e. 128 levels.
  2. The loss may not be as large as you think; what you lose is at most one bit.
    Theoretically, mapping your data onto 0-127 uses a scale twice as large as mapping it onto 0-255 would, and both cover the same float range.
    Because the scale differs between the two cases, what you lose is one bit of resolution, not half of the dynamic range.
    So it is effectively 7-bit vs. 8-bit when all your values are non-negative (see the sketch below for the arithmetic).
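
A minimal numeric sketch of that point, assuming a purely non-negative tensor with illustrative values in [0, 6.0] (the numbers are not from any real model):

import numpy as np

# Hypothetical non-negative activations in [0, 6.0].
x = np.random.uniform(0.0, 6.0, size=10_000).astype(np.float32)
amax = np.abs(x).max()

# Symmetric INT8: the scale covers [-amax, amax], but only codes 0..127
# are ever used when the data is non-negative.
scale_sym = amax / 127.0
q_sym = np.clip(np.round(x / scale_sym), -128, 127)

# Asymmetric/unsigned 8-bit for comparison: all 256 codes cover [0, amax].
scale_asym = amax / 255.0
q_asym = np.clip(np.round(x / scale_asym), 0, 255)

# Both cover the same float range; the symmetric step size is ~2x larger,
# i.e. about one bit of resolution lost, not half of the dynamic range.
err_sym = np.abs(q_sym * scale_sym - x).mean()
err_asym = np.abs(q_asym * scale_asym - x).mean()
print(scale_sym / scale_asym)   # ~2.0
print(err_sym / err_asym)       # ~2.0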

We would be interested to investigate if this causes huge precision loss for certain cases.

Thanks

Hello,

Thank you for your answer.

1- In the algorithm described above, we take the WHOLE activation range into consideration (from bin[0] to bin[2047]) and quantize it into 128 bins, so we are not taking only half of the range!

2- I have attached a plot showing the activation distribution and the threshold found. The threshold found is roughly T=1.6, so the quantization range will be [-1.6, 1.6].
The majority of the points are not taken into consideration, so the symmetric quantization will cause a huge loss!
[Attached plot: negative_normal_Threshold1000.png]

Hi,
In this case, you can try the per-tensor dynamic range approach and set each tensor's range explicitly.

Please refer below link for more details:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#set_tensor_mp_python
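
Roughly following that section, a minimal sketch with the TensorRT Python API looks like the snippet below; it assumes you already have a builder and network parsed from your model, and the tensor names and (min, max) ranges shown are hypothetical placeholders for statistics you gather yourself:

import tensorrt as trt

# Hypothetical per-tensor ranges gathered from your own activation statistics.
custom_ranges = {"conv1_output": (-1.6, 6.3), "relu2_output": (0.0, 4.2)}

builder.int8_mode = True  # TRT 5/6-style flag; TRT 7 uses config.set_flag(trt.BuilderFlag.INT8)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        tensor = layer.get_output(j)
        if tensor.name in custom_ranges:
            lo, hi = custom_ranges[tensor.name]
            # TensorRT scales INT8 symmetrically from max(|lo|, |hi|).
            tensor.dynamic_range = (lo, hi)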

Also, could you please share the script, model and data file so we can better help?

Thanks

Hi, thank you for your response!

I have a trained DNN for object detection that I converted to a frozen graph (.pb).

Now I am using TensorRT 5 on top of TensorFlow 1.15 to generate the TRT inference graph:

# TF-TRT module in TF 1.15
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(input_graph_def=graph_def,
                                  nodes_blacklist=outputs,
                                  precision_mode="INT8",
                                  max_batch_size=1,
                                  minimum_segment_size=5,
                                  max_workspace_size_bytes=2048 << 20,
                                  use_calibration=True)
calib_graph = converter.convert()

trt_graph = converter.calibrate(fetch_names=outputs,
                                num_runs=10000,
                                input_map_fn=input_calib_data_fn)
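
For context, a hypothetical sketch of what my input_map_fn looks like, assuming calibrate() expects it to return a dict mapping input tensor names in the graph to tf.Tensor objects; the calibration array and the input name "image_tensor:0" are placeholders for my real data and graph:

import numpy as np
import tensorflow as tf

# Placeholder calibration data; substitute real preprocessed images.
calib_images = np.random.rand(32, 1, 300, 300, 3).astype(np.float32)

def input_calib_data_fn():
    # Cycle through the calibration images with a tf.data pipeline;
    # each element has the batch dimension already included.
    dataset = tf.data.Dataset.from_tensor_slices(calib_images).repeat()
    iterator = dataset.make_one_shot_iterator()
    return {"image_tensor:0": iterator.get_next()}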

The problem is that I get a huge drop in accuracy with the quantized INT8 graph.

Hi,

Sorry for the late reply.
IEntropyCalibratorV2 supports per-activation-tensor scaling, so each tensor gets its own threshold.
Could you please try TRT 7 with IEntropyCalibratorV2 instead of TRT 5?
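
If you can use the native TensorRT Python API, an entropy calibrator is a small class (the Python class name is trt.IInt8EntropyCalibrator2). Here is a minimal sketch; the data layout, cache file name, and class name MyEntropyCalibrator are assumptions for illustration:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context
import tensorrt as trt

class MyEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, calib_data, batch_size=1, cache_file="calib.cache"):
        super().__init__()
        self.data = calib_data            # numpy array of preprocessed inputs
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.index = 0
        # Device buffer large enough for one batch.
        self.d_input = cuda.mem_alloc(self.data[0].nbytes * batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.data):
            return None  # signals the end of the calibration data
        batch = np.ascontiguousarray(
            self.data[self.index:self.index + self.batch_size])
        cuda.memcpy_htod(self.d_input, batch)
        self.index += self.batch_size
        return [int(self.d_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

The calibrator is then attached to the builder config before building the engine, e.g. config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator = MyEntropyCalibrator(calib_data).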

Thanks