cutoff and quantile parameters in TensorRT

From the presentation of Szymon Migacz (http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf) I learned that TensorRT employs symmetric linear quantization and that it requires the threshold parameter (T) to be estimated on a calibration dataset.

However, when looking at the INT8 sample provided with TensorRT, I can see that two parameters are being estimated: cutoff and quantile. My question is: how do they correspond to T?
I assume that T = |max| * cutoff, or am I wrong? And what does the quantile mean then?

The TensorRT user guide mentions a white paper describing them in detail ("The cutoff and quantile parameters take values in the range [0,1]; their meaning is discussed in detail in the accompanying white paper."), but I have no idea where to find the white paper. Could you share it here?

Thanks.

Can you share the version of TensorRT that you are looking at? It sounds like perhaps you are looking at TensorRT 2.1 RC rather than 2.1 GA.

If you installed from Debian packages, you can run

dpkg -l | grep TensorRT

and send the results.

I was using the 2.0 EA version. With the new entropy calibrator these parameters do not make sense anymore.

The cutoff and quantile parameters have to be specified only for the “legacy” calibrator. It’s difficult to set their values without running experiments. Our recommended approach was to run a 2D grid search and look for the optimal combination of (cutoff, quantile) for a given network on a given dataset. This was implemented in sampleINT8 shipped with TensorRT 2.0 EA.
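As a rough illustration of what such a 2D grid search looks like, here is a minimal sketch. It is not the sampleINT8 code: `evaluate_accuracy` is a hypothetical callback standing in for "calibrate with these parameters, build the INT8 engine, and score it on a validation set", and `toy_accuracy` is a made-up function used only so the example runs on its own.

```python
import itertools

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def grid_search(evaluate_accuracy, cutoffs, quantiles):
    """Try every (cutoff, quantile) pair; return (best_accuracy, cutoff, quantile)."""
    best = None
    for cutoff, quantile in itertools.product(cutoffs, quantiles):
        acc = evaluate_accuracy(cutoff, quantile)
        if best is None or acc > best[0]:
            best = (acc, cutoff, quantile)
    return best

# Hypothetical stand-in for a real evaluation run (assumption, not TensorRT):
# a smooth surface that peaks at cutoff=0.6, quantile=0.8.
def toy_accuracy(cutoff, quantile):
    return 1.0 - (cutoff - 0.6) ** 2 - (quantile - 0.8) ** 2

grid = linspace(0.0, 1.0, 11)  # both parameters take values in [0, 1]
best_acc, best_cutoff, best_quantile = grid_search(toy_accuracy, grid, grid)
print(best_acc, best_cutoff, best_quantile)
```

In practice each call to the evaluation callback is expensive, so the grid resolution is a trade-off between search time and how close you get to the best pair.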

The new entropy calibrator doesn’t require any external hyperparameters; it determines quantization thresholds automatically based on the distributions of activations on the calibration dataset. In my presentation at GTC I was talking only about the new entropy calibrator, which is available in TensorRT 2.1 GA.
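For readers who want to see the idea in code, below is a simplified sketch that loosely follows the pseudocode in the GTC slides linked above: build a histogram of absolute activation values, and for each candidate clipping point compare the clipped reference distribution with its 128-level quantized version using KL divergence. This is not TensorRT's actual implementation. The function names, the 512-bin demo histogram (chosen for speed), and the handling of empty bins are all assumptions made for illustration.

```python
import math
import random

def abs_histogram(values, num_bins):
    """Histogram of |v| over [0, max|v|]; assumes at least one non-zero value."""
    max_abs = max(abs(v) for v in values)
    bin_width = max_abs / num_bins
    hist = [0] * num_bins
    for v in values:
        hist[min(int(abs(v) / bin_width), num_bins - 1)] += 1
    return hist, bin_width

def kl_divergence(p, q):
    """KL(P || Q); returns inf if Q has no mass where P does (a simplification)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def entropy_threshold(hist, bin_width, num_levels=128):
    """Pick the clipping threshold whose quantized distribution is closest to the original."""
    p_total = float(sum(hist))
    best_i, best_kl = len(hist), math.inf
    for i in range(num_levels, len(hist) + 1):
        q_total = float(sum(hist[:i]))
        if q_total == 0:
            continue
        # Reference P: first i bins, with the clipped outliers folded into the last bin.
        p = [c / p_total for c in hist[:i]]
        p[-1] += (p_total - q_total) / p_total
        # Candidate Q: merge the i bins into num_levels groups, then expand back,
        # spreading each group's count evenly over its non-empty bins.
        q = [0.0] * i
        for level in range(num_levels):
            start, stop = level * i // num_levels, (level + 1) * i // num_levels
            nonzero = [j for j in range(start, stop) if hist[j] > 0]
            if nonzero:
                share = sum(hist[start:stop]) / len(nonzero) / q_total
                for j in nonzero:
                    q[j] = share
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_i, best_kl = i, kl
    return (best_i + 0.5) * bin_width

# Synthetic "activations" (assumption: real ones come from the calibration dataset).
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(20000)]
hist, bin_width = abs_histogram(activations, num_bins=512)
threshold = entropy_threshold(hist, bin_width)
print("max |activation|:", max(abs(v) for v in activations))
print("chosen threshold:", threshold)
```

The point of the sketch is only that the threshold falls out of the activation histogram itself, with no cutoff or quantile to tune.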

@szmigacz, I have read your GTC presentation, but when I implement the solution as you presented it, I find that quantizing the input loses a lot of accuracy, especially when the input has a widely spread distribution. For example, when the input has a range of (-5000, 5000) and the threshold is 3000, the quantization step is 3000/127 (about 23.6), which makes the output differ substantially from the true output.
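To make the numbers concrete, here is a small sketch of symmetric linear quantization with a threshold of 3000. The helper names `quantize` and `dequantize` are made up for this example and are not TensorRT functions; the rounding mode is also an assumption.

```python
def quantize(x, threshold):
    """Symmetric linear quantization to [-127, 127], saturating at +/- threshold."""
    scale = threshold / 127.0
    q = round(x / scale)
    return max(-127, min(127, q))

def dequantize(q, threshold):
    """Map an int8 code back to the floating-point range."""
    return q * (threshold / 127.0)

threshold = 3000.0
step = threshold / 127.0  # about 23.6

for x in (10.0, 1000.0, 2999.0, 5000.0):
    x_hat = dequantize(quantize(x, threshold), threshold)
    print(f"x={x:8.1f}  reconstructed={x_hat:8.1f}  abs error={abs(x - x_hat):7.1f}")
```

Values inside the threshold are off by at most half a step (about 11.8), while anything beyond the threshold saturates to +/-3000, so an input of 5000 is off by 2000.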

In your presentation, you show experiments with almost no loss of accuracy.
So I’m curious how you solve or mitigate the accuracy loss for input quantization. Can you share some ideas with me?
TensorRT also has two more parameters, cutoff and quantile. Do they have an impact on the quantization result?