From Szymon Migacz's presentation (http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf) I learned that TensorRT employs symmetric linear quantization and that it requires a threshold parameter (T) to be estimated on a calibration dataset.
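For reference, here is how I understand the scheme from the slides, as a minimal sketch in plain Python. The function names are my own and are not part of the TensorRT API, and I am assuming the mapping is "saturate at ±T, then scale linearly to [-127, 127]"; the actual implementation may differ.

```python
def quantize_symmetric(values, threshold):
    """Map floats to integers in [-127, 127], saturating at +/- threshold.

    Assumption: scale = 127 / T, with no zero point (symmetric scheme).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    scale = 127.0 / threshold
    quantized = []
    for v in values:
        # Values outside [-T, T] are saturated to the threshold.
        clipped = max(-threshold, min(threshold, v))
        quantized.append(int(round(clipped * scale)))
    return quantized


def dequantize(quantized, threshold):
    """Map quantized integers back to approximate float values."""
    return [q * threshold / 127.0 for q in quantized]


if __name__ == "__main__":
    # With T = 2.0, the values 2.5 and -3.0 saturate to +/-127.
    print(quantize_symmetric([0.0, 0.5, -0.5, 2.5, -3.0], 2.0))
    # -> [0, 32, -32, 127, -127]
```

So in this picture the only thing calibration has to produce per tensor is T, which is why I am confused about seeing two parameters in the sample.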
However, when looking into the int8 sample provided with TensorRT, I can see that two parameters are being estimated: cutoff and quantile. My question is: how do they correspond to T?
I assume that T = |max| * cutoff, or am I wrong? What does the quantile mean, then?
The TensorRT user guide mentions a white paper describing them in detail ("The cutoff and quantile parameters take values in the range [0,1]; their meaning is discussed in detail in the accompanying white paper."), but I have no idea where to find that white paper. Could you share it here?