TensorRT INT8 Quantization: weights + activations quantization

Hello everyone,

I am running INT8 quantization using TRT5 on top of TensorFlow.
In the INT8 quantization presentation, they mention that the activations are quantized using the Entropy Calibrator, while the weights are quantized using min-max quantization.

Question: are the weights of the whole graph (all trainable parameters: batch norm parameters + biases + kernel weights) taken into consideration, and then we just map the max to 127 and the min to -127?

If yes, can you please explain how this works when we have huge values for biases or batch norm parameters?
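For example (my own illustrative numbers): if my kernel weights lie in [-0.5, 0.5] but a batch norm parameter somewhere in the graph is 500, a single global scale of 127/500 would round every kernel weight to 0, destroying the layer.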

Thanks,
Fares

Hi,

There are two ways to enable the INT8 interface:

  1. Dynamic range - by setting the min and max values per layer (see the sketch after the link below)
  2. INT8 calibration - implement the IInt8Calibrator interface to provide calibration data to TensorRT

Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-515/tensorrt-developer-guide/index.html#enable_int8_c
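As a rough sketch of option 1, assuming the TensorRT 5 C++ API (`network` and `builder` are assumed to exist, and the ranges here are placeholders for values you would measure on your own data):

    // Sketch of option 1 (TensorRT 5 C++ API): set a dynamic range on every tensor.
    // The ranges below are placeholder values, not anything TensorRT prescribes.
    network->getInput(0)->setDynamicRange(-1.0f, 1.0f);  // network inputs need ranges too
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
        {
            // setDynamicRange(min, max) gives TensorRT the expected range of
            // this tensor, from which the INT8 scale factor is derived.
            layer->getOutput(j)->setDynamicRange(-6.0f, 6.0f);
        }
    }
    builder->setInt8Mode(true);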

INT8 quantization is performed per layer. By default, TensorRT will choose an INT8 implementation only if it results in a higher-performance network; if an implementation at a higher precision is faster, TensorRT will use that instead.
You can override this behavior by making the type constraints strict:
builder->setStrictTypeConstraints(true);
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-515/tensorrt-developer-guide/index.html#set_layer_mp_c
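For reference, a minimal sketch of the corresponding builder setup for option 2 (TensorRT 5 C++ API; `calibrator` here is a hypothetical instance of your own IInt8Calibrator implementation):

    // Minimal INT8 builder setup (TensorRT 5 C++ API). `calibrator` is assumed
    // to be your own IInt8Calibrator implementation.
    builder->setInt8Mode(true);                // enable INT8 precision
    builder->setInt8Calibrator(calibrator);    // supply calibration data
    builder->setStrictTypeConstraints(true);   // forbid fallback to higher precision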

TRT quantizes both weights and activations to INT8 precision, but TRT 5.x does not accept pre-quantized weights as input from the user.

Thanks

Hello,

Thank you for your answer!

I am using TF-TRT, i.e., TensorRT on top of TensorFlow, just for testing, not at the C++ level.

My question is how the weights are quantized for INT8 quantization: are all the weights of the graph quantized together, or is it done layer by layer?

I am only asking about the weights, not the activations.

Thanks.

Hi,

During INT8 quantization, both weights and activations are quantized on a per-layer basis.
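As a rough illustration (my own C++ sketch, not TensorRT's internal code), per-layer symmetric min-max weight quantization computes a separate scale for each layer's weights:

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Symmetric min-max INT8 quantization of one layer's weights: the largest
    // absolute weight in the layer maps to 127, everything else scales
    // proportionally. Each layer gets its own scale factor.
    std::vector<int8_t> quantizeLayerWeights(const std::vector<float>& weights)
    {
        float maxAbs = 0.0f;
        for (float w : weights)
            maxAbs = std::max(maxAbs, std::fabs(w));
        const float scale = (maxAbs > 0.0f) ? 127.0f / maxAbs : 1.0f;

        std::vector<int8_t> quantized;
        quantized.reserve(weights.size());
        for (float w : weights)
            quantized.push_back(static_cast<int8_t>(std::lround(w * scale)));
        return quantized;
    }

Because the scale is computed per layer, a layer whose weights lie in [-0.5, 0.5] keeps its full INT8 resolution even if parameters elsewhere in the graph are in the hundreds, which addresses the earlier concern about huge bias or batch norm values.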

Thanks

Is the quantization then hardware/OS specific, or is it portable?