Hi, I’ve been looking for a way to quantize my DNN model with my own int8 weight and activation quantization method and then run inference with TensorRT.
I can train the model with quantization-aware training using some quantization method, and I can also save the trained low-precision weights. At inference time, however, I have no idea how to apply the same activation quantization method that was used during training, instead of TensorRT’s built-in method…
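For concreteness, by “int8 activation quantization” I mean something along the lines of the usual symmetric per-tensor scheme sketched below (a generic illustration with a max-abs scale, not my exact method):

```python
import numpy as np

def int8_quantize(x, scale):
    """Symmetric per-tensor int8 quantization: q = clip(round(x / scale), -128, 127)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_dequantize(q, scale):
    """Map int8 codes back to float: x_hat = q * scale."""
    return q.astype(np.float32) * scale

# Toy activation tensor; scale chosen from the max absolute value.
x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
scale = np.abs(x).max() / 127.0
q = int8_quantize(x, scale)
x_hat = int8_dequantize(q, scale)
```

During training the fake-quantized activations follow this scheme, so at inference I would need TensorRT to use the same scales rather than ones it derives itself.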
So my question is: how can I implement a custom activation quantization method for TensorRT inference?
Thank you.