TF-TRT Quantization-Aware Training

Hello everyone,

I want to experiment with the INT8 quantization-aware training supported by TF-TRT (TRT5).

The documentation is in this guide: Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation

They mention the following: "Your TensorFlow graph should be augmented with quantization nodes and then the model will be trained as normal. You can use fixed quantization ranges or make them trainable variables."

My question is: how do I make the quantization ranges trainable? We have to add fake-quantization ops (tf.quantization.fake_quant_with_min_max_vars(inputs, min, max)), so how should the min and max arguments be made trainable variables?
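For reference, here is a minimal sketch of one way to do this: pass ordinary trainable tf.Variable objects as the min/max arguments of tf.quantization.fake_quant_with_min_max_vars, which has gradients defined with respect to min and max, so the optimizer can update the ranges alongside the weights. This is written in TF2 style for brevity; in the TF 1.x graphs that TF-TRT (TRT5) targets, the same idea would use tf.get_variable inside a variable scope. The FakeQuant wrapper class and the initial range of [-6, 6] are illustrative choices, not anything prescribed by the guide.

```python
import tensorflow as tf


class FakeQuant(tf.Module):
    """Fake-quantization layer with a trainable quantization range."""

    def __init__(self, init_min=-6.0, init_max=6.0, name=None):
        super().__init__(name=name)
        # min/max are plain trainable variables, so gradients from
        # fake_quant_with_min_max_vars update them during training.
        self.q_min = tf.Variable(init_min, trainable=True, name="quant_min")
        self.q_max = tf.Variable(init_max, trainable=True, name="quant_max")

    def __call__(self, x):
        # Simulates INT8 quantization in the forward pass; the backward
        # pass routes gradients to x, q_min, and q_max.
        return tf.quantization.fake_quant_with_min_max_vars(
            x, self.q_min, self.q_max)


if __name__ == "__main__":
    layer = FakeQuant(init_min=0.0, init_max=6.0)
    y = layer(tf.constant([-1.0, 3.0, 10.0]))
    # Values outside [q_min, q_max] are clamped to the range.
    print(y.numpy())
```

Since q_min and q_max are trainable variables, they show up in the module's trainable_variables and receive gradients from any standard optimizer, which is (as I understand it) what the guide means by making the ranges trainable.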

Thanks!

Hi,

Please refer to the sample test below:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/compiler/tensorrt/test/quantization_mnist_test.py

Thanks