Quantization in TensorRT

Hello everyone.
I am trying to quantize a model (SSD-MobileNetV1) to take it from FP32 to INT8 precision. Next, I will have to test the performance of this model on the Jetson using TensorRT. I am wondering: does TensorRT support INT8 precision, or does it convert to a higher precision?
Thank you.

Hi,

TensorRT does support INT8 operations, but Nano doesn’t.
Nano’s GPU compute capability is 5.3 (Maxwell), which lacks the hardware INT8 support that TensorRT requires.
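
You can also check this at runtime from Python (a rough sketch, assuming the TensorRT Python bindings that ship with JetPack are installed):

```python
import tensorrt as trt

# Query whether the GPU has native fast INT8 / FP16 kernels available to TensorRT.
# On Nano this should report no fast INT8 but fast FP16 support.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

print("Fast INT8 supported:", builder.platform_has_fast_int8)
print("Fast FP16 supported:", builder.platform_has_fast_fp16)
```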

Thanks.

Thanks for the reply.
So on the Jetson Nano I can run an INT8 model only on the CPU, right? But in that case I can’t use TensorRT, right?

Thank you.

Hi,

CPU support depends on which library you use (e.g. PyTorch, TensorFlow) and whether it has an INT8 implementation for ARM.
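
For example, PyTorch ships the QNNPACK backend for quantized inference on ARM CPUs. Below is a rough illustration of dynamic quantization (illustrative only: dynamic quantization covers layers such as nn.Linear, not the convolutions in SSD-MobileNet, so a real deployment would need static post-training quantization):

```python
import torch

# Illustrative sketch of INT8 inference on the CPU with dynamic quantization.
# Assumes a PyTorch build with the QNNPACK backend (the one used on ARM CPUs).
torch.backends.quantized.engine = "qnnpack"

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# INT8 inference runs on the CPU.
output = quantized(torch.randn(1, 128))
print(output.shape)
```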

We also have some devices that support INT8 on the GPU, e.g. Xavier and Xavier NX.
With these devices, you can deploy the model in INT8 with TensorRT directly.
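
On those boards, a rough outline of an INT8 build with the TensorRT Python API could look like the sketch below ("model.onnx" and the random-data calibrator are placeholders; real calibration should feed representative images, and exact API names can vary a little between TensorRT versions):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class RandomCalibrator(trt.IInt8EntropyCalibrator2):
    """Placeholder calibrator that feeds random batches; replace with real images."""

    def __init__(self, batches=8, shape=(1, 3, 300, 300)):
        super().__init__()
        self.batches = batches
        self.shape = shape
        self.device_input = cuda.mem_alloc(int(np.prod(shape)) * 4)  # FP32 input buffer
        self.count = 0

    def get_batch_size(self):
        return self.shape[0]

    def get_batch(self, names):
        if self.count >= self.batches:
            return None  # no more calibration data
        data = np.random.rand(*self.shape).astype(np.float32)
        cuda.memcpy_htod(self.device_input, data)
        self.count += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        return None

    def write_calibration_cache(self, cache):
        pass


logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = RandomCalibrator()

engine = builder.build_engine(network, config)
with open("model_int8.engine", "wb") as f:
    f.write(engine.serialize())
```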

Thanks.

Thanks for the reply @AastaLLL.
I wanted to quantize a model to be able to run it on the Jetson Nano. From what I have learned, it is not possible to use the GPU for INT8, only the CPU, which would make no sense on the Nano.
Thank you.

Hi,

Nano doesn’t support INT8 operations (hardware limitation).
Do you think FP16 is an option for you?
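
FP16 only needs a builder flag and no calibration data, so the build is much simpler. A minimal sketch (assuming an ONNX export named "model.onnx"; trtexec’s --fp16 option does the same from the command line):

```python
import tensorrt as trt

# Minimal FP16 build sketch for Nano; no calibrator is needed.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

engine = builder.build_engine(network, config)
with open("model_fp16.engine", "wb") as f:
    f.write(engine.serialize())
```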

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.