An INT8 model speeds up inference, but does it take considerably less GPU memory?

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.0
• JetPack Version (valid for Jetson only):
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version (valid for GPU only): 495.44
• Issue Type (questions, new requirements, bugs): Question

Hi,
I know an INT8 model speeds up inference, but is there any math to estimate how much less GPU memory (%) it will take?

For example, I am using the CenterFace face detector model (FP32). Inference takes around 2.5 ms, but the DeepStream app with CenterFace as the primary detector takes 1.2 GB of GPU memory.

So if I converted the model to INT8, can anyone tell me how much speed I would gain and how much less GPU memory it would take? I just want approximate numbers to understand.
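For reference, by "converting to INT8" I mean rebuilding the TensorRT engine with the INT8 flag and a calibrator. A minimal sketch with the TensorRT 8 Python API; the ONNX path and the `MyCalibrator` class are placeholders, not my actual setup:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Hypothetical model path -- substitute your own CenterFace export.
with open("centerface.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
# INT8 needs a calibrator fed with representative input images;
# MyCalibrator (an IInt8EntropyCalibrator2 subclass) is assumed here.
config.int8_calibrator = MyCalibrator()

# build_serialized_network() is the TensorRT 8.x build entry point.
engine_bytes = builder.build_serialized_network(network, config)
with open("centerface_int8.engine", "wb") as f:
    f.write(engine_bytes)
```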

Note:
I tried FaceNet integrated with DeepStream. That model is INT8, but it still took 1.1 GB of GPU memory.

Thanks.

There is no math to tell how much less GPU memory it will take.

At this point my guess is that no one really knows how the Gst-nvinfer plugin actually works, so you are very unlikely to get that kind of info here.
However, I am currently optimizing and reformatting the plugin while porting it to C++, so I can give you some indications. First, it depends on the network you use: TensorRT will allocate memory for every layer in the network, and you also have to allocate input and output buffers to pass data to the network and retrieve the inference results. The input tensor(s) generally stay in float because you apply normalization, but the rest depends entirely on your network definition. Therefore, the best way to find out is to profile.
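For example, one easy way to profile is to poll GPU memory through NVML while the pipeline runs. A minimal sketch using the pynvml bindings; the one-second polling loop and device index 0 are just illustrative choices:

```python
import time
import pynvml  # pip install nvidia-ml-py3

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0 assumed

try:
    # Poll used device memory once a second while the DeepStream app runs;
    # compare the readings for the FP32 and INT8 engines.
    while True:
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU memory used: {info.used / 1024**2:.0f} MiB")
        time.sleep(1)
except KeyboardInterrupt:
    pass
finally:
    pynvml.nvmlShutdown()
```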
Finally, I would like to take a moment to appreciate fanzh’s “there is no math to tell how much”. This is just so LOL :)))))))

Hi alexandru,
Thanks for your valuable info.

I also appreciate fanzh's reply, since I was asking whether any such estimate exists.
Also, I mentioned the CenterFace model because the DeepStream Triton server already uses it, so I thought the model architecture is known to NVIDIA and they could give me some insight into the FP32 -> INT8 conversion for this model.
