High inference time while running UNet with INT8 precision

I am running the example provided at DeepLearningExamples/Colab_UNet_Industrial_TF_TFTRT_inference_demo.ipynb at master · NVIDIA/DeepLearningExamples · GitHub.

While FP32 and FP16 give an inference time of about 0.007 seconds, INT8 precision is far slower, around 5 seconds. I am running it on a single GPU.

Can someone help me understand the possible reasons for such behavior?
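For reference, this is roughly how I am timing the runs. A minimal sketch, assuming a hypothetical `infer_fn` callable that wraps the TF-TRT model call (name is mine, not from the notebook):

```python
import time

def measure_latency(infer_fn, inputs, warmup=10, runs=100):
    """Average wall-clock latency of infer_fn over `runs` calls.

    Note: for GPU inference you must synchronize the device before
    reading the clock, otherwise you only measure the async launch.
    """
    for _ in range(warmup):  # warm-up: engine build, caches, autotuning
        infer_fn(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn(inputs)
    return (time.perf_counter() - start) / runs
```

The warm-up loop matters here: TF-TRT builds the INT8 engine lazily on the first calls, so timing without warm-up can report seconds instead of milliseconds.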

Hi @gaurav.verma,

Could you please let us know which GPU you are using? INT8 Tensor Core support is available only on Turing and Ampere GPUs. Although INT8 is also functionally supported on some older GPUs, we recommend switching to a newer GPU for acceleration.
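A quick way to check is to map the GPU's compute capability to Tensor Core support. The mapping below is my own summary of public NVIDIA architecture specs, not something from this thread:

```python
def has_int8_tensor_cores(major, minor):
    """True if the compute capability implies INT8 Tensor Cores.

    Rough mapping (assumption, based on public NVIDIA specs):
      sm_70 (Volta)   - Tensor Cores, but FP16 only
      sm_72 (Xavier)  - INT8 Tensor Cores
      sm_75 (Turing)  - INT8 Tensor Cores
      sm_80+ (Ampere) - INT8 Tensor Cores
    Older GPUs may still run INT8 via DP4A instructions,
    just without Tensor Core acceleration.
    """
    return (major, minor) >= (7, 2)

# You can query the capability with, e.g.:
#   nvidia-smi --query-gpu=name,compute_cap --format=csv
```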

Thank you.

Hi @spolisetty,

I am using GeForce RTX 2080 (Turing Arch). And there are no tensor cores on the GPU.

Thank You.

Hi, we request you to share the model, script, profiler output, and performance logs so that we can help you better.

Alternatively, you can try running your model with the trtexec command,

or view these tips for optimizing performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html
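For example, a typical trtexec invocation for comparing precisions might look like this (the model path `model.onnx` is a placeholder; adjust flags to your trtexec version):

```shell
# Baseline FP32 timing
trtexec --onnx=model.onnx --verbose

# INT8 timing (trtexec uses dummy calibration scales, so the numbers
# are valid for performance measurement, not for accuracy)
trtexec --onnx=model.onnx --int8 --saveEngine=model_int8.engine --verbose
```

The verbose build log from these runs is also what we would need for debugging.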

Thanks!

Hi @gaurav.verma,

Thanks for providing the info. Could you please provide us verbose build log and inference log, to debug.

Thank you.

Hi @spolisetty,

You can reproduce the results using NVIDIA NGC (v19.x.x), and then follow the INT8 Inference section from INT8 Inf TRT.
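To set up that environment, the NGC TensorFlow container can be launched along the following lines (the `<yy.mm>` tag is a placeholder for the 19.x release mentioned above; substitute the exact version):

```shell
# Pull and run the NGC TensorFlow container with GPU access,
# mounting the DeepLearningExamples checkout into the container
docker pull nvcr.io/nvidia/tensorflow:<yy.mm>-py3
docker run --gpus all -it --rm \
    -v "$PWD/DeepLearningExamples:/workspace/DeepLearningExamples" \
    nvcr.io/nvidia/tensorflow:<yy.mm>-py3
```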