High inference time while running UNet with INT8 precision

I am running the example provided at DeepLearningExamples/Colab_UNet_Industrial_TF_TFTRT_inference_demo.ipynb at master · NVIDIA/DeepLearningExamples · GitHub.

While FP32 and FP16 give an inference time of about 0.007 seconds, INT8 precision is far slower, around 5 seconds. I am running it on a single GPU.

Can someone help me understand the possible reasons for such behavior?
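For reference, this is roughly how I am timing the runs. A minimal sketch, assuming a hypothetical `infer_fn` callable that wraps the TF-TRT model call (name is mine, not from the notebook):

```python
import time

def measure_latency(infer_fn, inputs, warmup=10, runs=100):
    """Average wall-clock latency of infer_fn over `runs` calls.

    Note: for GPU inference you must synchronize the device before
    reading the clock, otherwise you only measure the async launch.
    """
    for _ in range(warmup):  # warm-up: engine build, caches, autotuning
        infer_fn(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn(inputs)
    return (time.perf_counter() - start) / runs
```

The warm-up loop matters here: TF-TRT builds the INT8 engine lazily on the first calls, so timing without warm-up can report seconds instead of milliseconds.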

Hi @gaurav.verma,

Could you please let us know which GPU you are using? INT8 Tensor Core support is available only on Turing and Ampere GPUs. Although INT8 is also functionally supported on some older GPUs, we recommend switching to a newer GPU for acceleration.
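A quick way to check is to map the GPU's compute capability to Tensor Core support. The mapping below is my own summary of public NVIDIA architecture specs, not something from this thread:

```python
def has_int8_tensor_cores(major, minor):
    """True if the compute capability implies INT8 Tensor Cores.

    Rough mapping (assumption, based on public NVIDIA specs):
      sm_70 (Volta)   - Tensor Cores, but FP16 only
      sm_72 (Xavier)  - INT8 Tensor Cores
      sm_75 (Turing)  - INT8 Tensor Cores
      sm_80+ (Ampere) - INT8 Tensor Cores
    Older GPUs may still run INT8 via DP4A instructions,
    just without Tensor Core acceleration.
    """
    return (major, minor) >= (7, 2)

# You can query the capability with, e.g.:
#   nvidia-smi --query-gpu=name,compute_cap --format=csv
```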

Thank you.

Hi @spolisetty,

I am using GeForce RTX 2080 (Turing Arch). And there are no tensor cores on the GPU.

Thank You.

Hi, we request you to share the model, script, profiler output, and performance logs so that we can help you better.

Alternatively, you can try running your model with the trtexec command,

or view these tips for optimizing performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html
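For example, a typical trtexec invocation for comparing precisions might look like this (the model path `model.onnx` is a placeholder; adjust flags to your trtexec version):

```shell
# Baseline FP32 timing
trtexec --onnx=model.onnx --verbose

# INT8 timing (trtexec uses dummy calibration scales, so the numbers
# are valid for performance measurement, not for accuracy)
trtexec --onnx=model.onnx --int8 --saveEngine=model_int8.engine --verbose
```

The verbose build log from these runs is also what we would need for debugging.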

Thanks!

Hi @gaurav.verma,

Thanks for providing the info. Could you please provide us verbose build log and inference log, to debug.

Thank you.

Hi @spolisetty,

You can reproduce the results using NVIDIA NGC (v19.x.x), and then follow the INT8 Inference section from INT8 Inf TRT.
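To set up that environment, the NGC TensorFlow container can be launched along the following lines (the `<yy.mm>` tag is a placeholder for the 19.x release mentioned above; substitute the exact version):

```shell
# Pull and run the NGC TensorFlow container with GPU access,
# mounting the DeepLearningExamples checkout into the container
docker pull nvcr.io/nvidia/tensorflow:<yy.mm>-py3
docker run --gpus all -it --rm \
    -v "$PWD/DeepLearningExamples:/workspace/DeepLearningExamples" \
    nvcr.io/nvidia/tensorflow:<yy.mm>-py3
```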