I am running the example provided at DeepLearningExamples/Colab_UNet_Industrial_TF_TFTRT_inference_demo.ipynb at master · NVIDIA/DeepLearningExamples · GitHub.
While FP32 and FP16 give an inference time of about 0.007 seconds, INT8 precision is far slower, around 5 seconds. I am running it on a single GPU.
Can someone help me understand the possible reasons for this behavior?
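For reference, the latency is measured roughly like this (a minimal sketch with placeholder paths, tensor names, and input shape, not the exact notebook code); warm-up runs are excluded so that TF-TRT engine construction is not counted in the measurement:

```python
import time
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = 'unet_industrial_trt_savedmodel'   # placeholder path
INPUT_TENSOR = 'input:0'                              # placeholder tensor names
OUTPUT_TENSOR = 'output:0'

# Placeholder input batch; the demo feeds real images instead.
batch = np.random.random_sample((1, 512, 512, 1)).astype(np.float32)

with tf.compat.v1.Session() as sess:
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], SAVED_MODEL_DIR)

    # Warm-up: the first calls trigger TF-TRT engine construction
    # (and calibration for INT8), so they are excluded from the timing.
    for _ in range(10):
        sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: batch})

    runs = 100
    start = time.time()
    for _ in range(runs):
        sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: batch})
    print('Mean inference time: %.4f s' % ((time.time() - start) / runs))
```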
Hi @gaurav.verma,
Could you please let us know which GPU you are using? INT8 Tensor Cores are available only on Turing and Ampere GPUs. Although INT8 is also functionally supported on some older GPUs, we recommend switching to a newer GPU for acceleration.
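If it helps, a quick way to check the GPU name and compute capability is sketched below (a minimal example assuming the pynvml package is installed; Turing corresponds to compute capability 7.5 and Ampere to 8.x):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
name = pynvml.nvmlDeviceGetName(handle)
if isinstance(name, bytes):                    # older pynvml versions return bytes
    name = name.decode()
# Returns (major, minor), e.g. (7, 5) for Turing or (8, 6) for many Ampere cards.
major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
pynvml.nvmlShutdown()

print('%s: compute capability %d.%d' % (name, major, minor))
```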
Thank you.
Hi @spolisetty,
I am using a GeForce RTX 2080 (Turing architecture), which does have Tensor Cores.
Thank you.
NVES
Hi, could you please share the model, script, profiler output, and performance output so that we can help you better?
Alternatively, you can try running your model with the trtexec command (an example is sketched below):
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
or view these tips for optimizing performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html
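For example, a typical INT8 invocation looks like this (a sketch only; the ONNX file name is a placeholder, and the UNet checkpoint would first need to be exported to ONNX or another format trtexec accepts):

```
trtexec --onnx=unet_industrial.onnx --int8 --verbose --saveEngine=unet_industrial_int8.engine
```

The --verbose output from such a run is also useful for debugging.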
Thanks!
Hi @gaurav.verma,
Thanks for providing the info. Could you please provide us with the verbose build log and inference log so that we can debug?
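If needed, verbose TensorFlow/TF-TRT logging can be enabled before the conversion is run (a minimal sketch; these are generic TensorFlow logging settings, not something specific to this demo), and the console output can then be captured to a file and attached here:

```python
import os

# Let TensorFlow's C++ backend print all of its log messages (0 = everything).
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

import tensorflow as tf

# Raise the Python-side logger so that TF-TRT conversion messages
# (candidate segments, engine creation, fallbacks) are printed.
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
```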
Thank you.
Hi @spolisetty,
You can reproduce the results using the UNet_Industrial TensorFlow checkpoint (AMP) | NVIDIA NGC (v19.x.x), and then follow the INT8 Inference steps from INT8 Inf TRT.
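For context, the INT8 conversion in that flow is roughly the following (a minimal sketch using the TF 1.x TrtGraphConverter API; the paths, tensor names, input shape, and calibration feed are placeholders rather than the exact script):

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

SAVED_MODEL_DIR = 'unet_industrial_savedmodel'            # placeholder path
INT8_SAVED_MODEL_DIR = 'unet_industrial_int8_savedmodel'  # placeholder path

def calibration_feed():
    # Placeholder calibration batch; representative images should be fed
    # here so that TensorRT can collect meaningful INT8 dynamic ranges.
    return {'input:0': np.random.random_sample((1, 512, 512, 1)).astype(np.float32)}

converter = trt.TrtGraphConverter(
    input_saved_model_dir=SAVED_MODEL_DIR,
    precision_mode='INT8',
    max_workspace_size_bytes=1 << 30,
    maximum_cached_engines=1,
    is_dynamic_op=True,          # required for INT8 calibration
    use_calibration=True)

# Replace TensorRT-compatible subgraphs with TRTEngineOp nodes.
converter.convert()

# Run a few calibration batches to collect INT8 ranges.
converter.calibrate(
    fetch_names=['output:0'],    # placeholder output tensor name
    num_runs=10,
    feed_dict_fn=calibration_feed)

converter.save(INT8_SAVED_MODEL_DIR)
```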