I am running the example notebook Colab_UNet_Industrial_TF_TFTRT_inference_demo.ipynb from the NVIDIA/DeepLearningExamples repository on GitHub.
While FP32 and FP16 give an inference time of about 0.007 s, INT8 precision is far slower, around 5 s. I am running it on a single GPU.
Can someone help me understand the possible reasons for this behavior?
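One thing I have been wondering about: with TF-TRT, the first few INT8 calls can include engine building and calibration, so timing a single call may measure that one-time cost rather than steady-state inference. Below is a minimal, hedged timing sketch I could use to check this; `infer_fn` and the warm-up/run counts are my own placeholders, not part of the notebook:

```python
import time

def timed_inference(infer_fn, inputs, warmup=10, runs=50):
    """Return mean per-call latency, excluding warm-up iterations.

    Warm-up matters because the first INT8 calls may trigger
    TensorRT engine build/calibration, which can take seconds.
    """
    for _ in range(warmup):
        infer_fn(inputs)  # untimed: absorbs one-time setup cost
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn(inputs)
    return (time.perf_counter() - start) / runs

# Example with a stand-in for the real TF-TRT model call:
def fake_infer(x):
    return [v * 2 for v in x]

mean_latency = timed_inference(fake_infer, [1, 2, 3])
```

If the steady-state INT8 latency measured this way is still ~5 s, the slowdown is real; if it drops to the FP16 range, the original measurement was just capturing the one-time engine build.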