Why does INT8 calibration take almost 1 hour?

Hi all, why is calibration such a slow process that it takes almost 1 hour? I also profiled GPU utilization while the calibration was running, and it stayed at around 1% for the entire process. Shouldn't GPU utilization be maximized in order to reduce the calibration time? Does the calibration process run on the CPU or on the GPU?
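For reference, this is roughly how I sampled the utilization (a minimal sketch; it assumes nvidia-smi is available inside the container, and the one-second interval is arbitrary):

import subprocess
import time

# Print per-GPU utilization once per second for ten seconds while calibration runs.
for _ in range(10):
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu", "--format=csv,noheader"])
    print(out.decode().strip())
    time.sleep(1)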

Hello,

This is not expected. I'm seeing high GPU utilization during tensorrt/samples/sampleINT8 calibration.

Can you verify that you are running with the GPU enabled?

Hi NVES, I am using the sample script inference.py and running it from the Docker image nvcr.io/nvidia/tensorflow:18.10-py3. How can I verify that the GPU is enabled?

OK. You should have the GPU enabled when using the NGC TF container. Just to make sure, try the following (you should see a list of GPUs printed):

import tensorflow as tf
tf.test.gpu_device_name()
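
If that call does not return a device name, another quick check (a sketch using the TF 1.x device_lib helper that ships in that container) is to list every device TensorFlow can see:

from tensorflow.python.client import device_lib

# GPUs show up with device_type == "GPU"; an empty list means TensorFlow only sees the CPU.
print([d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"])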

The other consideration is that you are using the ImageNet dataset, which is large. How is it stored? I suspect disk I/O activity is negating the performance gains you might otherwise get from the GPUs.
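
If the TFRecords turn out to be the bottleneck, one mitigation worth trying is to parallelize decoding and prefetch batches so reading overlaps with GPU work. Below is a minimal sketch using the standard tf.data API; the file pattern, feature keys, image size, and batch size are placeholders you would adapt to your calibration loader:

import glob
import tensorflow as tf

# Placeholder pattern for wherever the calibration TFRecords live.
filenames = glob.glob("/data/imagenet/validation-*")

def parse_fn(record):
    # Minimal feature spec; adjust the keys to match how the records were written.
    features = tf.parse_single_example(
        record, {"image/encoded": tf.FixedLenFeature([], tf.string)})
    image = tf.image.decode_jpeg(features["image/encoded"], channels=3)
    return tf.image.resize_images(image, [224, 224])

dataset = tf.data.TFRecordDataset(filenames, buffer_size=8 * 1024 * 1024)
dataset = dataset.map(parse_fn, num_parallel_calls=8)  # decode records on multiple CPU threads
dataset = dataset.batch(32)
dataset = dataset.prefetch(4)  # keep batches ready so the GPU is not waiting on disk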

Hi, here is the trace log. The ImageNet dataset is stored as TFRecord files:

>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
2019-01-08 19:29:06.215674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:21:00.0
totalMemory: 15.72GiB freeMemory: 15.59GiB
2019-01-08 19:29:06.345053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:41:00.0
totalMemory: 15.72GiB freeMemory: 15.59GiB
2019-01-08 19:29:06.484859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:81:00.0
totalMemory: 15.72GiB freeMemory: 15.59GiB
2019-01-08 19:29:06.632444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 3 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:c1:00.0
totalMemory: 15.72GiB freeMemory: 15.59GiB
2019-01-08 19:29:06.645180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-01-08 19:29:07.924597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-08 19:29:07.924676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 2 3
2019-01-08 19:29:07.924685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N Y Y Y
2019-01-08 19:29:07.924694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   Y N Y Y
2019-01-08 19:29:07.924700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2:   Y Y N Y
2019-01-08 19:29:07.924708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3:   Y Y Y N
2019-01-08 19:29:07.926263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 15093 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:21:00.0, compute capability: 7.5)
2019-01-08 19:29:08.072384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:1 with 15093 MB memory) -> physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:41:00.0, compute capability: 7.5)
2019-01-08 19:29:08.223131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:2 with 15093 MB memory) -> physical GPU (device: 2, name: Tesla T4, pci bus id: 0000:81:00.0, compute capability: 7.5)
2019-01-08 19:29:08.370002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:3 with 15093 MB memory) -> physical GPU (device: 3, name: Tesla T4, pci bus id: 0000:c1:00.0, compute capability: 7.5)
'/device:GPU:0'

What average GPU utilization did you get during the calibration process? Do you have any recommendations?