I’ve built TensorFlow from source on my DRIVE PX 2 (CUDA 9.2, cuDNN 7.1.2). I’m trying to run a MobileNet network in inference mode on the PX 2, and I’m seeing an average of 12 seconds per inference, whereas I’d expect it to finish within a few hundred milliseconds.
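For reference, this is roughly how I’m measuring the latency (a minimal sketch — `run_inference` stands in for my actual `sess.run()` call and is not from my real script). I exclude warm-up runs, since the first few session runs include graph optimization and CUDA kernel setup and can be much slower than steady state:

```python
import time

def benchmark(run_inference, n_warmup=3, n_runs=20):
    """Time an inference callable, excluding warm-up runs.

    The first few TensorFlow session runs include graph optimization
    and kernel initialization, so they are discarded here.
    `run_inference` is a placeholder for whatever calls sess.run().
    """
    for _ in range(n_warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_inference()
    # mean seconds per inference over the timed runs
    return (time.perf_counter() - start) / n_runs
```

Even with the warm-up runs excluded this way, the steady-state average is still around 12 seconds.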
Based on the logs, I see that the GPUs are being recognized:
2019-01-24 11:53:56.615435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0, 1
2019-01-24 11:53:56.615583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-24 11:53:56.615629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0 1
2019-01-24 11:53:56.615668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N N
2019-01-24 11:53:56.615702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1: N N
2019-01-24 11:53:56.615811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3401 MB memory) -> physical GPU (device: 0, name: DRIVE PX 2 AutoChauffeur, pci bus id: 0000:04:00.0, compute capability: 6.1)
2019-01-24 11:53:56.616331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 1796 MB memory) -> physical GPU (device: 1, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
But tegrastats shows 0% GPU utilization, even when I sample at a very high frequency.
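To avoid eyeballing the tegrastats stream, I’m extracting the GPU load field with a small parser (the `GR3D`/`GR3D_FREQ` field name and `<pct>%` format are assumptions based on what tegrastats prints on my board — check your own output):

```python
import re

# tegrastats reports GPU load in a field like "GR3D_FREQ 0%@1122"
# (older releases print "GR3D 0%"); the exact format is an assumption.
GR3D_RE = re.compile(r"GR3D(?:_FREQ)?\s+(\d+)%")

def gpu_utilization(line):
    """Return GPU utilization percent from one tegrastats line, or None."""
    m = GR3D_RE.search(line)
    return int(m.group(1)) if m else None
```

Every line I capture this way parses to 0% while the model is supposedly running on the GPU.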
I also used nvpmodel to set mode 0, which I believe is the highest clock setting.
Also, I don’t understand why the device interconnect matrix shows ‘N N’ for both GPUs, yet the log then says TensorFlow devices were created on each GPU.