Hi all, I’ve recently started working with a TX2 to see if I can use it to accelerate a Keras object detection program I’ve been working on. The code is a group project written in Python 3 and was originally developed on Windows machines.
I understand that TensorRT is the best path forward, but since development is ongoing on Windows I was hoping to stick with Python 3 (which, as far as I know, TensorRT doesn’t yet support on aarch64).
I built a TensorFlow 1.6 wheel for Python 3.6 with CUDA 9 support enabled and fired up my code. I was happy to find that TensorFlow detected the GPU (log posted below), but our code still runs painfully slowly.
tegrastats.sh shows GPU usage of only 0-12% while the Keras program is running, so I assume it is not actually using the GPU?
I also ran
sudo nvpmodel -m 0
but it made no difference.
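One thing I should note: nvpmodel only selects the power model, and the clocks still scale dynamically unless they are also pinned. This is the full sequence as I understand it (the jetson_clocks.sh path is from my JetPack 3.x install and may differ on other versions):

```shell
# Select the max-performance power model (MAXN on the TX2)
sudo nvpmodel -m 0

# Pin CPU/GPU/EMC clocks to their maximums
# (script ships in the home directory on JetPack 3.x installs)
sudo /home/nvidia/jetson_clocks.sh

# Watch utilization while the model runs; GR3D_FREQ is the GPU load
sudo tegrastats
```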
Any insights would be much appreciated, thanks!
Using TensorFlow backend.
2018-03-14 10:08:15.229050: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-14 10:08:15.229193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
totalMemory: 7.67GiB freeMemory: 4.57GiB
2018-03-14 10:08:15.229249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-03-14 10:08:16.792383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-03-14 10:08:16.792468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-03-14 10:08:16.792495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-03-14 10:08:16.792683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3946 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)