This thread reports the behavior of the TensorFlow C library on the Jetson Nano. In particular, the TF library built with CUDA support appears to hang at runtime. I built TensorFlow v1.12.0 as a C library twice, first without CUDA support and then with CUDA support enabled, and compared the two builds using a simple matrix-inversion benchmark written in Go.
Using the TF C library built without CUDA support
In this case the matrix allocation and inversion output is as expected, with no issues. For example, a 100x100 matrix is allocated in about 14 ms and its inverse is computed in about 11 ms, and so on for the larger sizes.
$ matrix-inversion-benchmark-tf
matrix allocation: [100 100] 13.733535ms
inv computation: [100 100] 10.961744ms
matrix allocation: [200 200] 3.695703ms
inv computation: [200 200] 18.391433ms
matrix allocation: [500 500] 21.158223ms
inv computation: [500 500] 194.820018ms
matrix allocation: [1000 1000] 79.596653ms
inv computation: [1000 1000] 1.332078944s
Using the TF C library built with CUDA support
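In this case we see an informational NUMA message and a few device-discovery messages, but the program then hangs without printing any benchmark output and has to be interrupted with Ctrl+C.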
$ matrix-inversion-benchmark-tf
2019-04-20 05:45:21.795288: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
2019-04-20 05:45:21.795471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 1.72GiB
2019-04-20 05:45:21.795528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
^C
Other details
Output of the ldd command on the TF C library built with CUDA support:
$ ldd /usr/local/lib/libtensorflow.so
linux-vdso.so.1 (0x0000007f95b04000)
libtensorflow_framework.so => /usr/local/lib/libtensorflow_framework.so (0x0000007f84c3d000)
libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x0000007f8030c000)
libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x0000007f7723b000)
libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x0000007f771ca000)
libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f77192000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f77166000)
libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007f77129000)
libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f7706f000)
librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f77058000)
libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f76ec3000)
libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f76e9f000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f76d46000)
/lib/ld-linux-aarch64.so.1 (0x0000007f95ad9000)
libcuda.so.1 => /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (0x0000007f75e22000)
libcudnn.so.7 => /usr/lib/aarch64-linux-gnu/libcudnn.so.7 (0x0000007f6180d000)
libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x0000007f598fe000)
libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x0000007f556ff000)
libnvrm_gpu.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so (0x0000007f556bc000)
libnvrm.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so (0x0000007f5567a000)
libnvrm_graphics.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so (0x0000007f5565b000)
libnvidia-fatbinaryloader.so.32.1.0 => /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0 (0x0000007f555fd000)
libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000007f555df000)
Output of the ldd command on the TF C library built without CUDA support:
$ ldd ./libtensorflow.so
linux-vdso.so.1 (0x0000007f80311000)
libtensorflow_framework.so (0x0000007f7c63c000)
libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f7c604000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f7c5d8000)
libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f7c51e000)
librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f7c507000)
libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f7c372000)
libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f7c34e000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f7c1f5000)
/lib/ld-linux-aarch64.so.1 (0x0000007f802e6000)
I would appreciate any help resolving this issue. As a side note, I also tried building TF v1.13.1, but the build failed due to an apparent gcc issue reported here: https://github.com/tensorflow/tensorflow/issues/27931