TensorFlow C library with CUDA support gets stuck

This thread reports the behavior of the TensorFlow C library on the Jetson Nano. In particular, the library built with CUDA support seems to get stuck at runtime. I built TensorFlow v1.12.0 as a C library, first without CUDA support and then with it, and I am comparing the output of the two builds using a simple matrix-inversion benchmark written in Go.
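
For context, the benchmark is roughly of the following shape. This is a minimal sketch using the TensorFlow Go bindings, not the exact code; the identity-matrix input and the single MatrixInverse graph are illustrative assumptions on my part:

package main

import (
	"fmt"
	"time"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
	"github.com/tensorflow/tensorflow/tensorflow/go/op"
)

func main() {
	for _, n := range []int{100, 200, 500, 1000} {
		// Allocate an n x n identity matrix so the inverse is well-defined.
		start := time.Now()
		rows := make([][]float32, n)
		for i := range rows {
			rows[i] = make([]float32, n)
			rows[i][i] = 1
		}
		input, err := tf.NewTensor(rows)
		if err != nil {
			panic(err)
		}
		fmt.Printf("matrix allocation: [%d %d] %v\n", n, n, time.Since(start))

		// Build a one-op graph that inverts the fed matrix.
		s := op.NewScope()
		ph := op.Placeholder(s, tf.Float)
		inv := op.MatrixInverse(s, ph)
		graph, err := s.Finalize()
		if err != nil {
			panic(err)
		}
		sess, err := tf.NewSession(graph, nil)
		if err != nil {
			panic(err)
		}

		start = time.Now()
		if _, err := sess.Run(map[tf.Output]*tf.Tensor{ph: input}, []tf.Output{inv}, nil); err != nil {
			panic(err)
		}
		fmt.Printf("  inv computation: [%d %d] %v\n", n, n, time.Since(start))
		sess.Close()
	}
}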

Using the TF C library built without CUDA support
In this case the matrix allocation and inversion behave as expected, with no issues: a 100x100 matrix is allocated in about 14 ms, its inverse is computed in about 11 ms, and so on.

$ matrix-inversion-benchmark-tf
matrix allocation: [100 100] 13.733535ms
  inv computation: [100 100] 10.961744ms
matrix allocation: [200 200] 3.695703ms
  inv computation: [200 200] 18.391433ms
matrix allocation: [500 500] 21.158223ms
  inv computation: [500 500] 194.820018ms
matrix allocation: [1000 1000] 79.596653ms
  inv computation: [1000 1000] 1.332078944s

Using the TF C library built with CUDA support
In this case we see a NUMA-related message and a couple of other informational lines, but then the program hangs and has to be interrupted.

$ matrix-inversion-benchmark-tf
2019-04-20 05:45:21.795288: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
2019-04-20 05:45:21.795471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 1.72GiB
2019-04-20 05:45:21.795528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
^C
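
The hang happens right after "Adding visible gpu devices: 0", and the Nano reports only 1.72 GiB free out of a memory pool shared between CPU and GPU, so one thing I still want to rule out is the initial GPU memory grab. In the Go bindings a session can be given a serialized ConfigProto; the sketch below is a hypothetical helper (not in the benchmark yet, tf being the same import as above) that enables gpu_options.allow_growth so TensorFlow allocates GPU memory incrementally:

// newSessionWithAllowGrowth creates a session whose ConfigProto sets
// gpu_options { allow_growth: true }. The byte slice is that proto
// serialized by hand: tag 50 = field 6 (gpu_options), length 2,
// tag 32 = field 4 (allow_growth), value 1 (true).
func newSessionWithAllowGrowth(graph *tf.Graph) (*tf.Session, error) {
	config := []byte{50, 2, 32, 1}
	return tf.NewSession(graph, &tf.SessionOptions{Config: config})
}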

Other details
Output of the ldd command on the TF C library built with CUDA support:

$ ldd /usr/local/lib/libtensorflow.so
	linux-vdso.so.1 (0x0000007f95b04000)
	libtensorflow_framework.so => /usr/local/lib/libtensorflow_framework.so (0x0000007f84c3d000)
	libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x0000007f8030c000)
	libcusolver.so.10.0 => /usr/local/cuda/lib64/libcusolver.so.10.0 (0x0000007f7723b000)
	libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x0000007f771ca000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f77192000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f77166000)
	libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007f77129000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f7706f000)
	librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f77058000)
	libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f76ec3000)
	libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f76e9f000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f76d46000)
	/lib/ld-linux-aarch64.so.1 (0x0000007f95ad9000)
	libcuda.so.1 => /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (0x0000007f75e22000)
	libcudnn.so.7 => /usr/lib/aarch64-linux-gnu/libcudnn.so.7 (0x0000007f6180d000)
	libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x0000007f598fe000)
	libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x0000007f556ff000)
	libnvrm_gpu.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so (0x0000007f556bc000)
	libnvrm.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so (0x0000007f5567a000)
	libnvrm_graphics.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so (0x0000007f5565b000)
	libnvidia-fatbinaryloader.so.32.1.0 => /usr/lib/aarch64-linux-gnu/tegra/libnvidia-fatbinaryloader.so.32.1.0 (0x0000007f555fd000)
	libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000007f555df000)

Output of the ldd command on the TF C library built without CUDA support:

$ ldd ./libtensorflow.so
	linux-vdso.so.1 (0x0000007f80311000)
	libtensorflow_framework.so (0x0000007f7c63c000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f7c604000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f7c5d8000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f7c51e000)
	librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f7c507000)
	libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f7c372000)
	libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f7c34e000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f7c1f5000)
	/lib/ld-linux-aarch64.so.1 (0x0000007f802e6000)

I would appreciate any help toward resolving this issue. As a side note, I also tried building TF v1.13.1 but failed due to an apparent gcc issue, reported here: https://github.com/tensorflow/tensorflow/issues/27931

Hi,

May I know which compute capability you used when building the C library?
Please note that the Nano is sm_53 (compute capability 5.3).

------------------------------------------------------------------------------------------------------------
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] 5.3
------------------------------------------------------------------------------------------------------------
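
If it helps, the same value can also be supplied non-interactively by exporting an environment variable before running ./configure (this is how the configure script reads its answers, as far as I know):

$ export TF_CUDA_COMPUTE_CAPABILITIES=5.3
$ ./configure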

Thanks.

Hi @AastaLLL, I had the default compute capabilities 3.5,7.0 for the previous build. I have started a new build with 5.3.

Does an NCCL version of 1.3 look good?

Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3

I am also wondering if you could share the TF v1.13.1 build steps used by NVIDIA; I could not build TF v1.13.1 with gcc 7.3.

Thank you.

Hi,

NCCL only supports PCIe-based GPUs. Please turn it off when building TensorFlow.

For example:

$ bazel build --config=opt --local_resources 2048,3.0,1.0 --config=cuda --config=nonccl //tensorflow/tools/pip_package:build_pip_package
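
The command above builds the pip package; for the C library discussed in this thread, the corresponding target should be the lib_package one, with the same --config=nonccl flag, e.g.:

$ bazel build --config=opt --config=cuda --config=nonccl //tensorflow/tools/lib_package:libtensorflow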

Thanks.

Hi AastaLLL,

Thank you so much for your input. The GPU build came out fine! Everything seems to be working, but I will start another build with the bazel CLI options you listed above.

Thanks,
Saurabh