l4t-tensorflow:r32.4.3-tf2.2-py3 CUDA linker path problem

Hello,

In the l4t-tensorflow:r32.4.3-tf2.2-py3 container available at https://ngc.nvidia.com/catalog/containers/nvidia:l4t-tensorflow, the library path in /etc/ld.so.conf.d/nvidia.conf is specified as /usr/local/cuda-10.0/targets/aarch64-linux/lib when it should be /usr/local/cuda-10.2/targets/aarch64-linux/lib.

Updating that path and running ldconfig allows TensorFlow to properly load libcudart.so.10.2, libcublas.so.10, libcufft.so.10, libcurand.so.10, libcusolver.so.10, libcusparse.so.10, and libcudnn.so.8.

As shipped, TensorFlow uses the CPU. Fixing the path to the CUDA 10.2 libraries for dynamic loading gets it using the Nano’s GPU: for example, the Shakespeare RNN example from Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow takes about an hour on the Jetson Nano when tweaked for cuDNN usage, versus 12 hours on the CPU.
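
For anyone who wants to double-check, a quick sanity test from inside the container looks something like this (a minimal sketch, assuming TF 2.x; the library name is the CUDA 10.2 runtime the container ships):

import ctypes
import tensorflow as tf

# Raises OSError if the dynamic loader still cannot resolve the CUDA 10.2 runtime.
ctypes.CDLL("libcudart.so.10.2")

# An empty list here means TensorFlow has fallen back to the CPU.
print(tf.config.list_physical_devices("GPU"))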

HTH!

Hi,

Thanks for reporting this.

We are going to check this issue.
Will share more information with you later.

Thanks.


The GRU layers in that Jupyter notebook example need recurrent_dropout=0 to use the fast cuDNN kernel. With that change, the Jetson Nano completes the training more than twice as fast as the example output. Not sure if that’s a fair comparison though…
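
For reference, this is roughly what the cuDNN-eligible setup looks like in Keras (a sketch with illustrative layer and vocabulary sizes, not the book’s exact model):

import tensorflow as tf

# Keeping recurrent_dropout at 0 (the default) lets Keras dispatch the GRU
# layers to the fused cuDNN kernel; any non-zero value falls back to the
# much slower generic implementation.
model = tf.keras.models.Sequential([
    tf.keras.layers.GRU(128, return_sequences=True, recurrent_dropout=0,
                        input_shape=[None, 39]),  # 39 = illustrative vocab size
    tf.keras.layers.GRU(128, return_sequences=True, recurrent_dropout=0),
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(39, activation="softmax")),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")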

Hi,

Thanks for reporting this.

Confirmed that the path in /etc/ld.so.conf.d/nvidia.conf incorrectly points to CUDA 10.0.
We are checking this with our internal team and will keep you updated if we get any feedback.

However, we don’t see any issue when loading the CUDA libraries from TensorFlow.
libcudart.so.10.2 can be loaded without modifying the nvidia.conf file:

root@nvidia-desktop:/# python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-08-17 04:35:26.598242: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2

Do you see any error when loading it with the default nvidia.conf?
If so, would you mind sharing the log with us?

By the way, we don’t recommend Jetson devices for training due to their limited storage and bandwidth.
Please also remember to maximize the device clocks as follows to get optimal performance:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

You can also monitor the GPU utilization with $ sudo tegrastats.
If the GR3D_FREQ ratio cannot reach 99%, you may be hitting an I/O limitation rather than a compute bound.

Thanks.


How odd, I can’t reproduce it now. It definitely happened, or else I wouldn’t have gone digging into the ld.so.conf files, but I can’t seem to make it happen again with either l4t-tensorflow or l4t-ml. Sorry to waste your time here.