Nano with JetPack 4.3 can't find GPU with TensorFlow 2.1

I am struggling to get my Nano to work with TensorFlow. Today I managed to install TensorFlow 2.1 on a Nano with a fresh install of JetPack 4.3. Sadly, TensorFlow doesn’t detect the GPU :( I am doing this:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print("gpus:", gpus)

gpus: []

is the result :(

Please help! I need the GPU!

After more investigation, it turns out that the Nano GPU is handled as an ‘XLA_GPU’ device by TensorFlow 2.1. XLA stands for ‘Accelerated Linear Algebra’. As far as I can tell, this is some kind of optimised form of a generic GPU for ‘mobile devices’. Maybe that is good, but why does asking for ‘GPU’ not include ‘XLA_GPU’ devices?
Troublesome…
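
For anyone comparing what TensorFlow reports under each device type, here is a minimal sketch. It assumes that calling tf.config.list_physical_devices() without an argument returns every physical device, each with a name and a device_type. The helper names group_by_type and gpu_like are made up for this example. Note that seeing an ‘XLA_GPU’ entry only shows that the device is listed; it does not by itself show that regular ops are placed on the GPU.

```python
from collections import defaultdict


def group_by_type(devices):
    """Group device objects by their `device_type` attribute."""
    grouped = defaultdict(list)
    for dev in devices:
        grouped[dev.device_type].append(dev.name)
    return dict(grouped)


def gpu_like(grouped):
    """Return names of devices whose type is 'GPU' or 'XLA_GPU'."""
    return grouped.get("GPU", []) + grouped.get("XLA_GPU", [])


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # No argument: list every physical device, whatever its type.
        grouped = group_by_type(tf.config.list_physical_devices())
        print(grouped)
        print("GPU-like devices:", gpu_like(grouped))
```

If the ‘GPU’ list is empty while ‘XLA_GPU’ is not, that matches the behaviour described above.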

Hi,

Which TensorFlow package did you install?
Did you follow the steps listed in this document?
https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html

Thanks.

Yes, I did follow NVIDIA’s instructions to install the latest TensorFlow version that is compatible with JetPack 4.3. This turned out to be TF 2.1.

Hi,

Thanks for your report.
We are going to try to reproduce this and will update you with more information later.

Thanks.

I have the impression that the Nano is doing the TensorFlow operations quite quickly, at least faster than purely on a CPU. With tegrastats I can see that the GPU load is around 25%.

Here is the typical output from one of my TensorFlow-based programs. As you can see, many libraries failed to load; libcuda.so did load, however.
jpad@nanon:~/prog/python/neuro/sandscope/isface-2-128$ python3 isface.py
2020-07-16 12:13:39.822447: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudart.so.10.0’; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:39.822519: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-07-16 12:13:42.581867: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libnvinfer.so.6’; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-07-16 12:13:42.582064: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libnvinfer_plugin.so.6’; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-07-16 12:13:42.582107: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
TensorFlow version 2.1.0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/compat/v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-07-16 12:13:46.129359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-16 12:13:46.135156: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-07-16 12:13:46.135340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2020-07-16 12:13:46.135617: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudart.so.10.0’; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.135822: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcublas.so.10.0’; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.135998: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcufft.so.10.0’; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.136178: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcurand.so.10.0’; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.136360: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcusolver.so.10.0’; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.136548: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcusparse.so.10.0’; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.136728: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudnn.so.7’; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.136769: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices…
devices: [PhysicalDevice(name=’/physical_device:CPU:0’, device_type=‘CPU’), PhysicalDevice(name=’/physical_device:XLA_CPU:0’, device_type=‘XLA_CPU’), PhysicalDevice(name=’/physical_device:XLA_GPU:0’, device_type=‘XLA_GPU’)]
WARNING:tensorflow:From /home/jpad/prog/python/neuro/sandscope/isface-2-128/sampling.py:167: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2020-07-16 12:13:46.446894: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-07-16 12:13:46.447578: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2f7865f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-16 12:13:46.447652: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-16 12:13:46.526857: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-07-16 12:13:46.527134: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2f6ecd40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-16 12:13:46.527191: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-07-16 12:13:46.527645: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-07-16 12:13:46.527750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.86GiB deviceMemoryBandwidth: 194.55MiB/s
2020-07-16 12:13:46.527999: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudart.so.10.0’; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528163: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcublas.so.10.0’; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528313: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcufft.so.10.0’; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528461: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcurand.so.10.0’; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528602: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcusolver.so.10.0’; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528740: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcusparse.so.10.0’; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528880: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudnn.so.7’; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.528913: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices…
2020-07-16 12:13:46.528961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-16 12:13:46.528993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-07-16 12:13:46.529018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
classifier_2_128( (?, 128, 128, 1) )
kernel w (3, 3, 1, 8) b (8,)
WARNING:tensorflow:From isface.py:240: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
h0 (?, 64, 64, 8)
kernel w (3, 3, 8, 16) b (16,)
WARNING:tensorflow:From isface.py:247: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.BatchNormalization instead. In particular, tf.control_dependencies(tf.GraphKeys.UPDATE_OPS) should not be used (consult the tf.keras.layers.BatchNormalization documentation).
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/normalization.py:327: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use layer.__call__ method instead.
h1 (?, 32, 32, 16)
kernel w (3, 3, 16, 24) b (24,)
h2 (?, 16, 16, 24)
kernel w (3, 3, 24, 32) b (32,)
h3 (?, 8, 8, 32)
kernel w (3, 3, 32, 48) b (48,)
h4 (?, 4, 4, 48)
kernel w (3, 3, 48, 64) b (64,)
h5 (?, 2, 2, 64)
h6 (?, 96)
h7 (?, 2)
128_2 classifier : 78490 parameters
restored checkpoint ‘/home/jpad/prog/python/neuro/sandscope/isface-2-128/good/cnn-78K-gen6b-e915’
initializing UNIX domain socket server @ /tmp/isface.socket
socket /tmp/isface.socket in use, retrying in 1 second…
listening @ /tmp/isface.socket

If you’d like any more info, let me know. I’d like to get the most out of this wonderful little machine that the Nano is!

Hi,

Could not load dynamic library ‘libnvinfer.so.6’; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory

Based on the log, the error is caused by a missing library.
Could you first check whether libnvinfer.so.6 exists in your environment?

Thanks.
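
Since the log names more than one missing library, a small script can collect them all in one pass. This is only a sketch: the function name missing_libraries is made up, the sample lines are copied from the log posted above, and the pattern assumes the dso_loader warning keeps the wording "Could not load dynamic library" followed by a quoted name (straight or typographic quotes).

```python
import re

# Matches the dso_loader warning and captures the quoted library name.
MISSING_LIB = re.compile(r"Could not load dynamic library\s+[‘'](.+?)[’']")

# Three lines copied from the log earlier in this thread.
SAMPLE_LOG = """\
2020-07-16 12:13:39.822447: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudart.so.10.0’; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-07-16 12:13:42.581867: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libnvinfer.so.6’; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-07-16 12:13:46.135617: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudart.so.10.0’; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
"""


def missing_libraries(log_text):
    """Return the distinct library names that failed to load, in first-seen order."""
    seen = []
    for match in MISSING_LIB.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


print(missing_libraries(SAMPLE_LOG))
# prints ['libcudart.so.10.0', 'libnvinfer.so.6']
```

Run against the full log, this would give the complete list of file names to look for on the system.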

I am away from that system now, but once I get to it, I will have a look. Where should all the missing libraries normally be present?

Hi,

The library should be under /usr/lib/aarch64-linux-gnu/.

/usr/lib/aarch64-linux-gnu/libnvinfer.so.6
/usr/lib/aarch64-linux-gnu/libnvinfer.so.6.0.1

Thanks.
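
To check this quickly for several libraries at once, something like the sketch below could be used. The function name find_library_files is made up for illustration, the default directory is the one named in the reply above, and the library names in the loop are taken from the log. The match is a simple file name prefix match, so treat the result as a hint and not as proof that the dynamic loader will find the library.

```python
import glob
import os

LIB_DIR = "/usr/lib/aarch64-linux-gnu"  # directory named in the reply above


def find_library_files(soname, lib_dir=LIB_DIR):
    """Return sorted paths in lib_dir whose file name starts with soname."""
    pattern = os.path.join(glob.escape(lib_dir), glob.escape(soname) + "*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Library names as reported in the log; adjust the list as needed.
    for soname in ("libnvinfer.so.6", "libcudart.so.10.0", "libcudnn.so.7"):
        hits = find_library_files(soname)
        print(soname, "->", hits if hits else "not found")
```

Passing a different lib_dir lets the same check be repeated for any other directory where the libraries might have been installed.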