libcurand.so.10 not found on JetPack 4.6.2 in Docker

I would like to use (or build) PyTorch with CUDA in a Docker container, but it seems the CUDA libraries are not mounted from the host.

$ docker run --gpus all --rm -it --network host nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3
root@agx:/# python3 -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
root@agx:/# ls /usr/local/cuda-10.2/targets/aarch64-linux/lib/
libcudadevrt.a  libcudart_static.a  stubs
root@agx:/#

in host:

$ find /usr -name libcurand.so.*
/usr/local/cuda-10.2/doc/man/man7/libcurand.so.7
/usr/local/cuda-10.2/targets/aarch64-linux/lib/libcurand.so.10
/usr/local/cuda-10.2/targets/aarch64-linux/lib/libcurand.so.10.1.2.300

I also tried using nvcr.io/nvidia/l4t-base:r32.7.1 to build PyTorch, but it doesn’t have the CUDA libraries either.

# ls /usr/local/cuda-10.2/targets/aarch64-linux/lib/
libcudadevrt.a  libcudart_static.a  stubs
  • Is it correct to use the r32.7.1 images with JetPack 4.6.2 to use CUDA?
  • Is there any Docker base image with CUDA for JetPack 4.6.2, or should I reinstall another version of JetPack to use CUDA from a Docker image?

Any other way to use or build any version of PyTorch in a Docker container would also be appreciated.

Hi @fujii5, I think you mean JetPack 4.6.1 (L4T R32.7.1), and yes, the r32.7.1 Docker images are the right ones to use with JetPack 4.6.1.

On JetPack 4.x, CUDA/cuDNN/TensorRT/etc. are mounted from your device into the container when --runtime nvidia is used to start the container. On JetPack 5, CUDA/etc. are installed inside the container.
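The mounting is driven by CSV manifests under /etc/nvidia-container-runtime/host-files-for-container.d/: each line names a host path (a dir, lib, file, etc.) that the runtime injects into the container. As a minimal sketch of the sanity check you'd do on the device, the script below parses `type, path` entries and reports whether each path exists on the host. It builds a sample manifest in a temp directory so it runs anywhere; on a real Jetson you would point CSV_DIR at the real directory. The `fake-cuda-10.2` paths are purely illustrative.

```shell
#!/bin/sh
# Sketch: check that every host path named in the runtime's CSV manifests exists.
# Assumed entry format: "<type>, <host path>", e.g. "dir, /usr/local/cuda-10.2".
# On a real Jetson, set CSV_DIR=/etc/nvidia-container-runtime/host-files-for-container.d/

CSV_DIR=$(mktemp -d)
cat > "$CSV_DIR/cuda.csv" <<EOF
dir, $CSV_DIR/fake-cuda-10.2
lib, $CSV_DIR/fake-cuda-10.2/lib/libcurand.so.10
EOF
mkdir -p "$CSV_DIR/fake-cuda-10.2"   # satisfy the dir entry, leave the lib entry missing

REPORT=$(
  for csv in "$CSV_DIR"/*.csv; do
    while IFS=, read -r type path; do
      path=${path# }                 # strip the space after the comma
      if [ -e "$path" ]; then
        echo "OK      $type $path"
      else
        echo "MISSING $type $path"
      fi
    done < "$csv"
  done
)
echo "$REPORT"
```

Any MISSING line would mean the runtime is being asked to mount something the host no longer has, which typically makes the container start (or the mount) fail.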

I noticed you were using the --gpus all flag; can you try running it like this instead:

$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3

If that still doesn’t work, can you check that you have these CSV files on your device?

$ ls -ll /etc/nvidia-container-runtime/host-files-for-container.d/
total 32
-rw-r--r-- 1 root root    26 May 23  2021 cuda.csv
-rw-r--r-- 1 root root  4250 Jul 13  2021 cudnn.csv
-rw-r--r-- 1 root root 12240 Feb  2 16:30 l4t.csv
-rw-r--r-- 1 root root  1590 Jan 14 04:44 tensorrt.csv
-rw-r--r-- 1 root root   325 Aug 11  2020 visionworks.csv

@dusty_nv , thanks for your response.

I’m using JetPack 4.6.2 (R32.7.2).
As there is no r32.7.2 tag for l4t-pytorch or l4t-base, I’m currently using r32.7.1 images instead.

$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 7.2, GCID: 30192233, BOARD: t186ref, EABI: aarch64, DATE: Sun Apr 17 09:53:50 UTC 2022

Thanks, but it still doesn’t work:

$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3
root@agx:/# python3 -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
root@agx:/# 

CSV files exist on host:

$ ls -ll /etc/nvidia-container-runtime/host-files-for-container.d/
total 32
-rw-r--r-- 1 root root    26 May 24  2021 cuda.csv
-rw-r--r-- 1 root root  4250 Jul 13  2021 cudnn.csv
-rw-r--r-- 1 root root 12240 Apr 17 09:49 l4t.csv
-rw-r--r-- 1 root root  1590 Jan 14 09:44 tensorrt.csv
-rw-r--r-- 1 root root   325 Aug 11  2020 visionworks.csv

OK, the r32.7.1 images should still work on r32.7.2.

Can you check another thing: if you run the following, does it work?

$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-base:r32.7.1
# python3 -c 'import tensorrt'

If that doesn’t work either, it would seem there is something wrong with your NVIDIA Container Runtime; you should either re-install it through apt, or just re-flash the device if you continue having problems with it.

libcurand.so.10 comes from the package libcurand-10-2. Do you have it installed on the device before you start the container?

$ apt list --installed | grep libcurand-10-2

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libcurand-10-2/stable,now 10.1.2.300-1 arm64 [installed]

It doesn’t work, even after re-installing nvidia-container-runtime with sudo apt remove nvidia-container-runtime; sudo apt install nvidia-container-runtime.
I’ll try re-flashing the device.

Yes, I have libcurand on host:

$ apt list --installed | grep libcurand-10-2

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libcurand-10-2/stable,now 10.1.2.300-1 arm64 [installed,automatic]

Your host should have the file /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv with the following content (please double-check yours):

dir, /usr/local/cuda-10.2

This means /usr/local/cuda-10.2 in the container should be mapped from the host and have exactly the same content.

Since your host has /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcurand.so.10, your container should have it too.
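One quick way to spot which entries fail to map is to diff the host's listing of /usr/local/cuda-10.2 against the container's. The sketch below demonstrates the comparison on two throwaway directories so it runs anywhere; on the device you would capture the real listings first (the `host.list`/`container.list` file names are just illustrative).

```shell
#!/bin/sh
# Sketch: list entries present on the host side but missing on the container side.
# On a Jetson you would capture real listings, e.g. (hypothetical file names):
#   ls /usr/local/cuda-10.2 | sort > host.list
#   docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.7.1 \
#       ls /usr/local/cuda-10.2 | sort > container.list

WORK=$(mktemp -d)
printf '%s\n' bin doc lib64 samples > "$WORK/host.list"       # what the host shows
printf '%s\n' bin lib64 > "$WORK/container.list"              # what the container shows

# comm -23 prints lines found only in the first (sorted) input: host-only entries.
HOST_ONLY=$(comm -23 "$WORK/host.list" "$WORK/container.list")
echo "Entries not mapped into the container:"
echo "$HOST_ONLY"
```

An empty result would mean the cuda.csv mapping worked; anything listed is being dropped by the runtime.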

My cuda.csv has the same content:

$ cat /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv
dir, /usr/local/cuda-10.2

But it seems to fail to be mapped into the container:

$ ls /usr/local/cuda-10.2/
EULA.txt  doc     include  nvml  nvvmx    share    tools         version.txt
bin       extras  lib64    nvvm  samples  targets  version.json
$ docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-base:r32.7.1
root@agx:/# ls /usr/local/cuda-10.2/
bin  include  lib64  nvvm  nvvmx  targets

Which Docker and container packages are installed?

apt list --installed | grep docker
apt list --installed | grep container

(JetPack 4.6.1)

Also, what happens if you force mount?

docker run -it --rm --runtime nvidia --network host -v /usr/local/cuda-10.2/:/usr/local/cuda-10.2/:ro nvcr.io/nvidia/l4t-base:r32.7.1

python3 -c 'import torch'

Some of the packages are newer than the JetPack 4.6.1 versions:

$ apt list --installed | grep docker

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

docker/bionic,now 1.5-1build1 arm64 [installed]
docker.io/bionic-updates,bionic-security,now 20.10.7-0ubuntu5~18.04.3 arm64 [installed]
nvidia-docker2/bionic,now 2.10.0-1 all [installed]
$ apt list --installed | grep container

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

containerd/bionic-updates,bionic-security,now 1.5.5-0ubuntu3~18.04.2 arm64 [installed,automatic]
libnvidia-container-tools/bionic,now 1.9.0-1 arm64 [installed]
libnvidia-container0/bionic,now 0.11.0+jetpack arm64 [installed]
libnvidia-container1/bionic,now 1.9.0-1 arm64 [installed]
nvidia-container-csv-cuda/stable,now 10.2.460-1 arm64 [installed]
nvidia-container-csv-cudnn/stable,now 8.2.1.32-1+cuda10.2 arm64 [installed]
nvidia-container-csv-tensorrt/stable,now 8.2.1.8-1+cuda10.2 arm64 [installed]
nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
nvidia-container-runtime/bionic,now 3.9.0-1 all [installed]
nvidia-container-toolkit/bionic,now 1.9.0-1 arm64 [installed]

libcudnn.so.8 is also needed:

$ docker run -it --rm --runtime nvidia --network host -v /usr/local/cuda-10.2/:/usr/local/cuda-10.2/:ro nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3
root@agx:/# python3 -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/aarch64-linux-gnu/libcudnn.so.8: file too short

After I also force-mount /usr/lib/aarch64-linux-gnu, torch is available!

$ docker run -it --rm --runtime nvidia --network host -v /usr/local/cuda-10.2/:/usr/local/cuda-10.2/:ro -v /usr/lib/aarch64-linux-gnu/:/usr/lib/aarch64-linux-gnu nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3
root@agx:/# python3 -c 'import torch; print(torch.cuda.is_available())'
True

But when I add the read-only option to the /usr/lib/aarch64-linux-gnu mount, the container fails to start:

$ docker run -it --rm --runtime nvidia --network host -v /usr/local/cuda-10.2/:/usr/local/cuda-10.2/:ro -v /usr/lib/aarch64-linux-gnu/:/usr/lib/aarch64-linux-gnu:ro nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.9-py3
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: src: /etc/vulkan/icd.d/nvidia_icd.json, src_lnk: /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json, dst: /mnt/m2ssd/docker/overlay2/2150184e7577a3a38c5cade12e51d8fab5da5ca67e7bad9f0e0d39527d396eac/merged/etc/vulkan/icd.d/nvidia_icd.json, dst_lnk: /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json
src: /usr/lib/aarch64-linux-gnu/libcuda.so, src_lnk: tegra/libcuda.so, dst: /mnt/m2ssd/docker/overlay2/2150184e7577a3a38c5cade12e51d8fab5da5ca67e7bad9f0e0d39527d396eac/merged/usr/lib/aarch64-linux-gnu/libcuda.so, dst_lnk: tegra/libcuda.so
, stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/python3.6/dist-packages/onnx_graphsurgeon: no such file or directory: unknown.

The libnvidia-container0 package version 0.11.0+jetpack installed on your Jetson was released for JetPack 5.

Checking SDK Manager’s deb package download directory, sdkm_downloads/, JetPack 4.6.2 is the same as JetPack 4.6.1: it appears to install libnvidia-container0_0.10.0+jetpack_arm64.deb and libnvidia-container-tools_1.7.0-1_arm64.deb.

I think you somehow updated to the JetPack 5 package, but as @dusty_nv wrote, from JetPack 5 onward, CUDA and the other components are the ones installed inside the container.

On JetPack 5, CUDA/ect are installed inside the container.

I think re-flashing is the quickest way to solve this issue.
If you cannot re-flash, you can try either of the following:

  1. Get the Docker-related deb packages from sdkm_downloads/ and install them, or
  2. Install the CUDA packages inside the Docker container (TensorRT probably cannot be installed; I don’t think it is distributed as a deb).

However, I have not heard of success with either of these.
As for 1, I think I saw some failures with JetPack 4.5.x.
As for 2, I am not sure whether it was around JetPack 3.1 or earlier, but it is similar to how things were done when the L4T kernel first gained Docker support; CUDA + TensorFlow was working then.
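If you do go with option 1 and manage to install the JetPack 4 versions of the container packages, it may also help to keep apt from upgrading them back to the JetPack 5 builds. A sketch of an apt preferences pin, assuming the JetPack 4.6.x versions mentioned above (0.10.0+jetpack and 1.7.0-1; the file name and exact versions are illustrative, so verify them against what sdkm_downloads/ actually contains):

```
# /etc/apt/preferences.d/libnvidia-container (hypothetical pin file)
Package: libnvidia-container0
Pin: version 0.10.0+jetpack
Pin-Priority: 1001

Package: libnvidia-container-tools
Pin: version 1.7.0-1
Pin-Priority: 1001
```

A priority above 1000 also permits apt to downgrade an already-installed newer version to the pinned one.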

@naisy
Thank you so much for your detailed explanation. I figured out what is happening.
I’m going to re-flash JetPack 4.6.2.

After re-flashing JetPack 4.6.2 via SDK Manager, the CUDA libraries are correctly mounted into the Docker container!
Thank you all!
