Docker Nvidia Fails after Software Update

After performing the software update shown below, the Docker NVIDIA runtime no longer works.

./jetsonInfo.py
NVIDIA NVIDIA Jetson Xavier NX Developer Kit
 L4T 32.7.4 [ JetPack UNKNOWN ]
   Ubuntu 18.04.6 LTS
   Kernel Version: 4.9.337-tegra
 CUDA 10.2.300
   CUDA Architecture: 7.2
 OpenCV version: 4.1.1
   OpenCV Cuda: NO
 CUDNN: 8.2.1.32
 TensorRT: 8.2.1.9
 Vision Works: 1.6.0.501
 VPI: 1.2.3
 Vulcan: 1.2.70

These are the upgraded packages that caused the break:

$ less /var/log/apt/history.log
Start-Date: 2023-11-21  22:18:02
Commandline: apt upgrade
Install: nvidia-container-toolkit-base:arm64 (1.13.5-1, automatic), ubuntu-pro-client-l10n:arm64 (30~18.04, automatic)
Upgrade: nvidia-docker2:arm64 (2.8.0-1, 2.13.0-1), libnvidia-container-tools:arm64 (1.7.0-1, 1.13.5-1), nvidia-container-runtime:arm64 (3.7.0-1, 3.13.0-1), ubuntu-advantage-tools:arm64 (29.4~18.04, 30~18.04), libnvidia-container0:arm64 (0.10.0+jetpack, 0.11.0+jetpack), libnvidia-container1:arm64 (1.7.0-1, 1.13.5-1), nvidia-container-toolkit:arm64 (1.7.0-1, 1.13.5-1)
End-Date: 2023-11-21  22:18:17

On the host machine:

ls -afl /usr/local/cuda/lib64/

libnvToolsExt.so               libnvperf_target.so       libmetis_static.a        libnppist.so.10            libcufft.so.10           libcurand.so.10            liblapack_static.a
libnppidei.so                  libnppicom.so.10.2.1.300  libcublas.so             libnppicc.so.10            libnppicc.so.10.2.1.300  libnvrtc-builtins.so.10.2  libnvrtc-builtins.so
libnppicc_static.a             libnppc.so.10             libcurand.so.10.1.2.300  libnppidei.so.10.2.1.300   libnvrtc.so.10.2.300     libcusparse_static.a       libnppist.so.10.2.1.300
libcublasLt.so.10.2.3.300      libcuinj64.so.10.2.300    libcublas_static.a       libnppif.so                libnppc.so.10.2.1.300    libnppig.so                libcufftw.so.10
libnppim.so                    libcublas.so.10.2.3.300   libnppitc_static.a       libnppig_static.a          libnppisu_static.a       libnppial.so.10            libnpps_static.a
libcublasLt.so                 libcusparse.so            libnppif_static.a        libcupti.so.10.2           libnppial_static.a       libnppisu.so.10.2.1.300    libnppicc.so
libcusolver.so.10.3.0.300      libnvgraph.so             libcufft.so              libculibos.a               libcufftw_static.a       libcusolver.so.10          libcupti.so.10.2.175
libnppif.so.10                 libcurand.so              libcublas.so.10          libnvblas.so.10            libcuinj64.so            libnvgraph_static.a        libcurand_static.a
libcusparse.so.10              libnppif.so.10.2.1.300    libnppist_static.a       libnppicom.so.10           libcupti.so              libnpps.so                 libnvgraph.so.10.2.300
libnppim_static.a              libnppicom.so             libnvToolsExt.so.1.0.0   libnvperf_host.so          libcudart.so.10.2        libnppitc.so.10.2.1.300    libnppidei.so.10
.                              libnppig.so.10            libnpps.so.10            libcufft_static.a          stubs                    libnppc.so                 ..
libcufft_static_nocallback.a   libnppim.so.10.2.1.300    libcuinj64.so.10.2       libcusparse.so.10.3.1.300  libcufftw.so.10.1.2.300  libnppig.so.10.2.1.300     libnvblas.so.10.2.3.300
libnvrtc-builtins.so.10.2.300  libcublasLt.so.10         libnppicom_static.a      libnppidei_static.a        libnppitc.so             libnvToolsExt.so.1         libcusolver.so
libcufft.so.10.1.2.300         libnppial.so              libcublasLt_static.a     libnvrtc.so                libcufftw.so             libnvrtc.so.10.2           libnppc_static.a
libnppisu.so                   libcudart.so.10.2.300     libcudart.so             libnvgraph.so.10           libnpps.so.10.2.1.300    libcudart_static.a         libnppial.so.10.2.1.300
libnvblas.so                   libcusolver_static.a      libcudadevrt.a           libnppist.so               libnppitc.so.10          libnppim.so.10             libnppisu.so.10

Container contents:

$ docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.7.1
$ ls -laf /usr/local/cuda/lib64
..  .  stubs  libcudart_static.a  libcudadevrt.a

More info on the nvidia packages installed:

libnvidia-container-tools/bionic,now 1.13.5-1 arm64 [installed]
libnvidia-container0/bionic,now 0.11.0+jetpack arm64 [installed]
libnvidia-container1/bionic,now 1.13.5-1 arm64 [installed]
nvidia-container-csv-cuda/stable,now 10.2.460-1 arm64 [installed]
nvidia-container-csv-cudnn/stable,now 8.2.1.32-1+cuda10.2 arm64 [installed]
nvidia-container-csv-tensorrt/stable,now 8.2 arm64 [installed]
nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
nvidia-container-runtime/bionic,now 3.13.0-1 all [installed]
nvidia-container-toolkit/bionic,now 1.13.5-1 arm64 [installed]
nvidia-container-toolkit-base/bionic,now 1.13.5-1 arm64 [installed,automatic]
nvidia-docker2/bionic,now 2.13.0-1 all [installed]
nvidia-l4t-3d-core/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-apt-source/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-bootloader/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-camera/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-configs/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-core/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-cuda/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-firmware/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-gputools/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-graphics-demos/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-gstreamer/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-init/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-initrd/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-jetson-io/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-jetson-multimedia-api/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-kernel/stable,now 4.9.337-tegra-32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-kernel-dtbs/stable,now 4.9.337-tegra-32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-kernel-headers/stable,now 4.9.337-tegra-32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-libvulkan/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-multimedia/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-multimedia-utils/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-oem-config/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-tools/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-wayland/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-weston/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-x11/stable,now 32.7.4-20230608211515 arm64 [installed]
nvidia-l4t-xusb-firmware/stable,now 32.7.4-20230608211515 arm64 [installed]

I don’t have physical access to the device, so I cannot re-flash JetPack on it. Please advise on how to resolve this issue.
Thanks.

Hi,

What kind of errors or issues do you encounter when you launch the container on the updated environment?

Thanks.

The container contents show that the NVIDIA runtime did not mount the host's CUDA libraries into the container, likely because of a bug in one of the functions that mounts them. This causes many errors, including the following:

python3 -c "import cv2"

ImportError: libcublas.so.10: cannot open shared object file: No such file or directory

Hi,

Could you try the command outside of the container?
Since JetPack 4 containers mount libraries from the host, this test helps determine whether the cause lies in the host environment or in the container.

Thanks.

There are no problems outside the container. It’s the container runtime that is causing the issue.

Hi,

Could you check if there is the libcublas.so.10 file within the container?

On JetPack 4, the CUDA libs should be mounted from the host.
If it doesn’t exist, please check if any update is required for the CSV file.

Thanks.

libcublas.so.10 is not correctly mounted into the container. That is the bug.

@AastaLLL What CSV file are you referring to?

Hi,

Please check the cuda.csv file located at /etc/nvidia-container-runtime/host-files-for-container.d/.
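
One way to run that check is sketched below. It assumes each non-empty line of the CSV has the form `type, /absolute/path` (for example `lib, /usr/local/cuda-10.2/...`), which is an assumption about the file layout and not something shown in this thread. The function name `check_csv` is hypothetical. It only reports listed paths that are absent on the host; it does not tell you whether the runtime actually reads the file.

```shell
#!/bin/sh
# Report every path listed in a mount-list CSV that does not exist on the host.
# Assumption: each non-empty line looks like "type, /absolute/path".
check_csv() {
    csv_file="$1"
    # Split each line at the first comma into entry type and path.
    while IFS=, read -r kind path || [ -n "$kind" ]; do
        # Strip leading and trailing whitespace from the path field.
        path=$(printf '%s' "$path" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        if [ -z "$path" ]; then
            continue
        fi
        # Present means: an existing file/dir, or a symlink (even a dangling one).
        if [ ! -e "$path" ] && [ ! -L "$path" ]; then
            printf 'missing: %s (%s)\n' "$path" "$kind"
        fi
    done < "$csv_file"
}

# Example call, using the path named in this thread:
# check_csv /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv
```

If the function prints nothing, every listed path exists on the host, and the fault is more likely in how the updated runtime processes the CSV than in the CSV contents.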

Thanks.

I can’t answer that question anymore. I rolled back the update, and we fixed the issues caused by it. So the problem has been “solved” by holding back the packages. But there is still something buggy about that update.
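
For anyone in the same situation, here is a rough sketch of what such a rollback and hold could look like. The exact commands used are not recorded in this thread. The version numbers are the pre-upgrade versions from the apt history log above; whether those versions are still available from the configured repositories is an assumption, and the newly installed nvidia-container-toolkit-base package may need to be removed as part of the downgrade. The script only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Pre-upgrade versions, copied from the "Upgrade:" line of the apt history log.
# Assumption: these versions can still be fetched from the configured repos.
PINNED="nvidia-docker2=2.8.0-1
libnvidia-container-tools=1.7.0-1
nvidia-container-runtime=3.7.0-1
libnvidia-container0=0.10.0+jetpack
libnvidia-container1=1.7.0-1
nvidia-container-toolkit=1.7.0-1"

# Print (do not execute) the downgrade and hold commands.
print_rollback_commands() {
    # Join "name=version" entries with spaces for the install command.
    pkgs=$(printf '%s\n' "$PINNED" | paste -sd ' ' -)
    # Keep only the package names for apt-mark hold.
    names=$(printf '%s\n' "$PINNED" | cut -d= -f1 | paste -sd ' ' -)
    printf 'apt-get install --allow-downgrades %s\n' "$pkgs"
    printf 'apt-mark hold %s\n' "$names"
}

print_rollback_commands
```

Holding the packages keeps a later `apt upgrade` from pulling the 1.13.5 toolkit back in; `apt-mark unhold` reverses it once a fixed release is available.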

Thanks for the update.
Please create a new topic if you encounter the issue again.
