NVIDIA Container Toolkit: Failed to initialize NVML: Unknown Error

Hardware: Intel x64 system
OS: Ubuntu 20.04
CUDA: 12.4

I’ve followed the instructions to install the NVIDIA Container Toolkit (as part of the TAO Toolkit) and got as far as running a sample workload:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

I try:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

and get

Failed to initialize NVML: Unknown Error

This is the debug log:

cat /var/log/nvidia-container-toolkit.log 

-- WARNING, the following logs are for debugging purposes only --

I0315 13:06:28.386197 7668 nvc.c:393] initializing library context (version=1.14.6, build=d2eb0afe86f0b643e33624ee64f065dd60e952d4)
I0315 13:06:28.386233 7668 nvc.c:364] using root /
I0315 13:06:28.386239 7668 nvc.c:365] using ldcache /etc/ld.so.cache
I0315 13:06:28.386244 7668 nvc.c:366] using unprivileged user 65534:65534
I0315 13:06:28.386257 7668 nvc.c:410] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0315 13:06:28.386340 7668 nvc.c:412] dxcore initialization failed, continuing assuming a non-WSL environment
I0315 13:06:28.386778 7675 nvc.c:278] loading kernel module nvidia
I0315 13:06:28.386850 7675 nvc.c:282] running mknod for /dev/nvidiactl
I0315 13:06:28.386882 7675 nvc.c:286] running mknod for /dev/nvidia0
I0315 13:06:28.386905 7675 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0315 13:06:28.391114 7675 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0315 13:06:28.391209 7675 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0315 13:06:28.392613 7675 nvc.c:301] loading kernel module nvidia_uvm
I0315 13:06:28.392646 7675 nvc.c:305] running mknod for /dev/nvidia-uvm
I0315 13:06:28.392715 7675 nvc.c:310] loading kernel module nvidia_modeset
I0315 13:06:28.392745 7675 nvc.c:314] running mknod for /dev/nvidia-modeset
I0315 13:06:28.392910 7676 rpc.c:71] starting driver rpc service
I0315 13:06:28.398686 7677 rpc.c:71] starting nvcgo rpc service
I0315 13:06:28.399267 7668 nvc_container.c:240] configuring container with 'no-cgroups compute utility supervised'
I0315 13:06:28.399362 7668 nvc_container.c:262] setting pid to 7649
I0315 13:06:28.399369 7668 nvc_container.c:263] setting rootfs to /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged
I0315 13:06:28.399374 7668 nvc_container.c:264] setting owner to 0:0
I0315 13:06:28.399379 7668 nvc_container.c:265] setting bins directory to /usr/bin
I0315 13:06:28.399385 7668 nvc_container.c:266] setting libs directory to /usr/lib/x86_64-linux-gnu
I0315 13:06:28.399390 7668 nvc_container.c:267] setting libs32 directory to /usr/lib/i386-linux-gnu
I0315 13:06:28.399395 7668 nvc_container.c:268] setting cudart directory to /usr/local/cuda
I0315 13:06:28.399400 7668 nvc_container.c:269] setting ldconfig to @/sbin/ldconfig.real (host relative)
I0315 13:06:28.399405 7668 nvc_container.c:270] setting mount namespace to /proc/7649/ns/mnt
I0315 13:06:28.399413 7668 nvc_info.c:797] requesting driver information with ''
I0315 13:06:28.400219 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.550.54.14
I0315 13:06:28.400261 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.550.54.14
I0315 13:06:28.400289 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.550.54.14
I0315 13:06:28.400319 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.550.54.14
I0315 13:06:28.400359 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.550.54.14
I0315 13:06:28.400384 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.550.54.14
I0315 13:06:28.400409 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.550.54.14
I0315 13:06:28.400451 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.54.14
I0315 13:06:28.400482 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.550.54.14
I0315 13:06:28.400524 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.550.54.14
I0315 13:06:28.400552 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14
I0315 13:06:28.400593 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.54.14
I0315 13:06:28.400621 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.550.54.14
I0315 13:06:28.400654 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.550.54.14
I0315 13:06:28.400682 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.550.54.14
I0315 13:06:28.400710 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.550.54.14
I0315 13:06:28.400752 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.550.54.14
I0315 13:06:28.400792 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.550.54.14
I0315 13:06:28.400821 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.550.54.14
I0315 13:06:28.400863 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.550.54.14
I0315 13:06:28.400906 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.550.54.14
I0315 13:06:28.401128 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.550.54.14
I0315 13:06:28.401157 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.14
I0315 13:06:28.401285 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.550.54.14
I0315 13:06:28.401315 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.550.54.14
I0315 13:06:28.401345 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.550.54.14
I0315 13:06:28.401377 7668 nvc_info.c:175] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.550.54.14
I0315 13:06:28.401418 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.550.54.14
I0315 13:06:28.401446 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.550.54.14
I0315 13:06:28.401487 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.550.54.14
I0315 13:06:28.401529 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.550.54.14
I0315 13:06:28.401558 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-nvvm.so.550.54.14
I0315 13:06:28.401599 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.550.54.14
I0315 13:06:28.401639 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-gpucomp.so.550.54.14
I0315 13:06:28.401667 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.550.54.14
I0315 13:06:28.401694 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.550.54.14
I0315 13:06:28.401721 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.550.54.14
I0315 13:06:28.401749 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.550.54.14
I0315 13:06:28.401792 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.550.54.14
I0315 13:06:28.401832 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.550.54.14
I0315 13:06:28.401860 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.550.54.14
I0315 13:06:28.401911 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libcuda.so.550.54.14
I0315 13:06:28.401959 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.550.54.14
I0315 13:06:28.401987 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.550.54.14
I0315 13:06:28.402016 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.550.54.14
I0315 13:06:28.402045 7668 nvc_info.c:175] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.550.54.14
W0315 13:06:28.402060 7668 nvc_info.c:401] missing library libnvidia-nscq.so
W0315 13:06:28.402066 7668 nvc_info.c:401] missing library libnvidia-fatbinaryloader.so
W0315 13:06:28.402072 7668 nvc_info.c:401] missing library libnvidia-compiler.so
W0315 13:06:28.402077 7668 nvc_info.c:401] missing library libvdpau_nvidia.so
W0315 13:06:28.402082 7668 nvc_info.c:401] missing library libnvidia-ifr.so
W0315 13:06:28.402087 7668 nvc_info.c:401] missing library libnvidia-cbl.so
W0315 13:06:28.402092 7668 nvc_info.c:405] missing compat32 library libnvidia-cfg.so
W0315 13:06:28.402100 7668 nvc_info.c:405] missing compat32 library libnvidia-nscq.so
W0315 13:06:28.402105 7668 nvc_info.c:405] missing compat32 library libcudadebugger.so
W0315 13:06:28.402110 7668 nvc_info.c:405] missing compat32 library libnvidia-fatbinaryloader.so
W0315 13:06:28.402115 7668 nvc_info.c:405] missing compat32 library libnvidia-allocator.so
W0315 13:06:28.402120 7668 nvc_info.c:405] missing compat32 library libnvidia-compiler.so
W0315 13:06:28.402125 7668 nvc_info.c:405] missing compat32 library libnvidia-pkcs11.so
W0315 13:06:28.402131 7668 nvc_info.c:405] missing compat32 library libnvidia-pkcs11-openssl3.so
W0315 13:06:28.402136 7668 nvc_info.c:405] missing compat32 library libnvidia-ngx.so
W0315 13:06:28.402141 7668 nvc_info.c:405] missing compat32 library libvdpau_nvidia.so
W0315 13:06:28.402146 7668 nvc_info.c:405] missing compat32 library libnvidia-ifr.so
W0315 13:06:28.402151 7668 nvc_info.c:405] missing compat32 library libnvidia-rtcore.so
W0315 13:06:28.402156 7668 nvc_info.c:405] missing compat32 library libnvoptix.so
W0315 13:06:28.402161 7668 nvc_info.c:405] missing compat32 library libnvidia-cbl.so
I0315 13:06:28.402384 7668 nvc_info.c:301] selecting /usr/bin/nvidia-smi
I0315 13:06:28.402401 7668 nvc_info.c:301] selecting /usr/bin/nvidia-debugdump
I0315 13:06:28.402418 7668 nvc_info.c:301] selecting /usr/bin/nvidia-persistenced
I0315 13:06:28.402445 7668 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-control
I0315 13:06:28.402462 7668 nvc_info.c:301] selecting /usr/bin/nvidia-cuda-mps-server
W0315 13:06:28.402492 7668 nvc_info.c:427] missing binary nv-fabricmanager
I0315 13:06:28.402525 7668 nvc_info.c:487] listing firmware path /lib/firmware/nvidia/550.54.14/gsp_ga10x.bin
I0315 13:06:28.402531 7668 nvc_info.c:487] listing firmware path /lib/firmware/nvidia/550.54.14/gsp_tu10x.bin
I0315 13:06:28.402553 7668 nvc_info.c:560] listing device /dev/nvidiactl
I0315 13:06:28.402559 7668 nvc_info.c:560] listing device /dev/nvidia-uvm
I0315 13:06:28.402564 7668 nvc_info.c:560] listing device /dev/nvidia-uvm-tools
I0315 13:06:28.402569 7668 nvc_info.c:560] listing device /dev/nvidia-modeset
I0315 13:06:28.402593 7668 nvc_info.c:345] listing ipc path /run/nvidia-persistenced/socket
W0315 13:06:28.402612 7668 nvc_info.c:351] missing ipc path /var/run/nvidia-fabricmanager/socket
W0315 13:06:28.402625 7668 nvc_info.c:351] missing ipc path /tmp/nvidia-mps
I0315 13:06:28.402631 7668 nvc_info.c:853] requesting device information with ''
I0315 13:06:28.408193 7668 nvc_info.c:744] listing device /dev/nvidia0 (GPU-494c6315-3bf5-85de-e949-394a51497849 at 00000000:01:00.0)
I0315 13:06:28.408239 7668 nvc_mount.c:366] mounting tmpfs at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/proc/driver/nvidia
E0315 13:06:28.408430 7668 utils.c:547] The path /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/bin alreay exists with the required mode; skipping create
I0315 13:06:28.408520 7668 nvc_mount.c:134] mounting /usr/bin/nvidia-smi at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/bin/nvidia-smi
I0315 13:06:28.408564 7668 nvc_mount.c:134] mounting /usr/bin/nvidia-debugdump at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/bin/nvidia-debugdump
I0315 13:06:28.408603 7668 nvc_mount.c:134] mounting /usr/bin/nvidia-persistenced at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/bin/nvidia-persistenced
I0315 13:06:28.408640 7668 nvc_mount.c:134] mounting /usr/bin/nvidia-cuda-mps-control at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/bin/nvidia-cuda-mps-control
I0315 13:06:28.408678 7668 nvc_mount.c:134] mounting /usr/bin/nvidia-cuda-mps-server at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/bin/nvidia-cuda-mps-server
E0315 13:06:28.408709 7668 utils.c:547] The path /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu alreay exists with the required mode; skipping create
I0315 13:06:28.408779 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14
I0315 13:06:28.408820 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.550.54.14
I0315 13:06:28.408857 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libcuda.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libcuda.so.550.54.14
I0315 13:06:28.408896 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libcudadebugger.so.550.54.14
I0315 13:06:28.408933 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.54.14
I0315 13:06:28.408971 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.54.14
I0315 13:06:28.409012 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.550.54.14
I0315 13:06:28.409053 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.550.54.14
I0315 13:06:28.409093 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.550.54.14
I0315 13:06:28.409134 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.550.54.14
I0315 13:06:28.409173 7668 nvc_mount.c:134] mounting /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.550.54.14 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.550.54.14
I0315 13:06:28.409196 7668 nvc_mount.c:527] creating symlink /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/usr/lib/x86_64-linux-gnu/libcuda.so -> libcuda.so.1
I0315 13:06:28.409310 7668 nvc_mount.c:85] mounting /lib/firmware/nvidia/550.54.14/gsp_ga10x.bin at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/lib/firmware/nvidia/550.54.14/gsp_ga10x.bin with flags 0x7
I0315 13:06:28.409368 7668 nvc_mount.c:85] mounting /lib/firmware/nvidia/550.54.14/gsp_tu10x.bin at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/lib/firmware/nvidia/550.54.14/gsp_tu10x.bin with flags 0x7
I0315 13:06:28.409448 7668 nvc_mount.c:261] mounting /run/nvidia-persistenced/socket at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/run/nvidia-persistenced/socket
I0315 13:06:28.409490 7668 nvc_mount.c:230] mounting /dev/nvidiactl at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/dev/nvidiactl
I0315 13:06:28.409531 7668 nvc_mount.c:230] mounting /dev/nvidia-uvm at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/dev/nvidia-uvm
I0315 13:06:28.409569 7668 nvc_mount.c:230] mounting /dev/nvidia-uvm-tools at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/dev/nvidia-uvm-tools
I0315 13:06:28.409617 7668 nvc_mount.c:230] mounting /dev/nvidia0 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/dev/nvidia0
I0315 13:06:28.409680 7668 nvc_mount.c:440] mounting /proc/driver/nvidia/gpus/0000:01:00.0 at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged/proc/driver/nvidia/gpus/0000:01:00.0
I0315 13:06:28.409698 7668 nvc_ldcache.c:380] executing /sbin/ldconfig.real from host at /var/lib/docker/overlay2/b1e8567f9b2efdfc74ce312d288f8d7c920cbe5561daa967d3ff41058e6af3f4/merged
I0315 13:06:28.447864 7668 nvc.c:452] shutting down library context
I0315 13:06:28.447940 7677 rpc.c:95] terminating nvcgo rpc service
I0315 13:06:28.448238 7668 rpc.c:135] nvcgo rpc service terminated successfully
I0315 13:06:28.449804 7676 rpc.c:95] terminating driver rpc service
I0315 13:06:28.449947 7668 rpc.c:135] driver rpc service terminated successfully

The log flags some missing libraries, but I don’t know whether that is the problem, and I’m not sure where else to look. I’ve tried removing and reinstalling CUDA, to no avail.

Running nvidia-smi outside docker works just fine:

nvidia-smi
Fri Mar 15 17:58:00 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:01:00.0 Off |                  N/A |
| 23%   27C    P8              8W /  250W |      66MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1996      G   /usr/lib/xorg/Xorg                             56MiB |
|    0   N/A  N/A      2513      G   /usr/bin/gnome-shell                            6MiB |
+-----------------------------------------------------------------------------------------+

I encountered a similar issue recently. Although I’m not sure why it started happening suddenly, I managed to resolve it by mapping the devices into the container explicitly with the following command:

docker run --rm --gpus all --device /dev/nvidia0:/dev/nvidia0 \
  --device /dev/nvidiactl:/dev/nvidiactl \
  --device /dev/nvidia-uvm:/dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  ubuntu nvidia-smi
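
A quick sanity check before adding the device flags is to confirm that those device nodes actually exist on the host (this is not part of the original workaround, just a verification step):

ls -l /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools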

I solved it with the help of https://bobcares.com/blog/docker-failed-to-initialize-nvml-unknown-error/ !

I used just the second part of “Method 1” (the nvidia-container configuration step): in the file

/etc/nvidia-container-runtime/config.toml

set the parameter

no-cgroups = false
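
If you prefer to make the change from the command line, a one-liner like the following should be equivalent to editing the file by hand (it assumes the line currently reads exactly "no-cgroups = true"; the parameter typically sits under the [nvidia-container-cli] section of that file):

# flip the flag, then confirm the change took effect
sudo sed -i 's/^no-cgroups = true/no-cgroups = false/' /etc/nvidia-container-runtime/config.toml
grep no-cgroups /etc/nvidia-container-runtime/config.toml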

After that, restart Docker and run a test container:

sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

With that, the container loaded and worked immediately. I have no idea why.
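
Note: if the nvidia/cuda:11.0-base tag is no longer available to pull, the same check works with the plain Ubuntu image from the original post:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi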


Thanks, your solution works.

The key here is no-cgroups = false.

I have two machines, one running WSL and the other running Ubuntu only:

Dev machine: WSL - Ubuntu 22.04 | CUDA 12.4
  Docker: Ubuntu 22.04 | CUDA 11.8
Production machine: Ubuntu 22.04 | CUDA 12.4 ← this is the one where I had to set no-cgroups = false


After I ran the step “To configure the container runtime for Docker running in Rootless mode, follow these steps: nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json”, I could no longer run “sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi” properly. I guess nvidia-ctk hit some error and modified /etc/nvidia-container-runtime/config.toml. I removed /etc/nvidia-container-runtime/config.toml and tried again, and then the image launched successfully.
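
If anyone wants to try the same thing, the sequence is roughly the following (the backup step is an extra precaution I'd add so the generated config can be restored later):

sudo cp /etc/nvidia-container-runtime/config.toml /etc/nvidia-container-runtime/config.toml.bak
sudo rm /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi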

Restarting Docker is not necessary, but can’t hurt either.

I thought this issue had been solved: NOTICE: Containers losing access to GPUs with error: "Failed to initialize NVML: Unknown Error" · Issue #1730 · NVIDIA/nvidia-docker · GitHub

But I stand corrected: it is still an issue today on my Ubuntu 20.04 machine with an up-to-date NVIDIA driver and Docker version 27.4.0.

However, if you’re using IsaacLab in a Docker container, keep no-cgroups = true; otherwise the container will fail to build and run.
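
If you already set it to false following the earlier post, flipping it back is the same kind of edit (assuming the same file and exact spacing as above):

sudo sed -i 's/^no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml
sudo systemctl restart docker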

Thanks!

Regular users in the container must also be added to the messagebus group.
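
For example, inside the container (hypothetical user name "appuser"; assumes a Debian/Ubuntu-based image where the messagebus group exists):

usermod -aG messagebus appuser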