NVIDIA runtime fails on JetPack 6 GA

I am using Auvidea’s JNX42 flashed with JetPack 6.0 GA and am running into this issue when trying to launch Docker with the NVIDIA runtime:

ubuntu@ubuntu:~$ docker run -it --privileged --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r36.3.0
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
NvRmMemInitNvmap failed with Permission denied
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with Permission denied
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with Permission denied
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.

I can confirm that /etc/nvidia-container-runtime/host-files-for-container.d/drivers.csv has the correct file paths, and the compiled libraries it lists exist on the device too.
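
For reference, here is a quick loop that flags any CSV entry whose path is missing on the host (a rough sketch; it assumes the usual "type, /path" field layout of the l4t CSV files):

$ awk -F', ' 'NF >= 2 {print $2}' /etc/nvidia-container-runtime/host-files-for-container.d/drivers.csv \
    | while read -r f; do [ -e "$f" ] || echo "missing: $f"; done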

Hi,

Do you need rootless mode?
Have you tried launching Docker with sudo to see if it works?
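
For reference, running Docker without sudo normally requires membership in the docker group (a generic sketch, unrelated to the GPU hook itself):

$ sudo usermod -aG docker $USER
$ newgrp docker   # or log out and back in
$ docker run --rm hello-world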

Thanks.

Even without privileged mode, the problem persists. Yes, I have also tried launching Docker with sudo and the problem still persists. Given that all the libraries exist, what should I do?

Hi,

We tested the command and it works well with sudo:

$ sudo docker run -it --privileged --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r36.3.0
root@808b75f8eaf2:/# nvidia-smi
Mon Jun  3 07:47:30 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 540.3.0                Driver Version: N/A          CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Orin (nvgpu)                  N/A  | N/A              N/A |                  N/A |
| N/A   N/A  N/A               N/A /  N/A | Not Supported        |     N/A          N/A |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Thanks.

Hi, I have the same problem as @kumarakshay. I’m on an AGX Orin Devkit 32GB with R36.3.0, and see the same error output that was posted in the original question.

I tried the suggested command and it did not solve the problem.

$ sudo docker run -it --privileged --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r36.3.0
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
NvRmMemInitNvmap failed with Permission denied
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with Permission denied
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
NvRmMemInitNvmap failed with Permission denied
356: Memory Manager Not supported



****NvRmMemMgrInit failed**** error type: 196626


libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
nvidia-container-cli: detection error: nvml error: unknown error: unknown.
ERRO[0000] error waiting for container:  

Here is some additional information from the system where the problem occurs. This system was set up by SDK Manager.

$ groups
<username snipped> adm cdrom sudo audio dip video plugdev render i2c lpadmin sambashare gdm docker weston-launch gpio
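
Given the "Permission denied" lines from NvRmMemInitNvmap, the GPU device nodes may also be worth checking (a sketch; the node names are my assumption for Orin on r36 and can vary by platform):

$ ls -l /dev/nvmap
$ ls -lR /dev/nvgpu 2>/dev/null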


$ cat /etc/nv_tegra_release 
# R36 (release), REVISION: 3.0, GCID: 36191598, BOARD: generic, EABI: aarch64, DATE: Mon May  6 17:34:21 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
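
Note the KERNEL_VARIANT: oot line above: on r36 the GPU driver is built as an out-of-tree module, and nvidia-container-cli reports "driver not loaded", so it may be worth confirming the module is actually loaded (a sketch; nvgpu as the module name is an assumption):

$ lsmod | grep nvgpu
$ sudo dmesg | grep -i nvgpu | tail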


$ apt list --installed |grep nvidia-container
libnvidia-container-tools/stable,now 1.14.2-1 arm64 [installed,automatic]
libnvidia-container1/stable,now 1.14.2-1 arm64 [installed,automatic]
nvidia-container-toolkit-base/stable,now 1.14.2-1 arm64 [installed,automatic]
nvidia-container-toolkit/stable,now 1.14.2-1 arm64 [installed,automatic]
nvidia-container/stable,now 6.0+b106 arm64 [installed]


$ apt list --installed |grep docker
docker.io/jammy-updates,now 24.0.7-0ubuntu2~22.04.1 arm64 [installed]
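
For completeness, the runtime registration can also be regenerated and the daemon restarted (a generic sketch using the nvidia-ctk tool from nvidia-container-toolkit-base; not a confirmed fix for this error):

$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
$ docker info | grep -i runtimes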

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.