NVIDIA container runtime error with containerd on JetPack 6

I have an AGX Orin running JP6.
I'm trying to set up the NVIDIA container runtime for containerd.

Some containers fail with the following errors:

2024-03-09T12:14:01.150Z NvRmMemInitNvmap failed with No such file or directory
2024-03-09T12:14:01.150Z 356: Memory Manager Not supported
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z ****NvRmMemMgrInit failed**** error type: 196626
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
2024-03-09T12:14:01.150Z NvRmMemInitNvmap failed with No such file or directory
2024-03-09T12:14:01.150Z 356: Memory Manager Not supported
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z ****NvRmMemMgrInit failed**** error type: 196626
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626
2024-03-09T12:14:01.150Z NvRmMemInitNvmap failed with No such file or directory
2024-03-09T12:14:01.150Z 356: Memory Manager Not supported
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z ****NvRmMemMgrInit failed**** error type: 196626
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z 
2024-03-09T12:14:01.150Z libnvrm_gpu.so: NvRmGpuLibOpen failed, error=196626

I installed the NVIDIA container runtime following the official documentation and configured containerd to use it. Here is my config.toml:

oom_score = 0
root = "/var/lib/containerd"
state = "/run/containerd"
version = 2

[debug]
  level = "info"

[grpc]
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

  [plugins."io.containerd.grpc.v1.cri"]
    max_container_log_line_size = -1
    sandbox_image = "k8s.gcr.io/pause:3.3"

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      snapshotter = "overlayfs"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
            systemdCgroup = true

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            systemdCgroup = true

    [plugins."io.containerd.grpc.v1.cri".registry]

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]
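
For anyone checking a similar setup, here is a minimal sketch (not an official tool) that parses a containerd config and reports whether the `nvidia` runtime is wired up the way the config above intends. The function name `check_nvidia_runtime` and the path in the usage note are assumptions for illustration; `tomllib` needs Python 3.11+, and the fallback assumes the `tomli` backport is installed.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

CRI_PLUGIN = "io.containerd.grpc.v1.cri"


def check_nvidia_runtime(config_text):
    """Return a list of problems found in a containerd config (empty if none).

    Hypothetical helper: it only inspects the keys shown in the config above.
    """
    cfg = tomllib.loads(config_text)
    containerd = cfg.get("plugins", {}).get(CRI_PLUGIN, {}).get("containerd", {})
    problems = []

    default = containerd.get("default_runtime_name")
    if default != "nvidia":
        problems.append(f"default_runtime_name is {default!r}, expected 'nvidia'")

    nvidia = containerd.get("runtimes", {}).get("nvidia")
    if nvidia is None:
        problems.append("no runtimes.nvidia table found")
    elif not nvidia.get("options", {}).get("BinaryName"):
        problems.append("runtimes.nvidia.options.BinaryName is not set")

    return problems


# Trimmed-down copy of the relevant part of the config above.
SAMPLE = '''
version = 2
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
'''
```

To run it against a real file, read the file and pass its text, e.g. `check_nvidia_runtime(open("/etc/containerd/config.toml").read())` (the path is an assumption; k3s and other distributions keep their containerd config elsewhere). An empty list only means the config looks consistent; it does not rule out the runtime failure reported here.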

By the way, I have always followed the same process on JP5, and it always worked.

Any ideas?

Hi,

There is a similar topic reporting that k3s does not work on JetPack 6 DP (Developer Preview).
Could you check whether you are facing the same issue?

We have verified the issue is fixed in the upcoming JetPack 6 GA release.
Thanks.

Thanks @AastaLLL,

We'll wait for the GA release then.

By the way, could you tell us what the issue was? @AastaLLL

Hi,

Sure, the issue comes from a missing kernel configuration option that the tool requires.
Please check the topic shared above for more information, including a workaround (WAR) for JetPack 6 DP.

Thanks.
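
As a rough illustration of how one might check for missing kernel configuration options, here is a minimal sketch that parses kernel config text and lists which required options are not enabled. The option names in the usage note are placeholders, since this thread does not name the actual option (see the linked topic for that), and reading `/proc/config.gz` assumes the kernel was built with `CONFIG_IKCONFIG_PROC`.

```python
import gzip
from pathlib import Path


def parse_kernel_config(config_text):
    """Parse kernel .config text into a {option: value} dict.

    Comment lines such as '# CONFIG_FOO is not set' are skipped,
    so unset options are simply absent from the result.
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        options[name] = value
    return options


def missing_options(config_text, required):
    """Return the required options that are neither built in (y) nor a module (m)."""
    options = parse_kernel_config(config_text)
    return [name for name in required if options.get(name) not in ("y", "m")]


def read_running_kernel_config(path="/proc/config.gz"):
    """Read the running kernel's config; assumes /proc/config.gz exists."""
    return gzip.decompress(Path(path).read_bytes()).decode()
```

Usage would look like `missing_options(read_running_kernel_config(), ["CONFIG_EXAMPLE_A", "CONFIG_EXAMPLE_B"])`, where both names are hypothetical and should be replaced with the options listed in the linked topic.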
