Help with rootless podman or rootless docker and nvidia GPU

I’m trying to run containers with rootless podman or rootless docker while accessing an NVIDIA GPU.
The instructions at Installation Guide — NVIDIA Cloud Native Technologies documentation got me to the point where I can run containers as root, but not as a regular user. When running as a regular user I see the error below; rootless docker produces the same error message.

$ podman run -it --rm  --runtime nvidia nvidia/cuda:11.0-base nvidia-smi
2022/03/10 09:38:07 Error running [/usr/bin/nvidia-container-runtime delete --force 2a82614297c63b515fded4af4b4dfdbc44fb61e5a6cbb0cf4805b29f246787b9]: error creating runtime: error constructing runc runtime: error locating runtime: no runtime binary found from candidate list: [docker-runc runc]
ERRO[0000] Error removing container 2a82614297c63b515fded4af4b4dfdbc44fb61e5a6cbb0cf4805b29f246787b9 from runtime after creation failed
Error: OCI runtime error: time="2022-03-10T09:38:13-05:00" level=warning msg="unable to get oom kill count" error="no directory specified for memory.oom_control"
time="2022-03-10T09:38:13-05:00" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: privilege change failed: invalid argument\n"
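One thing worth checking first: the "delete --force" error above says no runtime binary was found from the candidate list [docker-runc runc]. A quick sketch for listing which OCI runtimes are actually on PATH (crun is podman's usual default on newer distros; runc/docker-runc are what nvidia-container-runtime was looking for here):

```shell
# The error above says nvidia-container-runtime found no runtime binary
# from the candidate list [docker-runc runc]; see what is actually on PATH.
status=""
for rt in runc crun docker-runc; do
  if command -v "$rt" >/dev/null 2>&1; then
    status="$status found:$rt"
  else
    status="$status missing:$rt"
  fi
done
echo "$status"
```

If runc is missing, installing it (or pointing the runtime at crun) may clear the first error, though it won't by itself explain the privilege failure.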

I’ve already made the cgroup change (no-cgroups = true) in /etc/nvidia-container-runtime/config.toml:

$ cat /etc/nvidia-container-runtime/config.toml 
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/"
load-kmods = true
#no-cgroups = false
no-cgroups = true 
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

#debug = "/var/log/nvidia-container-runtime.log"
#debug = "~/.local/nvidia-container-runtime.log"
#debug = "/tmp/nvidia-container-runtime.log"
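For anyone following along, the rootless cgroup change above amounts to flipping no-cgroups in /etc/nvidia-container-runtime/config.toml. A small sketch of doing that with sed, run here against a temporary copy of the relevant lines so it is safe to try anywhere; on a real system you would target the actual config file with sudo:

```shell
# Apply the rootless cgroup change to a scratch copy of config.toml.
# On a real system the target is /etc/nvidia-container-runtime/config.toml.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
load-kmods = true
#no-cgroups = false
ldconfig = "@/sbin/ldconfig.real"
EOF
# Uncomment the key and flip it to true, which rootless setups need.
sed -i 's/^#no-cgroups = false/no-cgroups = true/' "$cfg"
result=$(grep '^no-cgroups' "$cfg")
echo "$result"
rm -f "$cfg"
```

This leaves exactly one active no-cgroups key, rather than the commented default plus an added line as in the config dump above.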

This is on Ubuntu 20.04.

Package versions:

$ apt list --installed '*nvidia*'
Listing... Done
libnvidia-cfg1-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
libnvidia-common-510/unknown,now 510.47.03-0ubuntu1 all [installed,automatic]
libnvidia-compute-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
libnvidia-container-tools/bionic,now 1.8.1-1 amd64 [installed,automatic]
libnvidia-container1/bionic,now 1.8.1-1 amd64 [installed,automatic]
libnvidia-decode-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
libnvidia-encode-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
libnvidia-extra-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
libnvidia-fbc1-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
libnvidia-gl-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-compute-utils-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-container-toolkit/bionic,now 1.8.1-1 amd64 [installed]
nvidia-dkms-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-driver-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-common-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-source-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-modprobe/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-prime/focal-updates,now 0.8.16~ all [installed,automatic]
nvidia-settings/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]
xserver-xorg-video-nvidia-510/unknown,now 510.47.03-0ubuntu1 amd64 [installed,automatic]

$ podman --version
podman version 3.4.2

Does anyone know how to resolve this privilege error?

I recently got this working on WSL2 and wrote up the steps here: GitHub - henrymai/podman_wsl2_cuda_rootless

The main difference I can see from your setup is that when I invoke podman run, I don't pass the "--runtime nvidia" argument; I just let the nvidia-container-toolkit hook do its thing instead.
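To make that concrete: with the toolkit's OCI hook installed, the invocation from the original post drops the runtime flag entirely. This is a sketch only; the hook JSON path below is an assumption that depends on how nvidia-container-toolkit was packaged on your distro, and running it obviously requires podman plus an NVIDIA GPU:

```shell
# Hook-based invocation (no --runtime nvidia): the prestart hook installed by
# nvidia-container-toolkit injects the GPU devices into the container.
# The hook path is an assumption -- verify where your package installed it.
ls /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
podman run -it --rm nvidia/cuda:11.0-base nvidia-smi
```

If the hook JSON isn't in podman's default hook directory, you can point at it explicitly with podman's --hook-dir option.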
