Unable to run an NVIDIA Docker container on AGX Xavier

I have been trying to do something very simple:

docker run --runtime=nvidia --rm nvidia/cuda

However, I got this error:

docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/8cb963c23bee566216d2d890e60f62ae497be2857ef31e519ebd31e43e91a865/log.json: no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: unknown.

So I tried to do sudo apt install nvidia-container-runtime

but I got

E: Unable to locate package nvidia-container-runtime

So I followed the advice on this page and ran:

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update

With this I could run sudo apt install nvidia-container-runtime.
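
(As a quick sanity check at this point, just as a sketch, something like the following should confirm the runtime binary is on the PATH and registered with Docker:)

which nvidia-container-runtime
docker info | grep -i runtime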

Then I tried to run the Docker container from the start of this question:
docker run --runtime=nvidia --rm nvidia/cuda

and now I got a completely different error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.

I don't know how to proceed from here to be able to run the container. Any help will be greatly appreciated.

^^ The above is for the x86_64 architecture;
for the Xavier AGX you may want to use the L4T containers from ngc.nvidia.com:

nvcr.io/nvidia/l4t-base:r32.4.2
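
For example, something along these lines (just a sketch; pick the tag that matches your JetPack/L4T release):

sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.4.2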

Sorry to jump in here. I just started with the AGX Xavier and am having an issue with Docker.

thuy@worker03-xavieragx:~$ sudo docker run --runtime nvidia --network host -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.3.1
Unable to find image 'nvcr.io/nvidia/l4t-base:r32.3.1' locally
r32.3.1: Pulling from nvidia/l4t-base
8aaa03d29a6e: Pull complete
e73d3a974854: Pull complete
2c14cdba18f5: Pull complete
23dd63c7659b: Pull complete
3bd414bd9504: Pull complete
cafd526eb263: Pull complete
483b0873e636: Pull complete
2568c5428ff2: Pull complete
6bcd9356d42f: Pull complete
c7f6d0180a4e: Pull complete
beddc9b83fb0: Pull complete
656f2307c79e: Pull complete
fe2e73a571b7: Pull complete
f5decba41c07: Pull complete
f0b6e413c48c: Pull complete
Digest: sha256:e8987d52ddb9496948e02656fc62d46561abce25bfe83203f4bc24c67e094578
Status: Downloaded newer image for nvcr.io/nvidia/l4t-base:r32.3.1
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0072] error waiting for container: context canceled

thuy@worker03-xavieragx:~$ nvidia-container-cli list
nvidia-container-cli: initialization error: driver error: failed to process request

Not sure why I'm getting the driver error, as everything was installed through JetPack 4.4, which should include all the necessary NVIDIA drivers.

Can you give me some pointers here or let me know if I should open a new thread for this?

Thanks,
Thuy


Does the issue persist if you run the Docker container without the runtime argument?
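
For example, something like this (just a sketch; use whichever tag you already pulled):

sudo docker run -it --rm nvcr.io/nvidia/l4t-base:r32.3.1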

No, there is no issue with Docker itself. After switching to the second Xavier, it is working fine (so I can remove the current drivers and reinstall them later). I'm now having an issue with this Xavier when it joins an existing Kubernetes cluster: the nvidia-device-plugin-daemonset does not work. I just want to expose the GPU to the cluster.

I used this command for the plugin on the master node:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.6.0/nvidia-device-plugin.yml
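
(To check whether the plugin actually exposed the GPU, a sketch like the following should show an nvidia.com/gpu resource on the node; <node-name> is just a placeholder for the Xavier node:)

kubectl describe node <node-name> | grep -i "nvidia.com/gpu"   # <node-name> is a placeholder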

thuy@thuy-xavier-02:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/arm64"}

thuy@thuy-xavier-02:~$ docker version
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Fri Feb 28 23:47:53 2020
OS/Arch: linux/arm64
Experimental: false

thuy@thuy-xavier-02:~$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
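
(A commonly suggested tweak for Kubernetes on Jetson, sketched under the assumption that the device-plugin pods do not pass --runtime themselves, is to set "default-runtime": "nvidia" in this file and restart Docker:)

sudo tee /etc/docker/daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo systemctl restart docker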

thuy@thuy-xavier-02:~$ nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Fri Feb 28 23:47:53 2020
OS/Arch: linux/arm64
Experimental: false

Sorry for switching the topic,

I figured out that the issue is that the nvidia-device-plugin in Kubernetes requires nvidia-smi, but I don't have nvidia-smi, only tegrastats.

I don't know if I should try to install nvidia-smi on the Xavier AGX board, or try to figure out how to make the nvidia-device-plugin work with tegrastats.

If you have any experience with this, it’d be great to know.

Thanks,
Thuy