Unable to run nvidia docker

Hi,

I’ve been looking through the forum in an attempt to solve my issue with running l4t docker images:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

I started from the Containers For Deep Learning Frameworks User Guide (NVIDIA Deep Learning Frameworks Documentation), set up the repositories, and installed nvidia-docker2 and nvidia-container-runtime from here:
https://nvidia.github.io/nvidia-container-runtime/ and https://nvidia.github.io/nvidia-docker/
The contents of the nvidia-container-runtime list were already present in the nvidia-docker2 list, so I only kept nvidia-docker2 (/etc/apt/sources.list.d/nvidia-docker.list).
I then pulled nvcr.io/nvidia/l4t-base:r32.6.1 from https://catalog.ngc.nvidia.com/

docker and nvidia-docker commands give me the same error:

sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.6.1
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
sudo nvidia-docker run -it --rm --net=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.6.1
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

However, removing --runtime nvidia from docker run drops me at a prompt:

sudo docker run -it --rm --net=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.6.1
root@tc-xavier:/# 

I tried downgrading docker.io as suggested in another post, but the result was the same.
This is the oldest version, and the one I have installed now:

dpkg -l | grep docker.io
ii  docker.io    20.10.2-0ubuntu1~18.04.2    arm64    Linux container runtime

I suspect there is a problem with the container runtime, but I don’t know how to proceed.
Any suggestions to steer me onto the correct path?

Some information about the system I’m running.

  • Installed from SD with JetPack 4.6.
  • rootfs is located on an M.2 SSD instead of the SD card.
uname -a
Linux tc-xavier 4.9.253-tegra #31 SMP PREEMPT Thu Sep 30 13:31:51 EEST 2021 aarch64 aarch64 aarch64 GNU/Linux
cat /etc/nv_tegra_release 
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref, EABI: aarch64, DATE: Mon Jul 26 19:36:31 UTC 2021
cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
cat /etc/apt/sources.list.d/nvidia-docker.list 
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
cat /etc/docker/daemon.json 
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
sudo docker image ls
REPOSITORY                TAG       IMAGE ID       CREATED        SIZE
ubuntu                    latest    d5ca7a445605   4 weeks ago    65.6MB
hello-world               latest    18e5af790473   7 weeks ago    9.14kB
nvcr.io/nvidia/l4t-base   r32.6.1   1735192d3d51   3 months ago   636MB

Thanks in advance :)

Hi @tor.christian.eriksen - OK, this is the same as what my working Docker setup shows on L4T R32.6.1, so I don’t think it’s that per se.

Docker and the NVIDIA Container Runtime are already installed on the SD card image (if you’re using the devkit) or when you use SDK Manager to flash/set up the module. So you shouldn’t have to do those install steps manually. Most likely something is still missing or misconfigured.
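
One quick way to narrow down "missing or misconfigured" is to confirm that the daemon.json posted above really declares the nvidia runtime. The following is only a rough sketch: `has_nvidia_runtime` is a hypothetical helper name, and it does a plain text match rather than parsing the JSON, so treat a positive result as a hint and not as proof.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given daemon.json mentions an
# "nvidia" key. This is a text match only, not a real JSON parse.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:' "$1" 2>/dev/null
}

# Default path taken from the thread; pass another path as the first argument.
CONF="${1:-/etc/docker/daemon.json}"

if has_nvidia_runtime "$CONF"; then
    echo "nvidia runtime is declared in $CONF"
else
    echo "nvidia runtime is NOT declared in $CONF"
fi

# To ask the running daemon which runtimes it actually loaded, you can also try:
#   sudo docker info --format '{{json .Runtimes}}'
```

If the file looks right but the error persists, the problem is more likely in the container runtime packages themselves than in the Docker configuration.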

You could try installing these packages with apt:

apt-cache search nvidia-container
libnvidia-container-tools - NVIDIA container runtime library (command-line tools)
libnvidia-container0 - NVIDIA container runtime library
nvidia-container-csv-cuda - Jetpack CUDA CSV file
nvidia-container-csv-cudnn - Jetpack CUDNN CSV file
nvidia-container-csv-tensorrt - Jetpack TensorRT CSV file
nvidia-container-csv-visionworks - Jetpack VisionWorks CSV file
nvidia-container-runtime - NVIDIA container runtime
nvidia-container-toolkit - NVIDIA container runtime hook
nvidia-docker2 - nvidia-docker CLI wrapper
nvidia-container - NVIDIA Container Meta Package
nvidia-container-csv-opencv - Jetpack OpenCV CSV file
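
As a rough sketch of how that could look in practice, the snippet below lists which of the packages above dpkg does not report as installed, then prints (without running) a reinstall command for review. The function names `missing_pkgs` and `reinstall_cmd` are made up for this example, and the package list is simply copied from the apt-cache output above, so adjust it to whatever your own search returns.

```shell
#!/bin/sh
# Package names copied from the apt-cache output above.
PKGS="libnvidia-container-tools libnvidia-container0 nvidia-container-runtime \
nvidia-container-toolkit nvidia-docker2 nvidia-container \
nvidia-container-csv-cuda nvidia-container-csv-cudnn \
nvidia-container-csv-tensorrt nvidia-container-csv-visionworks \
nvidia-container-csv-opencv"

# Hypothetical helper: print every package that dpkg does not report
# as fully installed ("install ok installed").
missing_pkgs() {
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        [ "$status" = "install ok installed" ] || echo "$pkg"
    done
}

# Hypothetical helper: build the reinstall command as text so it can be
# checked before it is run by hand.
reinstall_cmd() {
    echo "sudo apt-get install --reinstall $*"
}

# $PKGS is left unquoted on purpose so it splits into separate names.
echo "Not installed:"
missing_pkgs $PKGS
echo "To reinstall everything, run:"
reinstall_cmd $PKGS
```

Printing the command instead of executing it is a deliberate choice here, since reinstalling runtime packages on a Jetson is worth a second look before pressing Enter.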

Alternatively, before spending too much time on it, I would just re-flash your SD card, and Docker/l4t-base should work out-of-the-box for you. Avoid doing the apt upgrade for now (or apt-mark the docker packages as hold) so you don’t hit that other docker issue until it’s fully resolved.
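
For the hold, something along these lines should do. It is a minimal sketch that assumes docker.io (the package shown in your dpkg output) is the one to pin; the `run` wrapper and the `DRY_RUN` variable are inventions of this example and only echo the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# docker.io is the package shown in the dpkg output earlier in this thread.
# Add other package names only if they are actually installed on your system.
HOLD_PKGS="docker.io"

# Hypothetical wrapper: echo the command unless DRY_RUN=0 is set,
# so nothing changes by accident.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run sudo apt-mark hold $HOLD_PKGS   # keep apt upgrade from replacing the package
run apt-mark showhold               # list held packages to confirm
```

Once the other docker issue is resolved, `sudo apt-mark unhold docker.io` releases the pin again.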

> You could try installing these packages with apt

I did this and reinstalled the packages you listed. The following command now drops me at a prompt, so it is working.
I felt like I had done this multiple times already, but somehow I must have messed it up.

sudo nvidia-docker run -it --rm --net=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.6.1

The nbody sample simulation is working so it is looking good so far.
I appreciate the super fast reply, thank you very much! :)

OK great, glad that you got it working! Thanks for letting us know.
