`nvidia-container-cli` driver error when trying to run NVIDIA Docker on Jetson Nano

Hi all,

I am following https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson and trying to run the deviceQuery container on a Jetson Nano node. When building the container, I get the following error:

$ sudo docker build -t devicequery .
Sending build context to Docker daemon  214.2MB
Step 1/6 : FROM nvcr.io/nvidia/l4t-base:r32.3.1
 ---> aaaa63e7b12d
Step 2/6 : RUN apt-get update && apt-get install -y --no-install-recommends make g++
 ---> Running in a9a6681a68c9
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

When I run deviceQuery directly on the host, without a container, it works properly. I searched a lot for a solution and found that the cause might be that the driver is not initialized properly:

$ sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --

I0731 03:40:00.306584 31410 nvc.c:282] initializing library context (version=1.2.0, build=d22237acaea94aa5ad5de70aac903534ed598819)
I0731 03:40:00.306735 31410 nvc.c:256] using root /
I0731 03:40:00.306769 31410 nvc.c:257] using ldcache /etc/ld.so.cache
I0731 03:40:00.306789 31410 nvc.c:258] using unprivileged user 65534:65534
I0731 03:40:00.306848 31410 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0731 03:40:00.307075 31410 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0731 03:40:00.307355 31410 nvc.c:172] failed to detect NVIDIA devices
I0731 03:40:00.307634 31416 nvc.c:192] loading kernel module nvidia
E0731 03:40:00.308079 31416 nvc.c:194] could not load kernel module nvidia
I0731 03:40:00.308105 31416 nvc.c:204] loading kernel module nvidia_uvm
E0731 03:40:00.308336 31416 nvc.c:206] could not load kernel module nvidia_uvm
I0731 03:40:00.308356 31416 nvc.c:212] loading kernel module nvidia_modeset
E0731 03:40:00.308589 31416 nvc.c:214] could not load kernel module nvidia_modeset
I0731 03:40:00.309024 31417 driver.c:101] starting driver service
E0731 03:40:00.309567 31417 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0731 03:40:00.309821 31410 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
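
A short filter can make the failing steps in a debug log like this one easier to spot. The sketch below keeps only the lines whose prefix starts with W or E, which is how warnings and errors are marked in the output above. The function name `cli_problems` and the log path `/tmp/nvc-debug.log` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read an nvidia-container-cli debug log on stdin and
# keep only warning (W) and error (E) lines, identified by the letter plus
# four-digit date prefix seen in the log above (for example "E0731 ").
cli_problems() {
    grep -E '^[WE][0-9]{4} '
}

# Demo on three lines copied from the log above; the I line is dropped.
cli_problems <<'LOG'
I0731 03:40:00.309024 31417 driver.c:101] starting driver service
W0731 03:40:00.307355 31410 nvc.c:172] failed to detect NVIDIA devices
E0731 03:40:00.308079 31416 nvc.c:194] could not load kernel module nvidia
LOG

# On the device, write the debug log to a file first, then filter it:
#   sudo nvidia-container-cli -k -d /tmp/nvc-debug.log info
#   cli_problems < /tmp/nvc-debug.log
# To see whether the library named in the last error is known to the linker:
#   ldconfig -p | grep libnvidia-ml
```

Every error in this log concerns a kernel module or library lookup (`nvidia`, `nvidia_uvm`, `nvidia_modeset`, `libnvidia-ml.so.1`), so the filtered view narrows the question to why those lookups are attempted and fail on this board.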

I have also tried reinstalling nvidia-docker2 and restarting both the device and Docker a couple of times. None of this helped. The following is my system information:

$ cat /etc/nv_tegra_release 
# R32 (release), REVISION: 4.3, GCID: 21589087, BOARD: t210ref, EABI: aarch64, DATE: Fri Jun 26 04:38:25 UTC 2020
$ sudo dpkg -l | grep nvidia
ii  libnvidia-container-tools                     1.2.0-1                                          arm64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container0:arm64                    0.9.0~beta.1                                     arm64        NVIDIA container runtime library
ii  libnvidia-container1:arm64                    1.2.0-1                                          arm64        NVIDIA container runtime library
ii  nvidia-container-csv-cuda                     10.2.89-1                                        arm64        Jetpack CUDA CSV file
ii  nvidia-container-csv-cudnn                    8.0.0.180-1+cuda10.2                             arm64        Jetpack CUDNN CSV file
ii  nvidia-container-csv-tensorrt                 7.1.3.0-1+cuda10.2                               arm64        Jetpack TensorRT CSV file
ii  nvidia-container-csv-visionworks              1.6.0.501                                        arm64        Jetpack VisionWorks CSV file
ii  nvidia-container-runtime                      3.3.0-1                                          arm64        NVIDIA container runtime
ii  nvidia-container-toolkit                      1.2.1-1                                          arm64        NVIDIA container runtime hook
ii  nvidia-docker2                                2.4.0-1                                          all          nvidia-docker CLI wrapper
ii  nvidia-l4t-3d-core                            32.4.3-20200625213809                            arm64        NVIDIA GL EGL Package
ii  nvidia-l4t-apt-source                         32.4.3-20200625213809                            arm64        NVIDIA L4T apt source list debian package
ii  nvidia-l4t-bootloader                         32.4.3-20200709231554                            arm64        NVIDIA Bootloader Package
ii  nvidia-l4t-camera                             32.4.3-20200625213809                            arm64        NVIDIA Camera Package
ii  nvidia-l4t-configs                            32.4.3-20200625213809                            arm64        NVIDIA configs debian package
ii  nvidia-l4t-core                               32.4.3-20200625213809                            arm64        NVIDIA Core Package
ii  nvidia-l4t-cuda                               32.4.3-20200625213809                            arm64        NVIDIA CUDA Package
ii  nvidia-l4t-firmware                           32.4.3-20200625213809                            arm64        NVIDIA Firmware Package
ii  nvidia-l4t-graphics-demos                     32.4.3-20200625213809                            arm64        NVIDIA graphics demo applications
ii  nvidia-l4t-gstreamer                          32.4.3-20200625213809                            arm64        NVIDIA GST Application files
ii  nvidia-l4t-init                               32.4.3-20200625213809                            arm64        NVIDIA Init debian package
ii  nvidia-l4t-initrd                             32.4.3-20200625213809                            arm64        NVIDIA initrd debian package
ii  nvidia-l4t-jetson-io                          32.4.3-20200625213809                            arm64        NVIDIA Jetson.IO debian package
ii  nvidia-l4t-jetson-multimedia-api              32.4.3-20200625213809                            arm64        NVIDIA Jetson Multimedia API is a collection of lower-level APIs that support flexible application development.
ii  nvidia-l4t-kernel                             4.9.140-tegra-32.4.3-20200625213809              arm64        NVIDIA Kernel Package
ii  nvidia-l4t-kernel-dtbs                        4.9.140-tegra-32.4.3-20200625213809              arm64        NVIDIA Kernel DTB Package
ii  nvidia-l4t-kernel-headers                     4.9.140-tegra-32.4.3-20200625213809              arm64        NVIDIA Linux Tegra Kernel Headers Package
ii  nvidia-l4t-multimedia                         32.4.3-20200625213809                            arm64        NVIDIA Multimedia Package
ii  nvidia-l4t-multimedia-utils                   32.4.3-20200625213809                            arm64        NVIDIA Multimedia Package
ii  nvidia-l4t-oem-config                         32.4.3-20200625213809                            arm64        NVIDIA OEM-Config Package
ii  nvidia-l4t-tools                              32.4.3-20200709231554                            arm64        NVIDIA Public Test Tools Package
ii  nvidia-l4t-wayland                            32.4.3-20200625213809                            arm64        NVIDIA Wayland Package
ii  nvidia-l4t-weston                             32.4.3-20200625213809                            arm64        NVIDIA Weston Package
ii  nvidia-l4t-x11                                32.4.3-20200625213809                            arm64        NVIDIA X11 Package
ii  nvidia-l4t-xusb-firmware                      32.4.3-20200625213809                            arm64        NVIDIA USB Firmware Package
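
The listing above is long, and the container-related packages are easy to lose among the `nvidia-l4t-*` entries. As a minimal sketch, the filter below prints only the container stack with its versions so that mixed versions stand out (here `libnvidia-container0` is at 0.9.0~beta.1 while `libnvidia-container1` and `libnvidia-container-tools` are at 1.2.0-1). Whether that mix is the cause of the error is an assumption, not something the listing proves. `container_stack_versions` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print
# "<package> <version>" for the NVIDIA container stack only.
container_stack_versions() {
    awk '$1 == "ii" && $2 ~ /^(libnvidia-container|nvidia-container-(runtime|toolkit)|nvidia-docker2)/ {
        sub(/:.*/, "", $2)   # drop an architecture suffix such as ":arm64"
        print $2, $3
    }'
}

# Demo on three lines shaped like the listing above.
printf '%s\n' \
    'ii  libnvidia-container-tools   1.2.0-1                arm64  NVIDIA container runtime library (command-line tools)' \
    'ii  libnvidia-container0:arm64  0.9.0~beta.1           arm64  NVIDIA container runtime library' \
    'ii  nvidia-l4t-core             32.4.3-20200625213809  arm64  NVIDIA Core Package' \
    | container_stack_versions

# On the device:
#   dpkg -l | container_stack_versions
```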

$ sudo docker info
Client:
 Debug Mode: false

Server:
 Containers: 58
  Running: 16
  Paused: 0
  Stopped: 42
 Images: 18
 Server Version: 19.03.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: 
 runc version: 
 init version: 
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.140-tegra
 Operating System: Ubuntu 18.04.4 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 3.871GiB
 Name: jetson-0
 ID: J3ZG:UP5R:OCKL:LPDA:HOKI:5DH5:2ZHS:2R7H:CGC3:DK5V:AAN6:UPWA
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Can anyone give me some suggestions about this? Thank you so much!

Hi,

Please note that there is a version dependency between the L4T release on the Nano and the Docker base image.
This is because Docker on Jetson mounts libraries directly from the host.

It looks like you are using JetPack 4.4 GA, which includes rel-32.4.3.
So please update the Dockerfile for rel-32.4.3 too.

FROM nvcr.io/nvidia/l4t-base:r32.4.3

RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples

WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make

CMD ["./deviceQuery"]
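
To avoid picking the base image tag by hand, the host release can be read from `/etc/nv_tegra_release`. The following is only a sketch: `l4t_tag` is a hypothetical helper, and it assumes that an `l4t-base` tag named after the host release (such as `r32.4.3`) is actually published on nvcr.io, which should be checked before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read the contents of /etc/nv_tegra_release on stdin
# and print a tag of the form r<release>.<revision>, e.g. r32.4.3.
l4t_tag() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9][0-9.]*\),.*/r\1.\2/p' | head -n 1
}

# Demo with the release line posted earlier in this thread.
printf '%s\n' '# R32 (release), REVISION: 4.3, GCID: 21589087, BOARD: t210ref, EABI: aarch64, DATE: Fri Jun 26 04:38:25 UTC 2020' | l4t_tag

# On a Jetson, print the FROM line that matches the host release.
if [ -r /etc/nv_tegra_release ]; then
    echo "FROM nvcr.io/nvidia/l4t-base:$(l4t_tag < /etc/nv_tegra_release)"
fi
```

For the release line quoted in the first post, the helper prints `r32.4.3`, which matches the tag used in the Dockerfile above.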

Thanks.

Hi @AastaLLL,

Thanks for responding. I changed the Dockerfile as you indicated, but I still get the same error:

$ sudo docker build -t devicequery .

Sending build context to Docker daemon  214.2MB
Step 1/6 : FROM nvcr.io/nvidia/l4t-base:r32.4.3
 ---> c93fc89026d9
Step 2/6 : RUN apt-get update && apt-get install -y --no-install-recommends make g++
 ---> Running in a0318d71e788
OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

Thanks for testing.

We are going to try to reproduce this issue
and will share more information with you later.

Hi @AastaLLL,

Thanks for your time and effort. I just reflashed my Nanos and the problem disappeared.

I guess the cause is that I was trying to set up a GPU-enabled Kubernetes cluster following these instructions: https://github.com/NVIDIA/k8s-device-plugin, which install the nvidia-docker2 package and may have overwritten the pre-built libraries.
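
If the guess above is right, one way to guard a freshly flashed system might be to put the JetPack-provided container packages on hold before following instructions written for other platforms. This is an untested sketch under that assumption: the package names are taken from the `dpkg -l` listing earlier in the thread, `hold_command` is a hypothetical helper, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print, without executing it, an apt-mark command
# that would put the given packages on hold.
hold_command() {
    echo "sudo apt-mark hold $*"
}

# Package names as they appear in the dpkg listing earlier in the thread.
hold_command nvidia-docker2 nvidia-container-runtime nvidia-container-toolkit \
    libnvidia-container-tools libnvidia-container0

# Held packages can be listed with:   apt-mark showhold
# and released again with:            sudo apt-mark unhold <package>
```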

Good to know the issue is gone.
Thanks for the update.