How to Reinstall Docker and Nvidia-Docker2 on Jetson Nano - Jetpack 4.5.1?

I removed iptables (sudo apt remove iptables), and the package manager removed Docker and nvidia-docker2 along with it. All of the NVIDIA packages (sudo dpkg-query -l | grep nvidia) appear to be intact, however.

Other than reflashing the Jetson Nano with the SD Card image for Jetpack 4.5.1, is there a way to simply reinstall the correct version of Docker and Nvidia-Docker2 that was used in the original Jetpack image (https://developer.nvidia.com/jetson-nano-sd-card-image.zip)?

Also, can Docker be reinstalled with sudo or does Docker have to be installed as root (sudo su)?

I tried to reinstall Docker by following the instructions at the following links:

  1. Installation Guide — NVIDIA Cloud Native Technologies documentation
  2. Install Docker Engine on Ubuntu | Docker Documentation

Docker version 19.03 is working on the Nano. I verified this by running "sudo docker run hello-world".
Nvidia-Docker (NVIDIA Container Toolkit) was also installed successfully.

However, when I verified the install using the following command, I ran into a run-time error: (Driver issue?)

Command: (Cuda 10)
docker run --gpus all -it --rm --network host --volume ~/nvdli-data:/nvdli-nano/data --device /dev/video0 nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.5.0

Error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

Hi,

The nvidia-docker2_2.2.0-1_all.deb package is included in JetPack.
Please use SDK Manager to download it (choose reflash only to download the package).

The default folder is ${HOME}/Downloads/nvidia/sdkm_downloads.
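Once the download has finished, a quick listing can confirm which container-related .deb files are actually in that folder. The following is only a minimal sketch: the `SDKM_DIR` variable and the `list_container_debs` helper are hypothetical names introduced here, and the glob patterns are an assumption about how the files are named.

```shell
#!/bin/sh
# Default SDK Manager download folder (as given above); set SDKM_DIR to
# override it. SDKM_DIR is a name made up for this sketch.
SDKM_DIR="${SDKM_DIR:-$HOME/Downloads/nvidia/sdkm_downloads}"

# Print the container-related .deb files found in a folder, one per line.
# The patterns are an assumption about the file names, not an official list.
list_container_debs() {
    dir="$1"
    for f in "$dir"/*nvidia-container*.deb "$dir"/nvidia-docker2_*.deb; do
        # Unmatched globs stay literal, so skip anything that is not a file.
        [ -f "$f" ] && basename "$f"
    done
    return 0
}

list_container_debs "$SDKM_DIR"
```

If nothing is printed, the folder is either empty or located somewhere other than the default path.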

Thanks.

Thanks. I installed SDK Manager on Ubuntu 18 (running dual boot on my laptop). If there are any how-to posts on using SDK Manager to download and install a single package/Debian file, please send the link when you get a chance.

Can SDKManager be run from the command line in Ubuntu 20?

Also, can the debian package be downloaded/installed directly from the Nano?

user@nano:~$ cat /etc/apt/sources.list.d/nvidia-container-runtime.list*

deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /

#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /

deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /

#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /

user@nano:~$ cat /etc/apt/sources.list.d/nvidia-docker.list

deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /

#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /

deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /

#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /

deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /

Also, I did see that there is an SDKmanager docker container available. Does this mean that we can install/upgrade to Ubuntu 20 on the host and use SDKmanager docker container to flash jetson embedded devices?

Keeping Developers on Ubuntu18 solely for purposes of using SDKmanager is going to be a tough sell…

I saw this post that support was not ready for Ubuntu20:

I downloaded and installed the nvidia-docker2 Debian file as advised. I restarted Docker, but I still ran into the same issue. I think the problem is related to the nvidia-container-cli library/module (driver issue?).

Command:
sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-base:r32.4.4

Error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

I ran a few more checks:

user@nano:~/Downloads/nvidia/sdkm_downloads$ nvidia-container-cli list --libraries
nvidia-container-cli: initialization error: driver error: failed to process request

user@nano:~/Downloads/nvidia/sdkm_downloads$ nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0416 03:07:42.835547 20727 nvc.c:372] initializing library context (version=1.3.3, build=bd9fc3f2b642345301cb2e23de07ec5386232317)
I0416 03:07:42.835653 20727 nvc.c:346] using root /
I0416 03:07:42.835674 20727 nvc.c:347] using ldcache /etc/ld.so.cache
I0416 03:07:42.835697 20727 nvc.c:348] using unprivileged user 1000:994
I0416 03:07:42.835769 20727 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0416 03:07:42.836068 20727 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0416 03:07:42.836381 20727 nvc.c:254] failed to detect NVIDIA devices
W0416 03:07:42.836820 20728 nvc.c:269] failed to set inheritable capabilities
W0416 03:07:42.836886 20728 nvc.c:270] skipping kernel modules load due to failure
I0416 03:07:42.837351 20729 driver.c:101] starting driver service
E0416 03:07:42.837708 20729 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0416 03:07:42.837944 20727 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
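The log above reports library version 1.3.3 and a missing libnvidia-ml.so.1, so a cheap next check is to see which versions of the container packages dpkg currently has installed and compare them with the .deb files from JetPack. This is a rough sketch, not a definitive diagnostic: `show_versions` and `PKGS` are hypothetical names, and the package list is simply the set of packages named in this thread.

```shell
#!/bin/sh
# Packages named in this thread (an assumed, not exhaustive, list).
PKGS="libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit nvidia-container-runtime nvidia-docker2"

# Print "<package> <version>" for each package, or "<package> not-installed"
# when dpkg-query reports no version (or is not available at all).
show_versions() {
    for pkg in "$@"; do
        ver=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || ver=""
        echo "$pkg ${ver:-not-installed}"
    done
}

# Word splitting of $PKGS is intentional: one argument per package name.
show_versions $PKGS
```

A version that does not match the corresponding file name in the SDK Manager download folder would suggest that apt pulled the package from a different repository.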

SOLVED:

There were some package version conflicts (an apt package repository issue), so I uninstalled and reinstalled nvidia-docker2 (and related packages) and cleaned up the apt repository issues (disabled some of the automatic updates/upgrades, etc.). I essentially restored the JetPack versions of the packages, which I downloaded with NVIDIA's SDK Manager tool.

nvidia-docker2 and nvidia-container-cli are working again, and I can run NVIDIA containers from NVIDIA's container catalog (https://ngc.nvidia.com).

Steps:

  1. Uninstall nvidia-docker2 (sudo apt remove nvidia-docker2); nvidia-container-toolkit and nvidia-container-runtime are also removed in the process
  2. sudo apt remove libnvidia-container1
  3. sudo dpkg -i libnvidia-container0_0.9.0_beta.1_arm64.deb
  4. sudo dpkg -i libnvidia-container-tools_0.9.0_beta.1_arm64.deb
  5. sudo dpkg -i nvidia-container-toolkit_1.0.1-1_arm64.deb
  6. sudo dpkg -i nvidia-container-runtime_3.1.0-1_arm64.deb
  7. sudo dpkg -i nvidia-docker2_2.2.0-1_all.deb
  8. restart docker (sudo systemctl restart docker)
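To keep apt from replacing the restored packages on a later upgrade, one option is to put them on hold. This is only a sketch of that idea and an assumption on my part, not necessarily the exact cleanup described above: `hold_debs`, `pkg_name_from_deb`, and the `APPLY` switch are hypothetical names, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set APPLY=1 to run.
APPLY="${APPLY:-0}"

# Debian package files are named <package>_<version>_<arch>.deb, so the
# package name is everything before the first underscore.
pkg_name_from_deb() {
    basename "$1" | cut -d_ -f1
}

# Put an apt hold on the package behind each .deb file given as an argument.
hold_debs() {
    for deb in "$@"; do
        pkg=$(pkg_name_from_deb "$deb")
        if [ "$APPLY" = "1" ]; then
            sudo apt-mark hold "$pkg"
        else
            echo "+ sudo apt-mark hold $pkg"
        fi
    done
}

# File names taken from the steps above.
hold_debs libnvidia-container-tools_0.9.0_beta.1_arm64.deb \
          nvidia-container-toolkit_1.0.1-1_arm64.deb \
          nvidia-container-runtime_3.1.0-1_arm64.deb \
          nvidia-docker2_2.2.0-1_all.deb
```

A hold can be lifted later with sudo apt-mark unhold followed by the package name.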

I’ll post the solution on the nvidia_docker git repository as well.