I started to remove iptables (sudo apt remove iptables), and the package manager removed Docker and Nvidia-Docker in the process. All of the NVIDIA packages (sudo dpkg-query -l | grep nvidia) appear to be intact, however.
Other than reflashing the Jetson Nano with the SD card image for JetPack 4.5.1, is there a way to simply reinstall the versions of Docker and nvidia-docker2 that shipped with the original JetPack image (https://developer.nvidia.com/jetson-nano-sd-card-image.zip)?
Also, can Docker be reinstalled with sudo or does Docker have to be installed as root (sudo su)?
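To be concrete, something along these lines is what I was hoping would work. I believe the stock Docker on the Nano is the Ubuntu docker.io package and nvidia-docker2 comes from NVIDIA's apt sources, but whether apt would still pull the original JetPack 4.5.1 versions is an assumption on my part:
apt-cache policy docker.io nvidia-docker2 nvidia-container-toolkit   # see which repo/version apt would install
sudo apt install docker.io nvidia-docker2 nvidia-container-toolkit   # reinstall from the configured apt sources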
I tried to reinstall Docker by following the instructions at the following links:
Docker version 19.03 is working on the Nano, verified by running "sudo docker run hello-world".
Nvidia-Docker (NVIDIA Container Toolkit) was also installed successfully.
However, when I verified the install using the following command, I ran into a runtime error (driver issue?):
Thanks. I installed SDK Manager on Ubuntu 18.04 (running dual boot on my laptop). If there are any how-to posts on using SDK Manager to download and install a single package/Debian file, please send the link/URL when you get a chance.
Can SDK Manager be run from the command line on Ubuntu 20.04?
Also, can the Debian package be downloaded/installed directly from the Nano?
Also, I did see that there is an SDK Manager Docker container available. Does this mean that we can install/upgrade to Ubuntu 20.04 on the host and use the SDK Manager container to flash Jetson embedded devices?
Keeping developers on Ubuntu 18.04 solely for the purpose of using SDK Manager is going to be a tough sell…
I saw this post saying that support was not yet ready for Ubuntu 20.04:
I downloaded and installed the nvidia-docker2 Debian file as advised. I restarted Docker, but I still ran into the same issue. I think the problem is related to the nvidia-container-cli library/module. Driver issue?
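For reference, this is roughly what I mean by downloading and running the Debian file; the exact filename came from my downloads folder, so treat the pattern as approximate:
cd ~/Downloads/nvidia/sdkm_downloads
sudo dpkg -i nvidia-docker2_*_all.deb   # install the downloaded package
sudo systemctl restart docker           # restart Docker so it picks up the nvidia runtime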
Error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I ran a few more checks:
user@nano:~/Downloads/nvidia/sdkm_downloads$ nvidia-container-cli list --libraries
nvidia-container-cli: initialization error: driver error: failed to process request
user@nano:~/Downloads/nvidia/sdkm_downloads$ nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0416 03:07:42.835547 20727 nvc.c:372] initializing library context (version=1.3.3, build=bd9fc3f2b642345301cb2e23de07ec5386232317)
I0416 03:07:42.835653 20727 nvc.c:346] using root /
I0416 03:07:42.835674 20727 nvc.c:347] using ldcache /etc/ld.so.cache
I0416 03:07:42.835697 20727 nvc.c:348] using unprivileged user 1000:994
I0416 03:07:42.835769 20727 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0416 03:07:42.836068 20727 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0416 03:07:42.836381 20727 nvc.c:254] failed to detect NVIDIA devices
W0416 03:07:42.836820 20728 nvc.c:269] failed to set inheritable capabilities
W0416 03:07:42.836886 20728 nvc.c:270] skipping kernel modules load due to failure
I0416 03:07:42.837351 20729 driver.c:101] starting driver service
E0416 03:07:42.837708 20729 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0416 03:07:42.837944 20727 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
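Based on that last failure (libnvidia-ml.so.1 not found), these are the additional checks I would run next; the paths are the standard L4T/JetPack locations as far as I know, so treat them as assumptions:
ls /usr/lib/aarch64-linux-gnu/tegra/                           # L4T driver libraries normally live here
cat /etc/nvidia-container-runtime/config.toml                  # config read by nvidia-container-cli
ls /etc/nvidia-container-runtime/host-files-for-container.d/   # csv files listing what gets mounted into containers on Jetson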
There were some package version conflicts (an apt package repository issue), so I uninstalled and reinstalled nvidia-docker2 (and related packages) and cleaned up the apt repository issues (disabled some of the automatic updates/upgrades, etc.). I essentially restored the JetPack versions of the packages downloaded with NVIDIA's SDK Manager tool.
nvidia-docker2 and nvidia-container-cli are working again, and I can run NVIDIA containers from NVIDIA's container catalog (https://ngc.nvidia.com).
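For example, a container run along these lines works for me again; the l4t-base tag should match the installed L4T release, and r32.5.0 is my assumption for JetPack 4.5.x:
sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.5.0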
Steps:
Uninstall nvidia-docker2 (sudo apt remove nvidia-docker2) - nvidia-container-toolkit and nvidia-container-runtime are also removed in the process. A rough sketch of the full sequence I used follows.
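The sketch below uses the .deb files from my ~/Downloads/nvidia/sdkm_downloads folder, so treat the exact file names/versions as approximate:
sudo apt remove nvidia-docker2                 # nvidia-container-toolkit and nvidia-container-runtime go with it
cd ~/Downloads/nvidia/sdkm_downloads           # the packages SDK Manager had downloaded
sudo dpkg -i libnvidia-container0_*_arm64.deb libnvidia-container-tools_*_arm64.deb \
             nvidia-container-toolkit_*_arm64.deb nvidia-container-runtime_*_arm64.deb \
             nvidia-docker2_*_all.deb
sudo apt-mark hold libnvidia-container0 libnvidia-container-tools nvidia-container-toolkit \
             nvidia-container-runtime nvidia-docker2   # stop apt from auto-upgrading them again
sudo systemctl restart docker
sudo docker run hello-world                    # sanity check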
I’m having the same issue upgrading docker from 19 to 20.
@kaisark How do you get libnvidia-container-tools_0.9.0_beta.1_arm64.deb? I can't find it anywhere.
NVIDIA's SDK Manager does not allow you to choose the package version.
I added https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) to /etc/apt/sources.list.d/nvidia-container-runtime.list,
but after apt update, the oldest version of libnvidia-container-tools available is the buggy 0.10.0. With apt-cache policy I can see all the newer versions of the package, up to 1.5.1-1. I tried to upgrade it, but I still have the same issue.
I also tried upgrading nvidia-docker2 up to the latest 2.6.0-1, and I still have the issue.
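In case it helps, the downgrade mechanics I would try are below; this only works if the version you want actually shows up in the apt-cache policy output, and <version> is just a placeholder to replace with one of the listed strings:
apt-cache policy libnvidia-container-tools libnvidia-container0    # list every version apt can see
sudo apt install libnvidia-container-tools=<version> libnvidia-container0=<version>
sudo apt-mark hold libnvidia-container-tools libnvidia-container0  # keep apt from upgrading it again
sudo systemctl restart docker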