Nvidia-docker


Hardware Platform: NVIDIA RTX 3080 Ti
GPU driver version: 535

TAO nvidia-docker container installation problem

quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list |
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' |
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
File '/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg' exists. Overwrite? (y/N) y
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ sudo apt-get update
Hit:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease
Hit:2 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease
Hit:3 Index of linux/ubuntu/ bionic InRelease
Hit:4 Index of linux/ubuntu/ focal InRelease
Hit:5 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease
Hit:6 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease
Hit:7 Index of /ubuntu focal InRelease
Hit:8 Index of /ubuntu focal-security InRelease
Hit:9 Index of /ubuntu focal-updates InRelease
Hit:10 Index of /ubuntu focal-backports InRelease
Reading package lists... Done
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ sudo apt-get install -y nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.15.0-1).
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ sudo nvidia-ctk runtime configure --runtime=docker
INFO[0000] Loading config from /etc/docker/daemon.json
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ sudo systemctl restart docker
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
INFO[0000] Loading config from /home/quest/.config/docker/daemon.json
INFO[0000] Wrote updated config to /home/quest/.config/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ systemctl --user restart docker
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ nvidia-docker
nvidia-docker: command not found
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ nvidia-docker --version
nvidia-docker: command not found
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ nvidia-docker2 --version
nvidia-docker2: command not found
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$

quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ systemctl --user restart docker
quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset>
Active: active (running) since Tue 2024-05-14 01:42:26 IST; 8min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 33010 (dockerd)
Tasks: 26
Memory: 30.0M
CGroup: /system.slice/docker.service
└─33010 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/con>

May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ dockerd[33010]: time="2024-05-14>
May 14 01:42:26 quest-ROG-Strix-G533ZX-G533ZXZ systemd[1]: Started Docker Appli>

quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$

The installation looks fine. Recent versions of nvidia-container-toolkit no longer include the nvidia-docker wrapper, so "command not found" is expected. If the following command runs successfully, the toolkit is installed and working.

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi 
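To check that `nvidia-ctk runtime configure` actually registered the runtime (which `--runtime=nvidia` depends on), a small script can inspect the Docker daemon config without the removed wrapper. This is a minimal sketch, assuming the two config paths used in the commands above and that the runtime is registered under the name `nvidia`; the helper names are hypothetical. It only reads the files, so it does not replace the `docker run` test.

```python
import json
from pathlib import Path


def nvidia_runtime_in(config: object) -> bool:
    """Return True if a parsed daemon.json registers a runtime named "nvidia"."""
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes")
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def has_nvidia_runtime(daemon_json: Path) -> bool:
    """Read a daemon.json file; missing, unreadable or invalid files count as not registered."""
    try:
        config = json.loads(daemon_json.read_text())
    except (OSError, ValueError):  # ValueError covers JSONDecodeError and bad encodings
        return False
    return nvidia_runtime_in(config)


if __name__ == "__main__":
    # Rootful daemon config first, then the per-user (rootless) config written above.
    for path in (Path("/etc/docker/daemon.json"),
                 Path.home() / ".config" / "docker" / "daemon.json"):
        status = "registered" if has_nvidia_runtime(path) else "not registered"
        print(f"{path}: nvidia runtime {status}")
```

If the rootful file reports "not registered", re-running `sudo nvidia-ctk runtime configure --runtime=docker` and restarting Docker would be the first thing to try.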

Please refer to this installation guide:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#

(launcher) quest@quest-ROG-Strix-G533ZX-G533ZXZ:~/getting_started_v5.3.0/notebooks/tao_data_services$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Failed to initialize NVML: Unknown Error

Try rebooting, or try the following method.

(launcher) quest@quest-ROG-Strix-G533ZX-G533ZXZ:~$ tao dataset annotations convert -e $HOST_SPECS_DIR/convert.yaml
2024-05-16 10:29:52,041 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-05-16 10:29:52,075 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.3.0-data-services
2024-05-16 10:29:52,083 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 293:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/quest/.tao_mounts.json" file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
2024-05-16 10:29:52,083 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Error response from daemon: No such container: 042fd856b124a51e1b61cd4a34e15242abecc04a3aef7f48ccf70a4d8426fdca
2024-05-16 10:29:52,663 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

The NVML error was solved, thanks. But now this error pops up.
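The WARNING in the log above is separate from the "No such container" error, but the change it asks for is easy to script. A minimal sketch, assuming `~/.tao_mounts.json` already exists, is valid JSON, and keeps Docker options in a top-level "DockerOptions" object as the warning describes; `add_docker_user` is a hypothetical helper. It sets "user" to the same values `id -u` and `id -g` print and leaves all other keys untouched. It is not expected to fix the "No such container" error itself.

```python
import json
import os
from pathlib import Path


def add_docker_user(mounts: dict, uid: int, gid: int) -> dict:
    """Return a copy of a .tao_mounts.json dict with DockerOptions.user set to "UID:GID"."""
    updated = dict(mounts)
    options = dict(updated.get("DockerOptions") or {})  # keep any existing options
    options["user"] = f"{uid}:{gid}"
    updated["DockerOptions"] = options
    return updated


if __name__ == "__main__":
    path = Path.home() / ".tao_mounts.json"
    mounts = json.loads(path.read_text())
    # os.getuid()/os.getgid() return the same numbers as `id -u` / `id -g`.
    mounts = add_docker_user(mounts, os.getuid(), os.getgid())
    path.write_text(json.dumps(mounts, indent=4) + "\n")
    print(f"Set DockerOptions.user = {mounts['DockerOptions']['user']} in {path}")
```

Run it as the regular user rather than with sudo; under sudo, `os.getuid()` returns 0 and the home directory may resolve to root's.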

This looks like a TAO problem, and I don't have much experience with it.

You can discuss this issue in the TAO forum.