Installing Nvidia toolkit on host not working

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1

Target Operating System
Linux

Hardware Platform

DRIVE AGX Orin Developer Kit (not sure its number)

Host Machine Version

native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers

Following the instructions at Build and Run Sample Applications for DRIVE OS 6.x Linux | NVIDIA Docs and Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.14.3 documentation, the final test under Running Sample Code gives this error:

aseyoum@6368-desktop:~$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.

There were no errors in the prior commands.
aseyoum@6368-desktop:~$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.

This looks the same as the first output. Could you paste the correct one for reference?

Here is the entire sequence of commands run on the host PC, based on the instructions at Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.14.3 documentation.

aseyoum@6368-desktop:~$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
File '/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg' exists. Overwrite? (y/N) y
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo apt-get update
Get:1 file:/var/cuda-repo-ubuntu2004-11-4-local InRelease [1,575 B]
Get:2 file:/var/nv-driveos-repo-sdk-linux-6.0.6.0-32443318 InRelease
Ign:2 file:/var/nv-driveos-repo-sdk-linux-6.0.6.0-32443318 InRelease
Get:3 file:/var/nv-driveos-repo-sdk-linux-6.0.6.0-32443318 Release [497 B]
Get:1 file:/var/cuda-repo-ubuntu2004-11-4-local InRelease [1,575 B]
Get:3 file:/var/nv-driveos-repo-sdk-linux-6.0.6.0-32443318 Release [497 B]
Get:4 file:/var/nv-driveos-repo-sdk-linux-6.0.6.0-32443318 Release.gpg
Ign:4 file:/var/nv-driveos-repo-sdk-linux-6.0.6.0-32443318 Release.gpg
Hit:5 http://mirrors.zooxlabs.com/ubuntu/latest-b focal InRelease
Hit:6 http://mirrors.zooxlabs.com/ubuntu/latest-b focal-backports InRelease
Hit:7 http://mirrors.zooxlabs.com/zoox-third-party/latest-b/focal focal InRelease
Hit:8 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease
Hit:9 Index of /compute/cuda/repos/ubuntu2004/x86_64 InRelease
Hit:10 Index of linux/ubuntu/ focal InRelease
Hit:11 https://app.drivestrike.com/static/apt stretch InRelease
Reading package lists... Done
aseyoum@6368-desktop:~$ ^C
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo apt-get install -y nvidia-container-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.14.3-1).
The following packages were automatically installed and are no longer required:
amd64-microcode intel-microcode iucode-tool linux-hwe-5.15-headers-5.15.0-83 linux-hwe-5.15-headers-5.15.0-88 thermald
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 171 not upgraded.
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo nvidia-ctk runtime configure --runtime=docker
INFO[0000] Loading config from /etc/docker/daemon.json
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo systemctl restart docker
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo nvidia-ctk runtime configure --runtime=containerd
INFO[0000] Loading config from /etc/containerd/config.toml
INFO[0000] Wrote updated config to /etc/containerd/config.toml
INFO[0000] It is recommended that containerd daemon be restarted.
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo systemctl restart containerd
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo nvidia-ctk runtime configure --runtime=crio
INFO[0000] Loading config: /etc/crio/crio.conf
INFO[0000] Loading config from /etc/crio/crio.conf
INFO[0000] Successfully loaded config
INFO[0000] Wrote updated config to /etc/crio/crio.conf
INFO[0000] It is recommended that crio daemon be restarted.
aseyoum@6368-desktop:~$
aseyoum@6368-desktop:~$ sudo systemctl restart crio
Failed to restart crio.service: Unit crio.service not found.

aseyoum@6368-desktop:~$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.

It appears that there is a driver/library version mismatch error with your GPU drivers. To troubleshoot this, I recommend the following steps:

  • Update your GPU drivers to the latest version compatible with your system.
  • Ensure that nvidia-smi can run successfully outside the Docker container.

Once you’ve completed these steps, try running the Docker command again.
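
As a rough way to see the mismatch on the host before touching Docker, the sketch below compares the version of the loaded NVIDIA kernel module with the version of the NVML library on disk. The helper names (lib_version_from_name, compare_versions) are made up for illustration, and the paths /sys/module/nvidia/version and /usr/lib/x86_64-linux-gnu are assumptions about a typical Ubuntu x86_64 host, so adjust them if your install differs.

```shell
#!/usr/bin/env bash
# Sketch: compare the loaded NVIDIA kernel module version with the
# user-space NVML library version on the host (paths are assumptions).

# Hypothetical helper: strip directory and "libnvidia-ml.so." prefix.
lib_version_from_name() {
    local name="${1##*/}"
    echo "${name#libnvidia-ml.so.}"
}

# Hypothetical helper: report whether two version strings agree.
compare_versions() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Version of the kernel module currently loaded (empty if not loaded).
kernel_ver="$(cat /sys/module/nvidia/version 2>/dev/null || true)"

# Fully versioned NVML library file; the glob skips the ".so.1" symlink.
lib_file="$(ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*.* 2>/dev/null | head -n 1 || true)"
lib_ver="$(lib_version_from_name "${lib_file}")"

echo "kernel module: ${kernel_ver:-not loaded}"
echo "NVML library:  ${lib_ver:-not found}"
echo "result:        $(compare_versions "${kernel_ver}" "${lib_ver}")"
```

If this reports a mismatch, it often means the driver packages were upgraded while an older kernel module is still loaded, so a reboot is a common first thing to try before reinstalling anything.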

Are you saying I should update the host’s GPU driver? If so, where are the compatible drivers located?
Will the same command work outside the Docker container after updating the GPU driver?

Before the nvidia-smi test, there is also this error:
$ sudo systemctl restart crio
Failed to restart crio.service: Unit crio.service not found.

What is the solution?

Please refer to the “Graphics Driver” column under System Requirements. Yes, it should work once the version mismatch is resolved.

The failed crio restart (Unit crio.service not found) should not affect the last Docker command you are trying to run.
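
To avoid that message in the first place, one option is to configure only the runtimes that are actually installed. The sketch below checks which of docker, containerd and crio have a systemd unit and prints (without running) the configure and restart commands from the transcript for those that do. The helper names unit_exists and print_commands are hypothetical, and using systemctl cat as the existence check is an assumption about how the services were installed.

```shell
#!/usr/bin/env bash
# Sketch: find which container runtimes have a systemd unit on this host
# and print (not run) the matching configure/restart commands.

# Hypothetical helper: succeeds only if systemd can find the given unit.
unit_exists() {
    systemctl cat "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the two commands for one runtime name.
print_commands() {
    echo "sudo nvidia-ctk runtime configure --runtime=$1"
    echo "sudo systemctl restart $1"
}

for runtime in docker containerd crio; do
    if unit_exists "${runtime}.service"; then
        print_commands "${runtime}"
    else
        echo "# skipping ${runtime}: ${runtime}.service not found"
    fi
done
```

On a host like the one in this thread, the crio line would be expected to show up as skipped, since crio.service was reported as not found.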
