I keep getting the error: Error: error setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all.
- Ubuntu 22.04
- Podman version 4.4.1
- NVIDIA Driver Version: 535.129.03, CUDA Version: 12.2
- NVIDIA Container Toolkit CLI version 1.14.3
I have followed the instructions here.
I have checked or done the following:
1. sudo apt-get install -y nvidia-container-toolkit
Reading package lists… Done
Building dependency tree… Done
Reading state information… Done
nvidia-container-toolkit is already the newest version (1.14.3-1).
2. nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 2060 (UUID: GPU-2f68f999-7b41-e64a-3709-0c8c4fa756c0)
3. nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.14.3
4. sudo nvidia-ctk cdi list
INFO Found 2 CDI devices
5. sudo nvidia-ctk cdi generate --output=/var/run/cdi/nvidia.yaml and sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml followed by sudo chmod a+r /var/run/cdi/nvidia.yaml and sudo chmod a+r /etc/cdi/nvidia.yaml
6. Checked my user groups; they are: adm cdrom sudo dip video plugdev kvm render lpadmin lxd sambashare vtune
Note: The above points are not in order.
Yet running either sudo podman run --rm --device=nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L or podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L results in the error:
Error: error setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all
Any advice on how to get past this error would be gratefully received. Thank you.
podman --debug should give a better indication of the reason for the failure to resolve devices.
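As a rough sketch of that debugging step, the failing command can be re-run with debug logging and filtered down to the CDI-related lines. This assumes Podman's global --log-level option is how debug output is enabled on your version; the function name cdi_debug_run and the grep filter are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: re-run the failing container command with debug
# logging and keep only the lines that mention CDI.
# Assumption: debug output is enabled with the global --log-level option.
cdi_debug_run() {
    podman --log-level=debug run --rm \
        --device nvidia.com/gpu=all \
        --security-opt=label=disable \
        ubuntu nvidia-smi -L 2>&1 | grep -i 'cdi'
}
```

Calling cdi_debug_run (with or without sudo, matching how you normally run Podman) should show which spec directories and devices Podman considered, if it logs them.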
It is most likely that the version of Podman being used does not support the v0.5.0 CDI specification that the nvidia-ctk cdi generate command produces by default.
Running (in addition to your other arguments):
nvidia-ctk cdi generate --device-name-strategy=type-index
should generate a CDI specification with the devices nvidia.com/gpu=gpu0 and nvidia.com/gpu=all and with spec version v0.3.0, which is compatible with a wider range of Podman versions.
Note that manually running chmod on the generated CDI specification should no longer be required, as this is handled (or should be) by the nvidia-ctk cdi generate command.
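To confirm which spec version actually ended up on disk after regenerating, something like the following could be used. It is a minimal sketch that assumes the generated YAML declares its version in a top-level cdiVersion field; the helper name cdi_spec_version is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the cdiVersion declared in a CDI spec file.
# Assumption: the spec is YAML with a top-level "cdiVersion:" key.
cdi_spec_version() {
    # $1: path to the CDI spec, e.g. /etc/cdi/nvidia.yaml
    sed -n 's/^cdiVersion:[[:space:]]*//p' "$1" | tr -d "\"'" | head -n 1
}

# Example (needs root and the NVIDIA Container Toolkit installed):
#   sudo nvidia-ctk cdi generate --device-name-strategy=type-index --output=/etc/cdi/nvidia.yaml
#   cdi_spec_version /etc/cdi/nvidia.yaml
#   nvidia-ctk cdi list
```

If the printed version is still newer than what your Podman understands, the device names will remain unresolvable no matter how the file permissions are set.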
Solved the issue with Podman. The Podman installation instructions are unclear and do not mention that the Ubuntu apt installer does not install the latest version (a v4.x.x release), only v3.4.4. v3.4.4 is not compatible with the latest nvidia-ctk tool.
To install the latest Podman, follow the instructions in the Ubuntu section of the Podman installation page: Podman Installation | Podman. Use the instructions for the alternative Kubic repository to get the newer versions. (I had ignored this option because the page says these are not production versions, but a Podman advisor recommended it as the only way to get the later versions.)
This installs the latest version, v4.6.2 at the time of writing, and that version works.
Trying the alternative installation approaches linked in their pages failed for me.
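For anyone hitting the same problem, a quick way to check whether the installed Podman is new enough is to compare version strings. This is only a sketch: the function name podman_version_ok is made up, it relies on GNU sort -V, and the 4.1.0 threshold in the example is an assumption that should be checked against the NVIDIA Container Toolkit documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the installed version ($1) is at least
# the required minimum ($2). Relies on GNU sort's -V (version sort).
podman_version_ok() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (the 4.1.0 minimum is an assumption, not taken from this thread):
#   installed=$(podman --version | awk '{print $3}')
#   podman_version_ok "$installed" 4.1.0 && echo "new enough" || echo "too old"
```

With the apt-installed v3.4.4 mentioned above this check would fail, while v4.6.2 would pass.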