I’m stuck installing the NVIDIA driver on Ubuntu 22.04 with an NVIDIA A100. Does anyone have any suggestions? I’m following the instructions in the “NVIDIA Driver Installation Quickstart Guide”.
This is my environment.
$ lspci
…
06:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)
Subsystem: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:1593]
Kernel modules: nvidiafb, nouveau, nvidia
…
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
$ uname -r
5.15.0-73-generic
After running “sudo apt-get -y install cuda-drivers”, I rebooted the server and, as the post-installation step, set PATH and LD_LIBRARY_PATH in /etc/environment.
$ cat /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-12.1/bin"
LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64"
$ echo $PATH
/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda-12.1/bin
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-12.1/lib64
However, nvidia-smi fails, and syslog reports that the device 10de:20b5 is not supported by driver 530.30.02.
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ sudo tail -n20 /var/log/syslog
Jun 6 05:21:28 kernel: [ 849.878253] nvidia: probe of 0000:06:00.0 failed with error -1
Jun 6 05:21:28 kernel: [ 849.878279] NVRM: The NVIDIA probe routine failed for 1 device(s).
Jun 6 05:21:28 kernel: [ 849.878281] NVRM: None of the NVIDIA devices were initialized.
Jun 6 05:21:28 kernel: [ 849.878478] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Jun 6 05:21:28 systemd-udevd[1091]: nvidia: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.
Jun 6 05:21:28 systemd[1]: nvidia-persistenced.service: Start request repeated too quickly.
Jun 6 05:21:28 systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Jun 6 05:21:28 systemd[1]: Failed to start NVIDIA Persistence Daemon.
Jun 6 05:21:28 kernel: [ 850.002682] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Jun 6 05:21:28 kernel: [ 850.002688] NVRM: The NVIDIA GPU 0000:06:00.0 (PCI ID: 10de:20b5)
Jun 6 05:21:28 kernel: [ 850.002688] NVRM: installed in this system is not supported by the
Jun 6 05:21:28 kernel: [ 850.002688] NVRM: NVIDIA 530.30.02 driver release.
Jun 6 05:21:28 kernel: [ 850.002688] NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
Jun 6 05:21:28 kernel: [ 850.002688] NVRM: in this release's README, available on the operating system
Jun 6 05:21:28 kernel: [ 850.002688] NVRM: specific graphics driver download page at www.nvidia.com.
Jun 6 05:21:28 kernel: [ 850.009560] nvidia: probe of 0000:06:00.0 failed with error -1
Jun 6 05:21:28 kernel: [ 850.009585] NVRM: The NVIDIA probe routine failed for 1 device(s).
Jun 6 05:21:28 kernel: [ 850.009587] NVRM: None of the NVIDIA devices were initialized.
Jun 6 05:21:28 kernel: [ 850.009816] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
Jun 6 05:21:28 systemd-udevd[1091]: nvidia: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
According to the README for 530.30.02 (Appendix A, “Supported NVIDIA GPU Products”), the driver supports the NVIDIA A100 80GB PCIe, listed there as 20B5 10DE 1642. Am I missing something?
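One thing that can be checked from the output above is whether the IDs reported by lspci line up with that README entry. This is only a rough sketch: I am assuming the three README values are device ID, subsystem vendor ID, and subsystem device ID (my reading of the table, not confirmed), and pci_id and compare are throwaway helpers of my own, not part of any NVIDIA tool. The two input lines are copied from the lspci output in this post.

```shell
# Pull the last [vvvv:dddd] pair out of an `lspci -nn` style line and
# upper-case it so it can be compared with the README notation.
pci_id() {
    printf '%s\n' "$1" \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tail -n1 \
        | tr -d '[]' \
        | tr 'a-f' 'A-F'
}

# Print whether an lspci ID ($2) equals the README ID ($3); $1 is a label.
compare() {
    if [ "$2" = "$3" ]; then
        echo "$1: $2 matches README"
    else
        echo "$1: lspci has $2, README has $3"
    fi
}

# Lines copied from the lspci output above.
device_line='06:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)'
subsys_line='Subsystem: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:1593]'

device_id=$(pci_id "$device_line")
subsys_id=$(pci_id "$subsys_line")

# README entry "20B5 10DE 1642", split under the column assumption stated above.
readme_device='10DE:20B5'
readme_subsys='10DE:1642'

compare device "$device_id" "$readme_device"
compare subsystem "$subsys_id" "$readme_subsys"
```

With these values the device ID matches, but the subsystem ID from lspci (10DE:1593) differs from the README entry I quoted (10DE:1642). I don't know whether the driver takes the subsystem ID into account, so this may or may not be relevant.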