Unable to install Tesla V100 GPU drivers on Ubuntu 20.04

Hi Nvidia!

I’m having a very difficult time getting the Nvidia drivers to run on my Ubuntu 20.04 VM. The kernel modules load and the GPU is visible on the PCI bus, but the driver fails to initialize. The main error I’m getting is this:

[ 14.764808] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)

I’ve tried many combinations of supported OS and driver versions, and I always hit the same issue. More detailed evidence is below, and I’ve attached an Nvidia bug report log.

If anyone can help me troubleshoot this or make recommendations, it would be greatly appreciated.

Thanks in advance,
James

DETAILS:

ubuntu@ubuntu-2004-fresh:~$ lsmod | grep nouveau

ubuntu@ubuntu-2004-fresh:~$ lsmod | grep nvidia
nvidia_uvm 1548288 0
nvidia_drm 94208 0
nvidia_modeset 1327104 1 nvidia_drm
nvidia 56172544 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 307200 1 nvidia_drm
drm 618496 4 drm_kms_helper,nvidia,nvidia_drm

ubuntu@ubuntu-2004-fresh:~$ lspci -k
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
Subsystem: Red Hat, Inc. Qemu virtual machine
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
Subsystem: Red Hat, Inc. Qemu virtual machine
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
Subsystem: Red Hat, Inc. Qemu virtual machine
Kernel driver in use: ata_piix
Kernel modules: pata_acpi
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
Subsystem: Red Hat, Inc. QEMU Virtual Machine
Kernel driver in use: uhci_hcd
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
Subsystem: Red Hat, Inc. Qemu virtual machine
Kernel driver in use: piix4_smbus
Kernel modules: i2c_piix4
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Subsystem: Red Hat, Inc. QEMU Virtual Machine
Kernel modules: cirrusfb, cirrus
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
Subsystem: Red Hat, Inc. Virtio network device
Kernel driver in use: virtio-pci
00:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:06.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:07.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:08.0 SCSI storage controller: Red Hat, Inc. Virtio block device
Subsystem: Red Hat, Inc. Virtio block device
Kernel driver in use: virtio-pci
00:09.0 SCSI storage controller: Red Hat, Inc. Virtio block device
Subsystem: Red Hat, Inc. Virtio block device
Kernel driver in use: virtio-pci
00:0a.0 SCSI storage controller: Red Hat, Inc. Virtio block device
Subsystem: Red Hat, Inc. Virtio block device
Kernel driver in use: virtio-pci
00:0b.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
Subsystem: Red Hat, Inc. Virtio memory balloon
Kernel driver in use: virtio-pci
00:0c.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG
Subsystem: Red Hat, Inc. Virtio RNG
Kernel driver in use: virtio-pci
00:1c.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1d.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1e.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1f.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
04:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
Subsystem: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB]
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
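
The driver binding reported by lspci can also be cross-checked straight from sysfs, where each PCI device has a `driver` symlink. The following is only a minimal sketch: the `gpu_driver` helper name and its optional second argument (an alternate sysfs root, useful only for testing) are hypothetical and not part of any NVIDIA tooling.

```shell
# Print the kernel driver bound to a PCI device, or "none" if unbound.
# Usage: gpu_driver <pci-address> [sysfs-root]
# sysfs-root defaults to /sys; the override is an assumption added for testing.
gpu_driver() {
  addr="$1"
  root="${2:-/sys}"
  link="$root/bus/pci/devices/$addr/driver"
  if [ -L "$link" ]; then
    # The symlink target ends in the driver name, e.g. .../drivers/nvidia
    basename "$(readlink "$link")"
  else
    echo "none"
  fi
}
```

Running `gpu_driver 0000:04:01.0` on this VM should agree with the "Kernel driver in use" line above. A bound driver only shows that the module claimed the device, not that initialization succeeded.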

ubuntu@ubuntu-2004-fresh:~$ sudo lshw -c video
*-display UNCLAIMED
description: VGA compatible controller
product: GD 5446
vendor: Cirrus Logic
physical id: 2
bus info: pci@0000:00:02.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: vga_controller
configuration: latency=0
resources: memory:fa000000-fbffffff memory:fde90000-fde90fff memory:c0000-dffff
*-display
description: 3D controller
product: GV100GL [Tesla V100 PCIe 16GB]
vendor: NVIDIA Corporation
physical id: 1
bus info: pci@0000:04:01.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:140-13f iomemory:180-17f irq:11 memory:fc000000-fcffffff memory:1400000000-17ffffffff memory:1800000000-1801ffffff
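
For reference, the sizes of the three memory regions in the `resources:` line can be worked out from their hex ranges. This is a small sketch of that arithmetic; `range_mib` is a hypothetical helper name, and it assumes the `start-end` format that lshw prints (hex without a `0x` prefix).

```shell
# Convert an inclusive hex address range "start-end" into a size in MiB.
# Usage: range_mib 1400000000-17ffffffff
range_mib() {
  start="${1%-*}"   # text before the dash
  end="${1#*-}"     # text after the dash
  # The range is inclusive, so add 1 before dividing by 1 MiB (1048576 bytes).
  echo $(( (0x$end - 0x$start + 1) / 1048576 ))
}

range_mib fc000000-fcffffff        # 16
range_mib 1400000000-17ffffffff    # 16384
range_mib 1800000000-1801ffffff    # 32
```

So the guest has been given a 16 MiB region, a 16 GiB region, and a 32 MiB region for the GPU, with the two larger ones mapped above the 4 GiB boundary.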

ubuntu@ubuntu-2004-fresh:~$ nvidia-smi
No devices were found

$ cat /proc/modules | grep nvidia
nvidia_uvm 1548288 0 - Live 0x0000000000000000 (POE)
nvidia_drm 94208 0 - Live 0x0000000000000000 (POE)
nvidia_modeset 1327104 1 nvidia_drm, Live 0x0000000000000000 (POE)
nvidia 56172544 2 nvidia_uvm,nvidia_modeset, Live 0x0000000000000000 (POE)
drm_kms_helper 307200 1 nvidia_drm, Live 0x0000000000000000
drm 618496 4 nvidia_drm,nvidia,drm_kms_helper, Live 0x0000000000000000

$ sudo dmesg | grep NVRM
[ 3.495769] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 545.23.08 Mon Nov 6 23:49:37 UTC 2023
[ 14.764808] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 14.764975] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 19.575876] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 19.576969] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 24.410782] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 24.411840] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 29.238481] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 29.239520] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 170.721938] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 170.722969] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 175.534128] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 175.535229] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 530.279582] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 530.280652] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
[ 535.091587] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 535.092652] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0
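
Since the same message repeats on every initialization attempt, it can help to collapse the log into one line per distinct error code. A minimal sketch, assuming the message format shown above; `summarize_nvrm` is a hypothetical helper name that reads kernel log text on stdin.

```shell
# Count RmInitAdapter failures per distinct error code.
# Reads kernel log text on stdin, e.g.: sudo dmesg | summarize_nvrm
summarize_nvrm() {
  grep 'RmInitAdapter failed' |   # keep only the init failure lines
    grep -o '([^)]*)' |           # extract the parenthesized error code
    sort | uniq -c                # one output line per code, with its count
}
```

On the log above, every failure carries the same code, (0x11:0x45:2550), so the summary comes out as a single line.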

ubuntu@ubuntu-2004-fresh:~$ echo "options nvidia NVreg_LogVerbose=1" | sudo tee /etc/modprobe.d/nvidia-debugging.conf
options nvidia NVreg_LogVerbose=1

ubuntu@ubuntu-2004-fresh:~$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.0-1046-ibm

REBOOT…

ubuntu@ubuntu-2004-fresh:~$ sudo dmesg | grep NVRM
[ 3.529295] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 545.23.08 Mon Nov 6 23:49:37 UTC 2023
[ 14.786409] NVRM: GPU 0000:04:01.0: RmInitAdapter failed! (0x11:0x45:2550)
[ 14.786575] NVRM: GPU 0000:04:01.0: rm_init_adapter failed, device minor number 0

ubuntu@ubuntu-2004-fresh:~$ sudo nvidia-bug-report.sh

nvidia-bug-report.log.gz (96.4 KB)