One of three GPUs is not loading the driver in Ubuntu 22.04

I have a workstation with an Asus X99-E WS/USB3.1 motherboard (BIOS 4001) and an Intel Xeon E5-1680 v4 CPU. Above 4G Decoding is enabled in the BIOS and CSM is disabled. The machine has an RTX 4080 and two RTX 3090 FE cards, but the OS only loads the driver for the 4080 and one of the two 3090s. The second 3090 is present and identified, but it is “Unclaimed” in lshw and does not show up in the NVIDIA X Server Settings app or in nvidia-smi. I am running the nvidia-driver-550-open kernel module.

The relevant outputs are below, including the dmesg segments for the working 3090 (PCI address 06:00.0) and the one that is not working (PCI address 09:00.0), for comparison between the two. I am not a Linux expert, but I can follow instructions and search adequately. It looks like a problem with memory address allocation or recognition, but that is at the limit of my knowledge. I would have thought a 64-bit system has plenty of address space (the machine has 128 GB of system RAM). Please help me figure out how to get all three GPUs properly loaded and online.

$ uname -a
Linux <server name> 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun  3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

$ lspci | grep VGA
05:00.0 VGA compatible controller: NVIDIA Corporation Device 2704 (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
09:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)

$ sudo lshw -c display
[sudo] password for : 
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: GA102 [GeForce RTX 3090]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:09:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: iomemory:38000-37fff iomemory:38000-37fff memory:380000000000-38000fffffff memory:380010000000-380011ffffff ioport:e000(size=128)
  *-display
       description: VGA compatible controller
       product: GA102 [GeForce RTX 3090]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:06:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:182 memory:f6000000-f6ffffff memory:a0000000-afffffff memory:b0000000-b1ffffff ioport:b000(size=128) memory:f7000000-f707ffff
  *-display
       description: VGA compatible controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:05:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:183 memory:f8000000-f8ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:c000(size=128) memory:f9000000-f907ffff
  *-graphics
       product: EFI VGA
       physical id: 2
       logical name: /dev/fb0
       capabilities: fb
       configuration: depth=32 resolution=2560,1440

$ nvidia-smi
Mon Jul  8 17:14:51 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080        Off |   00000000:05:00.0  On |                  N/A |
|  0%   32C    P8             11W /  320W |     278MiB /  16376MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:06:00.0 Off |                  N/A |
|  0%   31C    P8             15W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2080      G   /usr/lib/xorg/Xorg                            135MiB |
|    0   N/A  N/A      2279      G   /usr/bin/gnome-shell                           92MiB |
|    0   N/A  N/A      2297      G   /opt/teamviewer/tv_bin/TeamViewer               3MiB |
|    0   N/A  N/A      2958      G   ...bian-installation/ubuntu12_32/steam          3MiB |
|    0   N/A  N/A      3789      G   /usr/bin/nvidia-settings                        0MiB |
|    0   N/A  N/A      4266      G   ./steamwebhelper                                4MiB |
|    1   N/A  N/A      2080      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

$ sudo dmesg | grep ":06"
[    6.886930] pci 0000:06:00.0: [10de:2204] type 00 class 0x030000
[    6.886950] pci 0000:06:00.0: reg 0x10: [mem 0xf6000000-0xf6ffffff]
[    6.886969] pci 0000:06:00.0: reg 0x14: [mem 0xa0000000-0xafffffff 64bit pref]
[    6.886987] pci 0000:06:00.0: reg 0x1c: [mem 0xb0000000-0xb1ffffff 64bit pref]
[    6.886999] pci 0000:06:00.0: reg 0x24: [io  0xb000-0xb07f]
[    6.887012] pci 0000:06:00.0: reg 0x30: [mem 0xf7000000-0xf707ffff pref]
[    6.887017] pci 0000:06:00.0: enabling Extended Tags
[    6.887083] pci 0000:06:00.0: PME# supported from D0 D3hot
[    6.887171] pci 0000:06:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:00:03.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    6.887261] pci 0000:06:00.1: [10de:1aef] type 00 class 0x040300
[    6.887277] pci 0000:06:00.1: reg 0x10: [mem 0xf7080000-0xf7083fff]
[    6.887322] pci 0000:06:00.1: enabling Extended Tags
[    6.889863] pci 0000:0f:00.0: [1b21:0612] type 00 class 0x010601
[    6.890326] pci 0000:10:00.0: [1b21:0612] type 00 class 0x010601
[    6.900743] pci 0000:06:00.0: vgaarb: bridge control possible
[    6.900744] pci 0000:06:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    6.930579] pci_bus 0000:06: resource 0 [io  0xb000-0xbfff]
[    6.930580] pci_bus 0000:06: resource 1 [mem 0xf6000000-0xf70fffff]
[    6.930582] pci_bus 0000:06: resource 2 [mem 0xa0000000-0xb1ffffff 64bit pref]
[    6.976558] pci 0000:06:00.1: extending delay after power-on from D3hot to 20 msec
[    6.976619] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[   11.141984] nvidia 0000:06:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   11.240607] snd_hda_intel 0000:06:00.1: enabling device (0100 -> 0102)
[   11.240710] snd_hda_intel 0000:06:00.1: Disabling MSI
[   11.240731] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[   11.274972] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:08.0/0000:06:00.1/sound/card1/input20
[   11.275101] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:08.0/0000:06:00.1/sound/card1/input21
[   11.275261] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:08.0/0000:06:00.1/sound/card1/input22
[   11.275417] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:08.0/0000:06:00.1/sound/card1/input23
[   12.985655] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:06:00.0 on minor 0

$ sudo dmesg | grep ":09"
[    6.885439] pci 0000:09:00.0: [10de:2204] type 00 class 0x030000
[    6.885459] pci 0000:09:00.0: reg 0x10: [mem 0xfa000000-0xfaffffff]
[    6.885477] pci 0000:09:00.0: reg 0x14: [mem 0x383fe0000000-0x383fefffffff 64bit pref]
[    6.885495] pci 0000:09:00.0: reg 0x1c: [mem 0x383ff0000000-0x383ff1ffffff 64bit pref]
[    6.885508] pci 0000:09:00.0: reg 0x24: [io  0xe000-0xe07f]
[    6.885520] pci 0000:09:00.0: reg 0x30: [mem 0xfb000000-0xfb07ffff pref]
[    6.885526] pci 0000:09:00.0: enabling Extended Tags
[    6.885592] pci 0000:09:00.0: PME# supported from D0 D3hot
[    6.885682] pci 0000:09:00.0: 126.016 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x16 link at 0000:00:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    6.885772] pci 0000:09:00.1: [10de:1aef] type 00 class 0x040300
[    6.885788] pci 0000:09:00.1: reg 0x10: [mem 0xfb080000-0xfb083fff]
[    6.885834] pci 0000:09:00.1: enabling Extended Tags
[    6.900734] pci 0000:09:00.0: vgaarb: setting as boot VGA device
[    6.900736] pci 0000:09:00.0: vgaarb: bridge control possible
[    6.900737] pci 0000:09:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    6.929570] pci 0000:09:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[    6.929571] pci 0000:09:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[    6.929572] pci 0000:09:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[    6.929574] pci 0000:09:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[    6.930196] pci 0000:09:00.0: BAR 1: assigned [mem 0x380000000000-0x38000fffffff 64bit pref]
[    6.930209] pci 0000:09:00.0: BAR 3: assigned [mem 0x380010000000-0x380011ffffff 64bit pref]
[    6.930221] pci 0000:09:00.0: BAR 0: no space for [mem size 0x01000000]
[    6.930222] pci 0000:09:00.0: BAR 0: failed to assign [mem size 0x01000000]
[    6.930223] pci 0000:09:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[    6.930224] pci 0000:09:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[    6.930226] pci 0000:09:00.1: BAR 0: no space for [mem size 0x00004000]
[    6.930227] pci 0000:09:00.1: BAR 0: failed to assign [mem size 0x00004000]
[    6.930570] pci_bus 0000:09: resource 0 [io  0xe000-0xefff]
[    6.930571] pci_bus 0000:09: resource 2 [mem 0x380000000000-0x380017ffffff 64bit pref]
[    6.976509] pci 0000:09:00.1: extending delay after power-on from D3hot to 20 msec
[    6.976545] pci 0000:09:00.1: D0 power state depends on 0000:09:00.0
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:09:00.0)
[   11.141816] nvidia: probe of 0000:09:00.0 failed with error -1
[   11.240314] snd_hda_intel 0000:09:00.1: Disabling MSI
[   11.240345] snd_hda_intel 0000:09:00.1: Handle vga_switcheroo audio client
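
The failed assignments stand out more with a narrower filter than grep ":09". This is only a minimal sketch: the helper name failed_bars is made up for illustration, and the pattern assumes the kernel words the message exactly as in the log above (other kernel versions may phrase it differently).

```shell
# Hypothetical helper: read kernel log text on stdin and print only the
# lines where a PCI BAR could not be assigned.
failed_bars() {
    grep -E 'BAR [0-9]+: failed to assign'
}

# Usage, with sudo as in the commands above:
#   sudo dmesg | failed_bars
```

On this machine that would leave only the 09:00.0 and 09:00.1 "failed to assign" lines shown above, assuming no other device on the system has the same problem.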

nvidia-bug-report.log.gz (585.0 KB)

Found the solution. I ended up needing to do this:

Add the parameter pci=realloc=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run sudo update-grub and reboot so the change takes effect.
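
My reading of the dmesg output above, which may be incomplete: the firmware had placed BAR 0 of the second 3090 at 0xfa000000, but the kernel then tried to reassign that card's BARs and found no space for BAR 0, which matches the NVRM line "BAR0 is 0M @ 0x0". The kernel's parameter documentation describes pci=realloc=off as turning off reallocation of PCI bridge resources. For anyone who wants to script the edit, here is a minimal sketch. The helper name add_kernel_param is hypothetical, and it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is double-quoted and that the parameter contains no characters special to sed.

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of a GRUB defaults file,
# unless that line already contains it.
add_kernel_param() {
    file=$1
    param=$2
    # Already present on the target line: nothing to do.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter before the closing quote; keep a .bak backup.
    sed -i.bak "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$file"
}

# Usage, from a root shell (sudo cannot call a shell function directly):
#   add_kernel_param /etc/default/grub pci=realloc=off
#   update-grub
#   reboot
```

After the reboot, cat /proc/cmdline should include pci=realloc=off. The sed call leaves the previous file next to the original with a .bak suffix in case the edit needs to be undone.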
