Second GPU not picking up the installed NVIDIA driver

Originally, I was having issues installing CUDA and the drivers on Ubuntu Server 22.04 with my two RTX 3090s. After removing one GPU, I was able to get everything installed and working, including running the CUDA bandwidthTest.

I believe the issues I had with the second GPU were related to a bad connection with the riser cable - but I haven’t yet confirmed it.

After installing and configuring the first GPU, I got the second 3090 recognized in Ubuntu. However, when I run lshw -c display, it shows the adapter as UNCLAIMED and not using the installed nvidia driver.

  *-display:0
       description: VGA compatible controller
       product: GA102 [GeForce RTX 3090]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:03:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: irq:52 memory:fc000000-fcffffff memory:c0000000-cfffffff memory:d2000000-d3ffffff ioport:d80(size=128)
  *-display:1 UNCLAIMED
       description: VGA compatible controller
       product: GA102 [GeForce RTX 3090]
       vendor: NVIDIA Corporation
       physical id: 46
       bus info: pci@0000:03:02.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: memory:fb000000-fbffffff memory:b0000000-bfffffff memory:d0000000-d1ffffff ioport:d00(size=128)
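
For anyone who wants to check for this condition in a script, here is a minimal Python sketch that scans the text output of lshw -c display and lists adapters that are marked UNCLAIMED or have no driver= entry. The function name find_unbound_displays, the SAMPLE text, and the parsing rules are my own assumptions based on the output format shown above; they are not part of lshw or any NVIDIA tooling.

```python
import re

# Trimmed sample in the same layout as the lshw output above (illustrative only).
SAMPLE = """\
  *-display:0
       bus info: pci@0000:03:00.0
       configuration: driver=nvidia latency=0
  *-display:1 UNCLAIMED
       bus info: pci@0000:03:02.0
       configuration: latency=0
"""

def find_unbound_displays(lshw_text):
    """Return display nodes that are UNCLAIMED or have no driver= entry."""
    nodes = []
    current = None
    for line in lshw_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*-"):
            # A new node starts; only track it if it is a display node.
            header = re.match(r"\*-display(?::\d+)?\b(.*)$", stripped)
            if header:
                current = {
                    "unclaimed": "UNCLAIMED" in header.group(1),
                    "bus": None,
                    "driver": None,
                }
                nodes.append(current)
            else:
                current = None
            continue
        if current is None:
            continue
        if stripped.startswith("bus info:"):
            current["bus"] = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("configuration:"):
            match = re.search(r"driver=(\S+)", stripped)
            if match:
                current["driver"] = match.group(1)
    return [n for n in nodes if n["unclaimed"] or n["driver"] is None]
```

To use it on a live system, the text could come from something like subprocess.run(["lshw", "-c", "display"], capture_output=True, text=True).stdout, run as root so lshw reports full details. An empty list means every display adapter has a driver bound.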

Please forgive my lack of experience and knowledge working with Linux. Is there an easy remedy for this?

root@k8dp01:~# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64  550.54.15  Release Build  (dvs-builder@U16-A24-23-2)  Tue Mar  5 22:15:33 UTC 2024
GCC version:  gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

It turned out to be the riser cable after all. I had rebooted the VM, and this time nvidia saw the card and lshw showed it using the nvidia driver. However, when I tried to run the bandwidth test against the second GPU, everything locked up and I had to force a shutdown.

At that point I was sure that either the GPU had a serious issue or the cable was bad. As soon as I swapped the cable out for a different Gen4 riser, everything worked perfectly for both 3090s.
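
For anyone chasing a similar riser problem, one rough check is to compare the PCIe link each GPU negotiated against its maximum. The sketch below parses the output of nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,pcie.link.width.max --format=csv,noheader. The function name find_degraded_links and the sample text are hypothetical, and the numbers in the sample are made up for illustration, not taken from my system. As far as I know, an idle GPU can report a lower current generation because of power management, so the generation flag is best read while the GPU is under load, and a clean result does not prove the cable is good.

```python
# Made-up sample rows: index, gen current, gen max, width current, width max.
SAMPLE_CSV = """\
0, 4, 4, 16, 16
1, 1, 4, 8, 16
"""

def find_degraded_links(csv_text):
    """Return GPUs whose current PCIe generation or width is below the maximum."""
    degraded = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            continue  # not a data row in the expected layout
        try:
            index, gen_cur, gen_max, width_cur, width_max = (int(f) for f in fields)
        except ValueError:
            continue  # header row or a non-numeric value such as N/A
        gen_low = gen_cur < gen_max
        width_low = width_cur < width_max
        if gen_low or width_low:
            degraded.append({
                "index": index,
                "gen_below_max": gen_low,
                "width_below_max": width_low,
            })
    return degraded
```

A reduced width is the more suspicious of the two flags, since the generation can legitimately drop when the card is idle.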
