RTX 6000 Ada driver not loading after displaymodeselector --gpumode compute

Hi, I switched an RTX 6000 Ada dedicated to AI/ML workloads to compute mode with displaymodeselector v1.59.0 (“displaymodeselector --gpumode compute”). The switch completed successfully (log attached). Sadly, the host system doesn’t seem to support this mode, and none of the recent drivers load. In addition, the latest “displaymodeselector --listgpumodes” hangs or gives an error:

terminate called after throwing an instance of ‘std::runtime_error’
what(): A timeout occurred while waiting for uproc response.

The driver also fails to install. Previously, in graphics mode, the driver loaded fine. I suspect the error comes from the BIOS, as we’re using a consumer-grade system. I’ve attached dmesg, nvidia-installer.log and dmidecode.

What options do we have? Should we use a certified system that supports the RTX 6000 Ada in compute mode, or is there a way to switch the GPU back to graphics mode?

Thanks in advance.

rtx6000_displaymodeswitch.txt (2.8 KB)
dmidecode.txt (24.8 KB)
dmesg.txt (100.4 KB)
nvidia-installer.log (34.3 KB)

The board seems to support the RTX fine, but the GPU is not responding anymore. It seems something went wrong while reflashing. Please try powering off the system and removing power, possibly even removing the RTX from its slot, and let it discharge. Then put it back in and see if it comes alive again.

Thank you! This helped. We actually have 2 identical cards, both in compute mode. After letting them discharge, we installed them one at a time into the same PCI slot, and each loaded the driver.

Now, when we install them together in the 1st and 3rd slots, only one is detected. If we disable the first card, the second still loads the driver. If we disable the second GPU PCI slot, the first isn’t detected.

So, we have identical cards: one is totally fine and stable, while the other worked after we let it discharge and then stopped loading the driver again.

What do you think would be the right approach to get both cards working? Attaching necessary debug data.

Thanks again for your help!
dmesg_two_cards.txt.gz (23.5 KB)
nvidia-bug-report.log.gz (267.6 KB)

Does the defunct card still work in the other slot?
Maybe check for a bios update for the mainboard.

Yes, both cards work separately if only one of them is installed.

We’ve upgraded to the latest MB BIOS version, but the behavior didn’t change: one card works fine, and the other gives

[ +0.000180] NVRM: The NVIDIA GPU 0000:43:00.0
NVRM: (PCI ID: 10de:26b1) installed in this system has
NVRM: fallen off the bus and is not responding to commands.

Do you think we should switch both cards back to graphics mode, or is it rather something with the second card?
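For anyone comparing logs across reboots, here is a minimal Python sketch that pulls the PCI address out of the "fallen off the bus" message quoted above. It assumes the message keeps the three-line layout shown in this thread; the function name and the `window` parameter are made up for illustration and are not part of any NVIDIA tool.

```python
import re

# Matches the first line of the NVRM message and captures the PCI address
# (domain:bus:device.function), e.g. 0000:43:00.0.
GPU_LINE = re.compile(
    r"NVRM: The NVIDIA GPU "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def gpus_fallen_off_bus(dmesg_text, window=3):
    """Return PCI addresses of GPUs reported as fallen off the bus.

    Assumption: the phrase "fallen off the bus" appears within `window`
    lines starting at the line that names the GPU, as in the excerpt above.
    """
    lines = dmesg_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = GPU_LINE.search(line)
        if not match:
            continue
        following = " ".join(lines[i:i + window])
        if "fallen off the bus" in following and match.group(1) not in hits:
            hits.append(match.group(1))
    return hits


# The excerpt posted in this thread, used as sample input.
SAMPLE = """[ +0.000180] NVRM: The NVIDIA GPU 0000:43:00.0
NVRM: (PCI ID: 10de:26b1) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
"""
```

Calling `gpus_fallen_off_bus(SAMPLE)` on the excerpt returns a list with the single address `0000:43:00.0`. On a live system the input could be the contents of a saved dmesg file such as the ones attached here.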

Looking at the logs again, the problem seems to be that bus mastering can’t be enabled on the second card for unknown reasons. According to lspci -t, it sits on the same root complex as the old gt430 you’re using for video. I’d try removing that first; it might be keeping the second RTX from working properly.
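As a rough way to check the bus mastering state directly, the sketch below decodes the PCI command register from a device's config space as exposed by Linux sysfs. The register layout (16-bit command register at offset 0x04, bit 1 for memory space, bit 2 for bus master) comes from the PCI specification; the function names are illustrative, and the "responding" check relies on the assumption that a device that no longer answers reads back as all ones.

```python
import struct
from pathlib import Path

PCI_VENDOR_OFFSET = 0x00   # 16-bit vendor ID
PCI_COMMAND_OFFSET = 0x04  # 16-bit command register
MEMORY_SPACE_BIT = 0x0002  # bit 1: memory space enable
BUS_MASTER_BIT = 0x0004    # bit 2: bus master enable


def decode_command(config):
    """Decode vendor ID and command register from raw config-space bytes."""
    if len(config) < 6:
        raise ValueError("need at least the first 6 bytes of config space")
    (vendor,) = struct.unpack_from("<H", config, PCI_VENDOR_OFFSET)
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND_OFFSET)
    return {
        # Assumption: a vendor ID of 0xFFFF means the read was not answered.
        "responding": vendor != 0xFFFF,
        "memory_space": bool(command & MEMORY_SPACE_BIT),
        "bus_master": bool(command & BUS_MASTER_BIT),
    }


def read_command(address):
    """Read and decode config space for a PCI address such as 0000:43:00.0."""
    path = Path("/sys/bus/pci/devices") / address / "config"
    return decode_command(path.read_bytes())
```

For example, `read_command("0000:43:00.0")` would report the state of the card named in the NVRM message above, provided the kernel still lists that device. If "responding" is false, the other two values are not meaningful.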

Thank you for the suggestion. I removed the old gt430, but the behavior didn’t change.

However, after switching both cards to graphics mode via displaymodeselector v1.60, both cards now work fine in the same host system. I’ll attach the nvidia-bug-report log in case anyone’s interested.

Thank you for your help!
nvidia-bug-report-2cards-graphics-mode.log.gz (628.6 KB)

I also changed my card to compute mode on my HP DL380 Gen9.

After rebooting, the BIOS output says:

276-Option Card Configuration Error. One or more option cards are requesting more memory mapped I/O
than is available. Action: Remove one or more option cards to allow the system to boot.

I tried all settings, but with no luck.
When I put this card into my desktop PC, it does not boot; the BIOS stops at the “VGA load bios” step.

How can I get this to run? About discharging… how long should I wait before I retry?
Thanks for the help.
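Regarding the 276 error about memory-mapped I/O: if the card still comes up in some other Linux machine, a sketch like the one below could show how much MMIO address space it requests, which may help when comparing against what the server can provide. It parses the sysfs `resource` file, where each line holds start, end and flags in hex; the `IORESOURCE_MEM` value is taken from the Linux kernel headers, and the function names are illustrative. Regions the kernel could not assign (start and end both zero) are skipped, so the total is a lower bound.

```python
from pathlib import Path

IORESOURCE_MEM = 0x00000200  # flag bit for memory (MMIO) regions in Linux


def mmio_bytes(resource_text):
    """Sum the sizes of memory regions listed in a sysfs resource file."""
    total = 0
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "start end flags"
        start, end, flags = (int(field, 16) for field in fields)
        # Count only assigned memory regions; I/O port regions are ignored.
        if flags & IORESOURCE_MEM and end > start:
            total += end - start + 1
    return total


def device_mmio_bytes(address):
    """Total assigned MMIO for a PCI address such as 0000:43:00.0."""
    path = Path("/sys/bus/pci/devices") / address / "resource"
    return mmio_bytes(path.read_text())
```

Running `device_mmio_bytes` on the same card in graphics mode and in compute mode would show whether the mode switch changes the amount of address space requested. That is an assumption to verify, not something confirmed in this thread.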