RTX 6000 error when booting on Dell Precision Rack 7920

Using a Dell Precision Rack 7920 with an RTX 6000, I get the following error every time I reboot:

“UEFI0077: One or more PCIe device errors occurred in the previous boot.
Check the System Event Log (SEL) to identify the PCIe device with errors, and then update its firmware.”

The SEL contains the same error message (UEFI0077) as well as another entry: “PCI1318 - A fatal error was detected at bus 59 device 0 function 2.”
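
To cross-reference that SEL entry with `lspci` output on the CentOS side, the bus/device/function triple can be converted to the hexadecimal `bb:dd.f` form that `lspci` uses. The snippet below is only a minimal sketch: `sel_to_bdf` is a hypothetical helper (not part of any Dell or NVIDIA tool), and it assumes the SEL reports the numbers in decimal, which I have not confirmed.

```python
import re

# Hypothetical helper for illustration only.
# Assumption: the SEL message reports bus, device and function in decimal.
SEL_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def sel_to_bdf(message: str) -> str:
    """Return the lspci-style 'bb:dd.f' address found in a SEL message."""
    match = SEL_PATTERN.search(message)
    if match is None:
        raise ValueError("no bus/device/function triple found in message")
    bus, device, function = (int(group) for group in match.groups())
    # lspci prints bus and device as two hex digits and function as one.
    return f"{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    entry = "PCI1318 - A fatal error was detected at bus 59 device 0 function 2."
    print(sel_to_bdf(entry))
```

If the decimal assumption holds, the resulting address can be passed to `lspci -s <address> -vv` to see which function of the card the error points at.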

At the same time as the message pops up, the LED on the front left of the Dell switches from blue to orange.

You can press F1 to continue booting and the graphics card will then work properly, but every reboot requires user intervention to complete.

Some data points:

  • Machine is running CentOS 7.4.
  • If I do a full shutdown instead of a reboot, I don't get the error.
  • Trying a different graphics card (P6000 or RTX 8000), I don't get the error.
  • I've updated the workstation BIOS and iDRAC versions to the latest available.
  • The same card used in a different system model does not cause any similar glitch.
  • The current VBIOS on the RTX 6000 is 90.02.15.00.04. Is there a newer VBIOS that I could try?

Any clue?

Thanks,
Hugo

We have the same kind of problem with a 7920, Windows 10 2016 LTSB, and an RTX 6000. We are currently on PCIe 3.

However, we found a way to avoid pressing F1: ignoring the error. Booting still takes 5 minutes each time…

Thanks for the suggestion. That could have been a workaround, I suppose, but in my case it does not work.

It seems the error message stays there forever and awaits user input.

I have this issue on a Dell R740 running CentOS 7.7 with an ELSA RTX 2080 Ti. Drivers were installed via ELRepo, and the two boot messages I get are:

UEFI0060 - Power required exceeds the system PSU’s
UEFI0077 - PCIe device error on previous boot

PCI1318 also appears in the machine logs. Any ideas?

UEFI0060 is related to another card, so UEFI0077 is the only one where we have the issue.

Same issue with an RTX Titan: same error, and it keeps asking for a card firmware update.