Bug Report: 455.45.01 - RTX 3090 incompatible with ASRock Rack EPYCD8-2T, Xid 62 error

Hello

I consistently get an Xid 62 error on Ubuntu 20.04 with an RTX 3090.

Every time I run nvidia-smi (it prints "No devices were found"), an Xid 62 error is logged to dmesg like this:

[ 260.970401] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[ 260.970523] caller os_map_kernel_space.part.0+0x73/0x80 [nvidia] mapping multiple BARs
[ 261.825963] NVRM: GPU at PCI:0000:01:00: GPU-64588dc5-df58-792c-c418-b5d69e1102e5
[ 261.825967] NVRM: GPU Board Serial Number:
[ 261.825971] NVRM: Xid (PCI:0000:01:00): 62, pid=1904, 0000(0000) 00000000 00000000
[ 269.925557] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x53:0x65:2109)
[ 269.925661] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
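To count how often this happens across reboots or driver reloads, the Xid lines can be pulled out of the kernel log automatically. Here is a minimal Python sketch. It assumes only the "NVRM: Xid (PCI:...): <code>, pid=<pid>" layout visible in the excerpt above. The function and type names are hypothetical, and other driver versions may format the line differently.

```python
import re
from typing import List, NamedTuple, Optional

# Matches NVRM Xid lines in the format seen in the dmesg excerpt, e.g.
# "NVRM: Xid (PCI:0000:01:00): 62, pid=1904, 0000(0000) 00000000 00000000"
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<xid>\d+)"
    r"(?:, pid=(?P<pid>\d+))?"
)


class XidEvent(NamedTuple):
    bus: str             # PCI address as printed by the driver
    xid: int             # Xid error code, e.g. 62
    pid: Optional[int]   # process that triggered it, if the line includes one


def parse_xid_events(log_text: str) -> List[XidEvent]:
    """Return every Xid event found in a block of kernel log text."""
    events = []
    for line in log_text.splitlines():
        m = XID_RE.search(line)
        if m:
            pid = int(m.group("pid")) if m.group("pid") else None
            events.append(XidEvent(m.group("bus"), int(m.group("xid")), pid))
    return events
```

For example, pass it the text output of sudo dmesg and count the events whose xid is 62. Lines such as "NVRM: GPU at PCI:..." or the resource sanity check are ignored because they lack the "Xid (" prefix.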

Here is my hardware and environment:

MOBO: ASRock Rack EPYCD8-2T
CPU: AMD EPYC 7402
GPU: GeForce RTX 3090 (with driver 455.38)
OS: Ubuntu 20.04.1 LTS server (with kernel 5.4.0 and 5.8.0), Windows 7

Before I upgraded to the RTX 3090, this same hardware and environment ran an RTX 2080 Ti and a Quadro RTX 4000 without issues on drivers 450.80 and 455.38.

I also tested the same hardware configuration under Windows 7. Windows 7 could not automatically update the driver for the 3090, and I could not even install driver 457 manually.

I have tested the RTX 3090 in 2 other PCs (1 with Windows 7, 1 with Ubuntu 20.04 server), and it worked well in both.

So the only possibilities are a driver issue or an incompatibility between the RTX 3090 and the motherboard.

nvidia-bug-report log attached.

I can provide ssh access to the server if needed.

A similar bug report, with no answer, can be found here: AsRock ROMED8-2D + RTX 3090 black screen on Ubuntu 20.04

I also read this post: Random Xid 61 and Xorg lock-up
But the Xid I get is 62, not 61, and the Xid 62 error happens every time I run nvidia-smi or any other command that tries to talk to the graphics card.

nvidia-bug-report.log.gz (130.4 KB)

I have tried 2 different PCIe slots on the motherboard and could not get it to work in either. I know the slots themselves work, because one was previously populated with the RTX 2080 Ti.

My initial suspicion is a VBIOS issue, because the driver reports the Video BIOS version as question marks:

root@worker7402:~# cat /proc/driver/nvidia/gpus/0000:01:00.0/information
Model: GeForce RTX 3090
IRQ: 211
GPU UUID: GPU-64588dc5-df58-792c-c418-b5d69e1102e5
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:01:00.0
Device Minor: 0
Blacklisted: No

Further evidence: Windows 7 refused to install the latest 457 driver on this system.
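The VBIOS symptom can also be checked from a script. Below is a minimal Python sketch. It assumes only the "Key: Value" layout of /proc/driver/nvidia/gpus/<bus>/information shown in the output above. The function names are hypothetical, and treating an all-"?" version string as "unreadable" is a heuristic on my part, not documented driver behavior.

```python
from pathlib import Path
from typing import Dict


def parse_gpu_information(text: str) -> Dict[str, str]:
    """Split 'Key: Value' lines; only the first ':' separates key from value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_gpu_information(bus_location: str) -> Dict[str, str]:
    """Read the driver's per-GPU info file, e.g. for bus '0000:01:00.0'."""
    path = Path("/proc/driver/nvidia/gpus") / bus_location / "information"
    return parse_gpu_information(path.read_text())


def vbios_unreadable(info: Dict[str, str]) -> bool:
    """True if the 'Video BIOS' field is missing or consists only of '?' and '.'."""
    version = info.get("Video BIOS", "")
    return version == "" or set(version) <= {"?", "."}
```

Splitting on the first colon matters here because values such as "Bus Location: 0000:01:00.0" contain colons themselves.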

Hello dodohack,

I went through the exact same issue: same error, VBIOS shown as ??? on Arch, and error 43 in Windows. It turned out I was using a PCIe 3.0 riser cable.

Not sure whether you're connecting the GPU directly to the motherboard, but I thought I would let you know.

Good luck.

Hi Alex

Thanks for the info.
I'm not using a PCIe riser with my 3090. However, my motherboard only supports PCIe 3.0, so I suspect the issue is PCIe 4.0/3.0 compatibility.

I am the author of the bug report you linked. The root cause of my issue was a PCIe 3.0 cable with the BIOS set to PCIe 4.0 speed. Setting the BIOS to PCIe 3.0 speed got our workstation running without issues.
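To see which speed a link actually negotiated, Linux exposes current_link_speed and max_link_speed for each PCI device under /sys/bus/pci/devices/. Below is a minimal Python sketch that reads both and maps them to a PCIe generation. The function names are hypothetical. The string format differs between kernel versions (for example "8 GT/s" versus "8.0 GT/s PCIe"), so the parser only looks at the leading number. Note that max_link_speed reflects the card's own capability, and a GPU may drop to a lower speed while idle, so the current value is most meaningful under load.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Per-lane transfer rate (GT/s) for each PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def speed_to_gen(speed: str) -> Optional[int]:
    """Map a sysfs speed string such as '8 GT/s' or '16.0 GT/s PCIe' to a generation."""
    m = re.match(r"\s*([0-9.]+)\s*GT/s", speed)
    if not m:
        return None  # e.g. 'Unknown speed'
    return GEN_BY_GTS.get(float(m.group(1)))


def link_generations(
    bdf: str, sysfs_root: str = "/sys/bus/pci/devices"
) -> Tuple[Optional[int], Optional[int]]:
    """Return (current, maximum) PCIe generation for a device like '0000:01:00.0'."""
    dev = Path(sysfs_root) / bdf
    current = (dev / "current_link_speed").read_text()
    maximum = (dev / "max_link_speed").read_text()
    return speed_to_gen(current), speed_to_gen(maximum)
```

For instance, link_generations("0000:01:00.0") on the system in this thread would show whether the card came up at Gen3 or Gen4 after changing the BIOS setting.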

You can easily work around this by selecting Gen3 instead of Auto in the BIOS.
Tested on EPYCD8-2T with BIOS 2.60:
BIOS -> Advanced -> Chipset -> pcie[1-7]-link-speed: change [Auto] to [GEN3]

With Auto: Xid 62 / restarts / kernel panic
With Gen3: working.

kind regards
mario