I’m trying to make my NVidia Quadro P5000 work in a Razer Core X Chroma eGPU, but it just fails to initialize. (The GPU had been working fine for years in a regular desktop machine.) What could be wrong? I’m out of ideas.
On a Desktop
Motherboard: ASRock x570 Creator
CPU: AMD Ryzen 3950X
System: ArchLinux with kernel 5.9.11
GPU in the on-board PCIe slot: AMD Radeon Pro W5700
Related kernel flags: pci=realloc,assign-busses,hpbussize=0x33 radeon.auxch=1 mem_encrypt=on
Without the pci=... flag, Thunderbolt devices don’t work. With the flag they appear to work just fine (tested e.g. with a Lenovo Thunderbolt 3 dock).
Here’s the dmesg output from plugging in the eGPU. The most relevant part might be:
Nov 30 16:44:33 charon kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Nov 30 16:44:33 charon kernel: nvidia 0000:3d:00.0: enabling device (0000 -> 0003)
Nov 30 16:44:33 charon kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:3d:00.0)
Nov 30 16:44:33 charon kernel: NVRM: The system BIOS may have misconfigured your GPU.
Nov 30 16:44:33 charon kernel: nvidia: probe of 0000:3d:00.0 failed with error -1
Nov 30 16:44:33 charon kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Nov 30 16:44:33 charon kernel: NVRM: None of the NVIDIA devices were initialized.
Nov 30 16:44:33 charon kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
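To see what the kernel itself recorded for the card’s BARs, independently of the NVIDIA driver, one can read the device’s sysfs `resource` file. This is a minimal sketch. It assumes the address 0000:3d:00.0 from the log above and the usual layout of that file, which has one line per resource with three hex fields (start, end, flags). An all-zero range has length 0, which is what NVRM reports as “BAR0 is 0M @ 0x0”.

```python
"""Print the BAR ranges the kernel recorded for one PCI device.

Assumption: /sys/bus/pci/devices/<address>/resource holds one line per
resource, each with three hex fields: start, end, flags.
"""
import sys


def parse_resources(text):
    """Return (index, start, end, size, flags) for each well-formed line."""
    bars = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, flags = (int(field, 16) for field in fields)
        # An all-zero start/end means no range is recorded (length 0).
        size = 0 if start == 0 and end == 0 else end - start + 1
        bars.append((index, start, end, size, flags))
    return bars


def main(address="0000:3d:00.0"):
    path = f"/sys/bus/pci/devices/{address}/resource"
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError as err:
        print(f"cannot read {path}: {err}")
        return
    for index, start, end, size, flags in parse_resources(text):
        if size == 0:
            print(f"resource {index}: empty (flags {flags:#x})")
        else:
            print(f"resource {index}: {size:#x} bytes at "
                  f"{start:#x}-{end:#x} (flags {flags:#x})")


if __name__ == "__main__":
    # Optional first argument: a PCI address such as 0000:3d:00.0
    main(*sys.argv[1:2])
```

Run it with the device address as the only argument, for example `python3 bars.py 0000:3d:00.0`. The script name is arbitrary. For a normal (type 0) device, lines 0 to 5 are the BARs and line 6 is the expansion ROM.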
I’ve searched for the error messages. Starting from the NVidia forums (1) (2), I’ve double-checked that
- “Above 4G Decoding” is enabled in my UEFI setup, and
- I do have at least one 64-bit window (>8 hex digits) earlier in dmesg:
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x0000-0x03af window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x03e0-0x0cf7 window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x03b0-0x03df window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0xb0000000-0xefffffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0x2050000000-0x7fffffffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [bus 00-ff]
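The “>8 hex digits” rule of thumb amounts to checking whether a window ends above 4 GiB (0xffffffff). The following small sketch applies that check to the root bus resource lines. Its regular expression assumes the exact `[mem START-END window]` wording shown in the dmesg lines above.

```python
import re

# Matches e.g. "root bus resource [mem 0x2050000000-0x7fffffffff window]"
WINDOW_RE = re.compile(
    r"root bus resource \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+) window\]"
)
LIMIT_32BIT = 0xFFFFFFFF  # highest address reachable with 32-bit addressing


def mem_windows(dmesg_text):
    """Return (start, end) for every root-bus memory window in the text."""
    return [(int(s, 16), int(e, 16)) for s, e in WINDOW_RE.findall(dmesg_text)]


def windows_above_4g(dmesg_text):
    """Keep only windows reaching above 4 GiB (they need 64-bit addressing)."""
    return [(s, e) for s, e in mem_windows(dmesg_text) if e > LIMIT_32BIT]


if __name__ == "__main__":
    sample = (
        "pci_bus 0000:00: root bus resource [mem 0xb0000000-0xefffffff window]\n"
        "pci_bus 0000:00: root bus resource [mem 0x2050000000-0x7fffffffff window]\n"
    )
    for start, end in windows_above_4g(sample):
        print(f"{start:#x}-{end:#x}: {(end - start + 1) >> 30} GiB")
```

For the desktop lines above, only the 0x2050000000-0x7fffffffff window qualifies, at roughly 382 GiB. That is plenty of address space for a card with large BARs, at least on paper.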
The eGPU appears normally in boltctl list (authorized etc.). The NVidia Quadro P5000 appears in lspci. However, nothing else works: neither the NVidia card itself nor the eGPU’s USB hub(s) (with a built-in ASIX Ethernet adapter).
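Appearing in lspci only means that the device was enumerated. Whether a driver actually claimed it shows up as a `driver` symlink in the device’s sysfs directory. The minimal sketch below lists which PCI functions are bound and which are not. The root path is a parameter, so the function can also be pointed at a test tree.

```python
import os

SYSFS_PCI = "/sys/bus/pci/devices"


def bound_drivers(root=SYSFS_PCI):
    """Map each PCI address under root to its bound driver name, or None."""
    result = {}
    for address in sorted(os.listdir(root)):
        link = os.path.join(root, address, "driver")
        # When a driver has claimed the device, "driver" is a symlink
        # pointing at .../drivers/<driver-name>.
        if os.path.islink(link):
            result[address] = os.path.basename(os.readlink(link))
        else:
            result[address] = None
    return result


if __name__ == "__main__":
    if os.path.isdir(SYSFS_PCI):
        for address, driver in bound_drivers().items():
            print(f"{address}: {driver or 'NO DRIVER BOUND'}")
```

`lspci -k` reports the same information in its “Kernel driver in use” line.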
Some threads recommended /sys/bus/pci/devices gymnastics, such as this post. Not only does that not work for me, but this crash from 2015 also still crashes my machine today: my system freezes and panic-reboots when I try it. So I haven’t experimented any further.
On a Laptop
Machine: Lenovo X1 Carbon v7
CPU: Intel Core i7-8665U
System: Debian with kernel 5.9.8
Related kernel flags: pci=noats
Importantly, the laptop does not have the NVidia driver installed, because some forum posts explicitly asked for dmesg output without the NVidia driver. So here it is: a dmesg output from the laptop without the NVidia drivers.
Again, boltctl list looks normal (authorized etc.). The NVidia Quadro P5000 appears in lspci. The difference from the desktop case above is that at least something works: the USB buses and the ASIX network card (ax88179_178a). But the NVidia card doesn’t work, and “no space for” occurs a number of times in dmesg.
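The “no space for” lines can be collected to see which devices and BARs the kernel gave up on. This is a sketch that assumes the messages look like `pci 0000:3d:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]`. That example line is illustrative and is not taken from the attached log. The exact wording may also vary between kernel versions.

```python
import re
import sys
from collections import defaultdict

# Assumed message shape, e.g.
#   "pci 0000:3d:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]"
# \s+ after the type also tolerates the padded "io  " form.
NO_SPACE_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): BAR (?P<bar>\d+): "
    r"no space for \[(?P<kind>io|mem)\s+size (?P<size>0x[0-9a-f]+)(?P<attrs>[^\]]*)\]",
    re.IGNORECASE,
)


def no_space_failures(dmesg_text):
    """Group failures by device: {address: [(bar, kind, size, attrs), ...]}."""
    failures = defaultdict(list)
    for m in NO_SPACE_RE.finditer(dmesg_text):
        failures[m["dev"]].append(
            (int(m["bar"]), m["kind"], int(m["size"], 16), m["attrs"].strip())
        )
    return dict(failures)


def main(argv):
    if len(argv) < 2:
        print("usage: no_space.py DMESG_FILE")
        return
    with open(argv[1]) as handle:
        for dev, items in no_space_failures(handle.read()).items():
            for bar, kind, size, attrs in items:
                print(f"{dev} BAR {bar}: {size:#x} bytes of {kind} {attrs}".rstrip())


if __name__ == "__main__":
    main(sys.argv)
```

Saving the log first (e.g. `dmesg > dmesg.txt`) and passing the file name keeps the script free of any assumptions about how dmesg is invoked.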