I’m trying to make my NVidia Quadro P5000 work in a Razer Core X Chroma eGPU, but it just fails to initialize. (The GPU had been working fine for years in a regular desktop machine.) What could be wrong? I’m out of ideas.
On a Desktop
Motherboard: ASRock x570 Creator
CPU: AMD Ryzen 3950X
System: ArchLinux with kernel 5.9.11
GPU in the on-board PCIe: AMD Radeon Pro W5700
Related kernel flags: pci=realloc,assign-busses,hpbussize=0x33 radeon.auxch=1 mem_encrypt=on
Without the pci=... flag, Thunderbolt devices don’t work. With the flag they appear to work just fine (tested, e.g., with a Lenovo Thunderbolt 3 dock).
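Because the desktop setup depends on the pci= options, it may be worth confirming that they actually reached the running kernel. Below is a minimal sketch. The helper names pci_options and hpbussize are hypothetical and not part of any tool. It reads /proc/cmdline, collects the comma-separated sub-options of every pci= parameter, and decodes hpbussize. It assumes the kernel accepts a C-style 0x prefix for that value, so hpbussize=0x33 would mean 51 bus numbers.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def pci_options(cmdline: str) -> List[str]:
    """Collect the comma-separated sub-options of every pci= parameter."""
    options: List[str] = []
    for token in shlex.split(cmdline):  # shlex copes with quoted values
        key, sep, value = token.partition("=")
        if key == "pci" and sep:
            options.extend(opt for opt in value.split(",") if opt)
    return options


def hpbussize(options: List[str]) -> Optional[int]:
    """Return hpbussize as an int (decimal or 0x-prefixed hex), or None if absent."""
    for opt in options:
        name, sep, value = opt.partition("=")
        if name == "hpbussize" and sep:
            return int(value, 0)  # base 0: "0x33" -> 51 (assumed to match the kernel)
    return None


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        opts = pci_options(cmdline_path.read_text())
        print("pci= options:", opts)
        print("hpbussize:", hpbussize(opts))
```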
Here’s a dmesg output from when I plug in the eGPU. The most relevant part might be:
Nov 30 16:44:33 charon kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Nov 30 16:44:33 charon kernel: nvidia 0000:3d:00.0: enabling device (0000 -> 0003)
Nov 30 16:44:33 charon kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:3d:00.0)
Nov 30 16:44:33 charon kernel: NVRM: The system BIOS may have misconfigured your GPU.
Nov 30 16:44:33 charon kernel: nvidia: probe of 0000:3d:00.0 failed with error -1
Nov 30 16:44:33 charon kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
Nov 30 16:44:33 charon kernel: NVRM: None of the NVIDIA devices were initialized.
Nov 30 16:44:33 charon kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
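The NVRM line reports BAR0 as 0M at address 0x0, which suggests the kernel left that BAR unassigned. One read-only way to check what the kernel actually assigned, without writing anything under /sys, is the device's sysfs resource file. That file holds one "start end flags" hex triple per line. The sketch below parses it and computes each length the way the kernel's pci_resource_len() does: 0 when end is 0, otherwise end - start + 1. The device address 0000:3d:00.0 is taken from the log above and is an assumption; it may change between boots or hosts.

```python
from pathlib import Path
from typing import List, Tuple


def parse_resource(text: str) -> List[Tuple[int, int, int, int]]:
    """Parse a sysfs PCI 'resource' file into (start, end, flags, length) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end, flags = (int(f, 16) for f in fields[:3])  # values are 0x-prefixed hex
        length = 0 if end == 0 else end - start + 1  # mirrors pci_resource_len()
        regions.append((start, end, flags, length))
    return regions


if __name__ == "__main__":
    # Address from the dmesg above (assumption: it may differ on another boot).
    resource = Path("/sys/bus/pci/devices/0000:3d:00.0/resource")
    if resource.exists():
        # The first six lines correspond to the standard BARs 0-5.
        for bar, (start, _end, flags, length) in enumerate(parse_resource(resource.read_text())[:6]):
            print(f"BAR{bar}: {length >> 20}M @ {start:#x} (flags {flags:#x})")
```

A BAR that prints as "0M @ 0x0" here would match what the NVIDIA driver complains about.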
I’ve searched for the error messages. Starting from the NVidia forums (1) (2), I’ve double-checked that
- Above 4G Decoding is enabled in my UEFI setup, and
- I do have at least one 64-bit window (more than 8 hex digits) earlier in dmesg:
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x0000-0x03af window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x03e0-0x0cf7 window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x03b0-0x03df window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0xb0000000-0xefffffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [mem 0x2050000000-0x7fffffffff window]
  Nov 30 15:44:31 archlinux kernel: pci_bus 0000:00: root bus resource [bus 00-ff]
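The "more than 8 hex digits" rule of thumb asks whether some root-bus memory window ends above 4 GiB (0xffffffff). Here is a small sketch of that check over raw dmesg text. The regex assumes the exact "root bus resource [mem START-END ... window]" wording shown above, and the function name windows_above_4g is hypothetical. In the lines above, only 0x2050000000-0x7fffffffff qualifies, which is about 383 GiB of address space.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# Matches e.g. "root bus resource [mem 0x2050000000-0x7fffffffff window]"; io windows are ignored.
WINDOW_RE = re.compile(
    r"root bus resource \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)[^\]]*window\]", re.IGNORECASE
)
FOUR_GIB = 1 << 32


def windows_above_4g(log: str) -> List[Tuple[int, int]]:
    """Return (start, end) of root-bus memory windows that end above 4 GiB."""
    windows = []
    for m in WINDOW_RE.finditer(log):  # works whether lines are split or run together
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if end >= FOUR_GIB:  # i.e. the end address needs more than 8 hex digits
            windows.append((start, end))
    return windows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): dmesg > dmesg.txt; python3 windows.py dmesg.txt
    for start, end in windows_above_4g(Path(sys.argv[1]).read_text(errors="replace")):
        print(f"{start:#x}-{end:#x}: {(end - start + 1) / 2**30:.2f} GiB")
```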
The eGPU appears normally in boltctl list (authorized, etc.), and the NVidia Quadro P5000 appears in lspci. However, nothing else works: neither the NVidia card itself nor the USB hub(s) in the eGPU (which include a built-in ASIX Ethernet adapter).
Some threads recommend /sys/bus/pci/devices gymnastics, such as this post, but not only does that not work for me, this crash from 2015 still occurs on my machine today: the system freezes and panic-reboots when I try it. So I haven’t experimented any further.
On a Laptop
Machine: Lenovo X1 Carbon v7
CPU: Intel Core i7-8665U
System: Debian with kernel 5.9.8
Related kernel flags: pci=noats
Importantly, the laptop does not have the NVidia driver installed, because some forum posts explicitly asked for dmesg output without the NVidia driver. So here it is: a dmesg output from the laptop without the NVidia drivers.
Again, boltctl list looks normal (authorized, etc.), and the NVidia Quadro P5000 appears in lspci. The difference from the desktop case above is that at least something works: the USB buses and the ASIX network card (ax88179_178a). But the NVidia card doesn’t work, and “no space for” occurs a number of times in dmesg.
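To see which devices and BARs the PCI allocator gave up on, the "no space for" lines can be pulled out of dmesg and grouped. The sketch below assumes the 5.9-era wording "<domain:bus:dev.fn>: BAR <n>: no space for [<resource>]". Lines in any other shape, including the related "failed to assign" messages, are ignored. The name no_space_events and the test line "pci 0000:3d:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]" are illustrative only; that line is not copied from the laptop's log.

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import List, Tuple

# Assumed wording (5.9-era kernels): "0000:3d:00.0: BAR 1: no space for [mem size ... 64bit pref]"
NO_SPACE_RE = re.compile(
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): BAR (\d+): no space for (\[[^\]]*\])"
)


def no_space_events(log: str) -> List[Tuple[str, int, str]]:
    """Return (device, BAR index, requested resource) for each 'no space for' line."""
    return [(m.group(1), int(m.group(2)), m.group(3)) for m in NO_SPACE_RE.finditer(log)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): dmesg > dmesg.txt; python3 no_space.py dmesg.txt
    events = no_space_events(Path(sys.argv[1]).read_text(errors="replace"))
    counts = Counter((dev, bar) for dev, bar, _ in events)
    for (dev, bar), n in sorted(counts.items()):
        print(f"{dev} BAR {bar}: 'no space for' x{n}")
```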