eGPU not being picked up by NVIDIA control panel + unable to force device by PCIe address

Hi All,

So I’m just experimenting with Linux as a workstation and the first hurdle I’m hitting is getting the eGPU going.

I generated an xorg.conf with the nvidia-xconfig utility and then set the PCI bus ID to "PCI:8:0:0", since the outputs below show the eGPU at that address, but I still get no display on the eGPU outputs.

sudo dmesg | grep -i nvidia
[ 11.344061] nvidia: module license 'NVIDIA' taints kernel.
[ 11.475339] nvidia-nvlink: Nvlink Core is being initialized, major device number 506
[ 11.476487] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[ 11.476601] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 11.521726] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 525.85.05 Sat Jan 14 00:49:50 UTC 2023
[ 11.568942] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 525.85.05 Sat Jan 14 00:40:03 UTC 2023
[ 11.577151] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 11.580241] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input34
[ 11.580426] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input35
[ 11.581173] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input36
[ 11.581306] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input37
[ 11.581414] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card2/input38
[ 11.703905] audit: type=1400 audit(1676407721.600:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=932 comm="apparmor_parser"
[ 11.703909] audit: type=1400 audit(1676407721.600:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=932 comm="apparmor_parser"
[ 12.318833] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[ 12.355658] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 12.381779] nvidia-uvm: Loaded the UVM driver, major device number 504.
[ 13.278286] nvidia 0000:08:00.0: enabling device (0000 -> 0003)
[ 13.278399] nvidia 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 13.769665] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input46
[ 13.769784] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input47
[ 13.769874] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input48
[ 13.769958] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input49
[ 13.770042] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input50
[ 13.770126] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input51
[ 13.770221] input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:1d.4/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.1/sound/card6/input52

lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2070 Mobile] (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation Device 2203 (rev a1)
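
A side note on the addresses above: lspci prints bus, device and function in hexadecimal, while the BusID in xorg.conf is, as far as I know, read as decimal. For 08:00.0 both notations give the same numbers, so "PCI:8:0:0" matches the eGPU here, but the two diverge for any bus above 09. A minimal shell sketch of the conversion follows; the function name lspci_to_busid is made up for illustration and is not part of any NVIDIA or Xorg tool.

```shell
#!/bin/sh
# Hypothetical helper: turn an lspci-style address (hex "bus:device.function",
# optionally prefixed with a 4-digit domain) into an Xorg BusID string.
lspci_to_busid() {
    addr=${1#????:}     # drop a leading "0000:" style domain if present
    bus=${addr%%:*}     # text before the first ":"  e.g. "08"
    rest=${addr#*:}     # text after the first ":"   e.g. "00.0"
    dev=${rest%%.*}     # text before the "."        e.g. "00"
    fn=${rest#*.}       # text after the "."         e.g. "0"
    # printf parses the 0x-prefixed arguments as hex and prints them as decimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

Calling `lspci_to_busid 08:00.0` prints the string to put after BusID in the Device section.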

xorg.conf

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 525.85.05

Section "ServerLayout"
    Identifier "Default Layout"
    Screen "Default Screen" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
EndSection

Section "Module"
    Load "modesetting"
    Load "glx"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection

Section "Device"
    Identifier "nvidia"
    Driver "nvidia"
    BusID "PCI:1:0:0"
EndSection

Section "Screen"
    Identifier "Default Screen"
    Device "nvidia"
    DefaultDepth 24
    Option "AllowExternalGpus" "true"
    SubSection "Display"
        Depth 24
        Modes "nvidia-auto-select"
    EndSubSection
EndSection

nvidia-bug-report.log.gz (575.6 KB)

Please let me know if there is anything else you need.

Many thanks,
Martin.

This looks like a timing issue. By the time the eGPU is added to the system (around 13 s in dmesg), the nvidia-drm module has already been initialized (around 11 s) for your onboard mobile NVIDIA GPU. Unlike the nvidia module, nvidia-drm does not seem to be hot-plug capable, so the eGPU is visible in nvidia-smi but not usable by the Xorg server.
You could try a script, triggered by udev or systemd when the eGPU is added, that unloads and reloads nvidia-modeset/nvidia-drm.
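
A rough sketch of what such a reload script could look like is below. It is untested against this setup and rests on several assumptions: nothing (display manager, Xorg) is holding nvidia-drm when it runs, it is run as root, and the helper names run and reload_nvidia_drm as well as the DRY_RUN variable are made up for illustration. How it gets triggered (udev rule or systemd unit) is left open.

```shell
#!/bin/sh
# Hypothetical helper: reload nvidia-drm so it gets re-initialized after the
# eGPU has been added. Assumes no process is using the module at that point.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reload_nvidia_drm() {
    # modprobe -r removes nvidia_drm and also tries to remove dependencies
    # that become unused; it fails if the module is still in use.
    run modprobe -r nvidia_drm || return 1
    # Loading nvidia_drm pulls its dependencies back in as needed.
    run modprobe nvidia_drm
}
```

Running `DRY_RUN=1 reload_nvidia_drm` first shows the commands without touching any modules. If the unload step fails because the module is in use, the display manager most likely has to be stopped before the reload and started again afterwards.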