Hello! I’m having problems with a Linux laptop + eGPU setup.
Specs:
OS: Fedora 38 KDE Plasma
eGPU: RTX 3070 Ti
Driver versions: 535 and 470
I’ve installed the drivers following these recommendations and tried both Wayland and X11, with both the latest drivers and the 470 series. The issue is that the driver cannot find the eGPU for some reason, even though the device is detected by the system:
For configuration and management I also tried all-ways-egpu for Wayland, and gswitch and egpu-switcher for X11. I’m attaching the output of nvidia-bug-report.sh for both the 535 and 470 versions. I should also mention that this exact hardware setup worked fine about a year ago, when I first configured it with Ubuntu; a couple of months later, after some update, it broke and has never worked since.
Setting any of the GRUB options pcie_aspm=off, nouveau.modeset=0 or nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 didn’t help with either akmod-nvidia or akmod-nvidia-open.
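One quick sanity check is whether those options actually reached the running kernel, which you can see in /proc/cmdline. A minimal Python sketch, assuming a standard Linux /proc layout; the parameter list is just the three from this post, and it only does exact token matching:

```python
"""Check whether kernel parameters set in GRUB are active in the running kernel.

Assumption: /proc/cmdline holds the command line the current kernel booted
with, so a GRUB change only shows up here after rebooting into the updated
boot entry. Matching is exact per whitespace-separated token.
"""
from pathlib import Path

# The three parameters tried in this thread.
WANTED = [
    "pcie_aspm=off",
    "nouveau.modeset=0",
    "nvidia.NVreg_OpenRmEnableUnsupportedGpus=1",
]


def active_params(cmdline: str, wanted: list[str]) -> dict[str, bool]:
    """Map each wanted 'key=value' token to whether it appears in cmdline."""
    tokens = set(cmdline.split())
    return {param: param in tokens for param in wanted}


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        cmdline = cmdline_path.read_text()
        for param, present in active_params(cmdline, WANTED).items():
            print(f"{'active ' if present else 'MISSING'}  {param}")
    else:
        print("/proc/cmdline not found (not a Linux system?)")
```

If a parameter shows as MISSING after a reboot, the GRUB change most likely never made it into the boot entry that was used.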
Installing the latest 535 driver from the run file (sudo ./NVIDIA-Linux-x86_64-535.113.01.run -m=kernel-open) didn’t work either.
I found a RmInitAdapter failed! error in the logs generated by nvidia-bug-report.sh for both the proprietary and open versions. I got compilation errors when trying to build the 470.82.00 and 515.105.01 drivers (again from run files) against the Fedora Linux kernel (6.5.8-200.fc38.x86_64), so no luck there.
I’m going to try the latest version available at the moment (545.23.06) using the run file; I see no other options if this fails.
Installing the latest beta driver from the run file also didn’t work: sudo ./NVIDIA-Linux-x86_64-545.23.06.run -m=kernel-open. So I currently see virtually no way to make an external GPU work with the latest kernels.
That’s weird. The log above is from either the akmod-nvidia-open package or the NVIDIA-Linux-x86_64-535.113.01.run file (I can’t recall exactly). You can also see that the /tmp/selfgz5811/NVIDIA-Linux-x86_64-535.113.01/kernel-open path was used during compilation, and the log contains [ 146.427273] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 535.113.01 Release Build (dvs-builder@U16-I2-C03-37-4) Tue Sep 12 19:48:46 UTC 2023.
Yeah, it’s this one: nvidia-bug-report-535-open.log.gz; you can search for "NVIDIA UNIX Open Kernel" in it. I guess it’s just a collection of all my attempts over the past few days :)
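Searching the gzipped report doesn’t require unpacking it first. A small Python sketch that prints every line containing one of the strings quoted in this thread; the marker list and the default file name are assumptions taken from the posts, not an official list of what to look for:

```python
"""Search a gzipped nvidia-bug-report log for a few markers.

Assumption: the default file name is the one mentioned in this thread; pass
your own path as the first argument. The markers are strings quoted in the
thread, not an official list.
"""
import gzip
import sys

MARKERS = (
    "NVIDIA UNIX Open Kernel",  # banner printed when the open kernel modules load
    "RmInitAdapter failed",     # adapter initialisation failure
)


def find_markers(lines, markers=MARKERS):
    """Yield (line_number, marker, line) for every line containing a marker."""
    for number, line in enumerate(lines, start=1):
        for marker in markers:
            if marker in line:
                yield number, marker, line.rstrip("\n")


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "nvidia-bug-report-535-open.log.gz"
    try:
        # "rt" decompresses and decodes to text; errors="replace" tolerates odd bytes.
        with gzip.open(path, "rt", errors="replace") as fh:
            for number, marker, line in find_markers(fh):
                print(f"{number}: [{marker}] {line}")
    except FileNotFoundError:
        print(f"{path} not found")
```

The line numbers it prints make it easier to point others at the relevant part of a report that has grown to thousands of lines.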
Okay, I’ll try the akmod-nvidia-open package again and regenerate the report. In practice I got the same result with the open driver: the external monitor received no signal, and overall it’s just a black screen.
This might just be a timing issue, i.e. the driver loads too late and the X server/Wayland is already up. I can’t really tell without a proper log, though.
Okay, here is the new attempt; the interesting part starts around line 7k. I booted with the nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 parameter set and the open driver was loaded, but I got RmInitAdapter failed!. Also, with the OpenRmEnableUnsupportedGpus parameter set, lsmod | grep -i nvidia shows nothing, so the modules didn’t load. Without this parameter the modules are loaded.
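For scripting the same check across several boots, /proc/modules carries the data that lsmod prints. A rough Python equivalent of lsmod | grep -i nvidia, assuming the usual /proc/modules layout where the first column is the module name; unlike grep, it matches only that name column and not the "used by" column:

```python
"""Rough equivalent of `lsmod | grep -i nvidia`, reading /proc/modules.

Assumption: only the first column (module name) is matched, whereas grep
would also match the "used by" column. Module names use underscores,
e.g. nvidia_drm.
"""
from pathlib import Path


def nvidia_modules(proc_modules_text: str) -> list[str]:
    """Return the names of loaded modules whose name contains 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "nvidia" in fields[0].lower():
            names.append(fields[0])
    return names


if __name__ == "__main__":
    path = Path("/proc/modules")
    loaded = nvidia_modules(path.read_text()) if path.exists() else []
    if loaded:
        print("loaded:", ", ".join(loaded))
    else:
        print("no nvidia modules loaded")
```

Running it once with and once without the parameter should reproduce the difference described above.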
Yes, now I can see something new, but I don’t know what it means. I think I’ll need to create a new issue on GitHub…
Oct 31 17:53:41 fedora kernel: NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
Oct 31 17:53:44 fedora kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:52:00.0 on minor 1
Oct 31 17:53:44 fedora systemd[1]: nvidia-fallback.service - Fallback to nouveau as nvidia did not load was skipped because of an unmet condition check (ConditionPathExists=!/sys/module/nvidia).
Oct 31 17:54:06 fedora kernel: NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed! (4f02, 0000)
Oct 31 17:54:06 fedora kernel: NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams, sizeof(restoreParams)) @ unix_console.c:197
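When the kernel log gets long, it can help to reduce it to the NVRM function that reported each message, as in the lines above. A minimal Python sketch, assuming the syslog-style format shown here ("Oct 31 17:53:41 fedora kernel: NVRM <function>: <message>"); NVRM lines in other formats are simply skipped, and the script name in the usage comment is hypothetical:

```python
"""Summarise NVRM kernel messages from a saved kernel log.

Assumption: lines follow the format quoted in this thread, e.g.
"Oct 31 17:53:41 fedora kernel: NVRM <function>: <message>".
Lines in other formats are skipped.
"""
import re
import sys

NVRM_LINE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s.*?"  # e.g. "Oct 31 17:53:41"
    r"NVRM:?\s+(?P<func>\w+):\s*(?P<msg>.*)$"          # "NVRM <function>: <message>"
)


def parse_nvrm(lines):
    """Return (time, function, message) tuples for matching NVRM lines."""
    results = []
    for line in lines:
        match = NVRM_LINE.match(line.strip())
        if match:
            results.append((match["time"], match["func"], match["msg"]))
    return results


if __name__ == "__main__":
    # Hypothetical usage: journalctl -k -b > kernel.log && python3 nvrm_summary.py kernel.log
    if len(sys.argv) < 2:
        print("usage: nvrm_summary.py <kernel-log-file>")
    else:
        with open(sys.argv[1], errors="replace") as fh:
            for stamp, func, msg in parse_nvrm(fh):
                print(f"{stamp}  {func:<24} {msg[:80]}")
```

On the excerpt above, it would list objClInitPcieChipset, unixCallVideoBIOS and nvCheckOkFailedNoLog in order and skip the [drm] and systemd lines.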
Alright, I update it periodically, but I’ll check whether there’s anything new. Again, it worked fine a year ago, and after some system update (I can’t recall whether a BIOS update was also involved) it broke…
Thanks!
Yep, I tried restarting it, but it doesn’t help. I checked everything and updated whatever was outdated; the BIOS is already the latest version (N3AET77W (1.42) 2023-09-21). Also, the eGPU works just fine on Windows.
Yes, the error 0x26,0x56, RmInitAdapter failed! (0x26:0x56:1482), is specific to the proprietary Linux driver. It seems to expect something special from the system BIOS that isn’t available for eGPUs over Thunderbolt.
Usually the open driver works without issue in that case. Maybe open an issue on GitHub for the -open driver to shed some light on this.