Hi,
I added an eGPU to my Linux host for some AI/LLM applications. I would also like to use it for ffmpeg.
But for a few days now, I have not been able to get a stable setup with an RTX 5060 Ti, which I connected via Oculink using a free NVMe slot and an ADT-Link F9G adapter (PCIe 4.0 x4). Other people seem happy with similar setups.
I believe I have tried all the available open drivers (575, 570…), both on the host and inside VMs, as well as the recent 576.52 driver for Windows 11 inside a VM.
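For reference when reading PCIe bandwidth-test results, here is a small sketch of the theoretical per-direction bandwidth of a PCIe link. It assumes the standard per-lane signalling rates (2.5/5/8/16/32 GT/s for Gen 1 to 5) and line encodings (8b/10b for Gen 1-2, 128b/130b for Gen 3-5), and it ignores packet and protocol overhead, so real transfers will land noticeably below these figures.

```python
# Theoretical per-direction PCIe bandwidth, ignoring TLP/DLLP protocol overhead.
# Per-lane raw rate in GT/s and line-encoding efficiency for each generation.
PCIE_GENS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_bandwidth_bytes_per_s(gen: int, lanes: int) -> float:
    """Return the post-encoding bandwidth of one direction, in bytes/s."""
    rate_gt_s, efficiency = PCIE_GENS[gen]
    bits_per_s = rate_gt_s * 1e9 * lanes * efficiency
    return bits_per_s / 8


if __name__ == "__main__":
    # The Oculink / ADT-Link F9G path described above: PCIe 4.0 x4.
    bw = pcie_bandwidth_bytes_per_s(4, 4)
    print(f"PCIe 4.0 x4: {bw / 1e9:.2f} GB/s per direction")
```

For PCIe 4.0 x4 this prints about 7.88 GB/s, which is the ceiling a host-to-device bandwidth test over this link could approach.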
Currently, I have:
- that eGPU setup with a 650W PSU
- PC/host with a Ryzen 9 7945HX (16C/32T) and 96GB of RAM
- working KVM and PCI passthrough for the VMs
- a nice Linux/Debian 12 VM
- a nice, up-to-date Windows 11 VM
For other uses, without the eGPU, that PC and the VMs run fine.
Inside the Debian 12 VM, after startup, I can successfully run gpu_burn for an hour, and during that first run the nvidia-smi and nvtop outputs look fine. But if I repeat the burn test, I quickly get a “GSP timeout” crash or similar, and then I have to cold start the PC.
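When the NVIDIA Linux driver hits an error such as a GSP timeout, it normally writes an `NVRM: Xid` line to the kernel log, and the Xid code can help separate link problems from driver problems. Below is a minimal sketch that scans a saved kernel log for such lines and counts the codes per PCI device. The regular expression assumes the usual `NVRM: Xid (PCI:<address>): <code>, ...` layout, and the two code descriptions are my reading of NVIDIA's Xid documentation (79 = GPU has fallen off the bus, 119 = GSP RPC timeout); both are worth double-checking against your driver version. The script name `xid_scan.py` is just a hypothetical example.

```python
import re
import sys
from collections import Counter

# Assumed layout of a driver error line: "NVRM: Xid (PCI:<address>): <code>, ..."
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9A-Fa-f:.]+)\): (\d+)")

# Partial mapping only; check NVIDIA's Xid documentation for the full list.
KNOWN_XIDS = {
    79: "GPU has fallen off the bus",
    119: "GSP RPC timeout",
}


def scan_xids(lines):
    """Count (pci_address, xid_code) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def main(paths):
    if not paths:
        print("usage: xid_scan.py KERNEL_LOG [KERNEL_LOG ...]")
        return
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as log:
            counts = scan_xids(log)
        for (pci, code), n in sorted(counts.items()):
            label = KNOWN_XIDS.get(code, "see NVIDIA Xid documentation")
            print(f"{path}: PCI {pci} Xid {code} x{n} ({label})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, save the log with `sudo dmesg > kern.log` (or, after a cold restart and if the journal is persistent, `journalctl -k -b -1 > kern.log` for the previous boot) and run `python3 xid_scan.py kern.log`. A burst of Xid 79 points more towards the link or power, while isolated GSP timeouts are less conclusive.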
ollama (latest version) can often, but not always, load a small LLM into GPU memory; it then quickly fails or hangs as well.
Some CUDA samples I used, such as the PCIe bandwidth tests, crash as well.
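Since the PCIe bandwidth samples crash, it may also be worth confirming on the Linux host which link the eGPU actually negotiated. Linux exposes this per device in sysfs (`current_link_speed`, `current_link_width`, `max_link_speed`, `max_link_width`). The sketch below only reads those four attributes; the default address `0000:05:00.0` is a hypothetical placeholder (take the real one from `lspci` on the host), and since NVIDIA GPUs typically lower the link speed when idle, the values are most meaningful while the GPU is under load.

```python
import sys
from pathlib import Path

# sysfs attributes describing the PCIe link of a device.
LINK_ATTRS = ("current_link_speed", "current_link_width", "max_link_speed", "max_link_width")


def read_link(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return {attribute: value} for the device at PCI address `bdf`."""
    device = Path(sysfs_root) / bdf
    info = {}
    for attr in LINK_ATTRS:
        # Missing or unreadable attributes are reported as None instead of raising.
        try:
            info[attr] = (device / attr).read_text().strip()
        except OSError:
            info[attr] = None
    return info


if __name__ == "__main__":
    # Hypothetical default address; pass the eGPU's real address as an argument.
    bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:05:00.0"
    for attr, value in read_link(bdf).items():
        print(f"{attr}: {value}")
```

If `current_link_width` or `current_link_speed` under load is lower than expected for a PCIe 4.0 x4 adapter, that would point towards the cable or adapter rather than the drivers.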
In the Windows 11 VM, after startup, the driver status and the nvidia-smi output also look fine, with some Windows processes loaded and running on the eGPU. But as soon as I use that VM a little (Firefox via the VM console, or a remote RDP connection), I get a blue screen on the console within seconds. Running ollama in that VM also crashes it.
So far, I have not been able to figure out whether something is wrong with the software I'm using (drivers, kernels, …) or whether I have a hardware issue with the eGPU setup. I don't have another PC to test the 5060 Ti on.
I've now ordered another Oculink and NVMe adapter to rule out any hardware, cable or connection issue on that side.
What else could be done to diagnose or fix such a crashing setup?
Best regards
PS: I am not sharing any debug logs for now, as they would be pointless if the crashes turn out to be caused by faulty hardware.
This is from inside the Win 11 VM, which works fine until I start using graphics-intensive applications:
>nvidia-smi
Mon Jun 16 20:07:01 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.52 Driver Version: 576.52 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5060 Ti WDDM | 00000000:05:00.0 Off | N/A |
| 0% 36C P8 4W / 180W | 84MiB / 16311MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1052 C+G ...yb3d8bbwe\Notepad\Notepad.exe N/A |
| 0 N/A N/A 5864 C+G ...yb3d8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 7072 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 7924 C+G ...y\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 7948 C+G ..._cw5n1h2txyewy\SearchHost.exe N/A |
| 0 N/A N/A 8044 C+G ...cw5n1h2txyewy\WidgetBoard.exe N/A |
+-----------------------------------------------------------------------------------------+
> nvidia-smi pmon
# gpu pid type sm mem enc dec jpg ofa command
# Idx # C/G % % % % % % name
0 1052 C+G - - - - - - Notepad.exe
0 5864 C+G - - - - - - WindowsTerminal.
0 7072 C+G - - - - - - explorer.exe
0 7924 C+G - - - - - - StartMenuExperie
0 7948 C+G - - - - - - SearchHost.exe
0 8044 C+G - - - - - - WidgetBoard.exe
0 1052 C+G - - - - - - Notepad.exe
0 5864 C+G - - - - - - WindowsTerminal.
0 7072 C+G - - - - - - explorer.exe
0 7924 C+G - - - - - - StartMenuExperie
0 7948 C+G - - - - - - SearchHost.exe
0 8044 C+G - - - - - - WidgetBoard.exe
0 1052 C+G - - - - - - Notepad.exe
> nvidia-smi pci -gCnt
GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-515f7ee1-338e-94e0-ab03-3c881720a7e2)
TX_BYTES: 430684048
RX_BYTES: 290999516
> nvidia-smi pci -gErrCnt
GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-515f7ee1-338e-94e0-ab03-3c881720a7e2)
REPLAY_COUNTER: 0
REPLAY_ROLLOVER_COUNTER: 0
L0_TO_RECOVERY_COUNTER: 0
CORRECTABLE_ERRORS: 0
NAKS_RECEIVED: 0
RECEIVER_ERROR: 0
BAD_TLP: 0
NAKS_SENT: 0
BAD_DLLP: 0
NON_FATAL_ERROR: 0
FATAL_ERROR: 0
UNSUPPORTED_REQ: 0
LCRC_ERROR: 0
LANE_ERROR:
lane 0: 0
lane 1: 0
lane 2: 0
lane 3: 0
lane 4: 0
lane 5: 0
lane 6: 0
lane 7: 0
lane 8: 0
lane 9: 0
lane 10: 0
lane 11: 0
lane 12: 0
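All of these counters are zero while the VM is idle, so the useful comparison is a snapshot taken before a crash-inducing workload against one taken after it (if the VM survives long enough). Below is a small sketch that parses the text printed by `nvidia-smi pci -gErrCnt` into a dictionary and reports which counters increased between two snapshots. The parsing rules are inferred from the single output sample above, so other driver versions may need adjustments.

```python
def parse_err_counts(text):
    """Parse `nvidia-smi pci -gErrCnt` text into {counter_name: int}.

    Lane counters become keys like "LANE_ERROR lane 3". The format is inferred
    from one sample, so other driver versions may print it differently.
    """
    counts = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the "GPU 0: ... (UUID: ...)" header and anything without a colon.
        if not line or line.startswith("GPU ") or ":" not in line:
            continue
        name, _, value = line.rpartition(":")
        value = value.strip()
        if not value.isdigit():
            continue  # e.g. the "LANE_ERROR:" header line, which has no value
        name = name.strip()
        if name.startswith("lane "):
            name = f"LANE_ERROR {name}"
        counts[name] = int(value)
    return counts


def increased(before, after):
    """Return {counter: delta} for counters that grew between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }
```

Each snapshot can be captured with `subprocess.run(["nvidia-smi", "pci", "-gErrCnt"], capture_output=True, text=True).stdout`; a growing `REPLAY_COUNTER`, `CORRECTABLE_ERRORS` or lane counter between the two would suggest signal-integrity problems on the Oculink path.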
