Hi everyone, I’m trying to get GPU passthrough working in Proxmox VE, but I’m hitting a consistent reset failure / power state issue with my NVIDIA RTX 5070 Ti.
I’ve done extensive troubleshooting and would really appreciate help confirming whether this is a known limitation, firmware bug, or if there’s anything else I can try.
System details
- GPU: NVIDIA RTX 5070 Ti (GB203)
- Motherboard: MSI MAG B860 Tomahawk
- CPU: (Intel, VT-d enabled)
- Proxmox: 9.1.1
- Kernel: 6.17.2-1-pve
- QEMU: 10.1.2
- VM: Windows 11 (OVMF / Q35)
Current VM config (relevant parts)
hostpci0: 0000:02:00.0,pcie=1,x-vga=1
hostpci1: 0000:02:00.1,pcie=1
cpu: host (with kvm=off set via QEMU)
After a cold boot, the GPU is healthy: setpci -s 0000:02:00.0 0x00.w 0x02.w returns 10de 2c05
On a fresh Windows install, I once saw the NVIDIA GPU in Device Manager.
As soon as I start the VM:
1. GPU goes into D3 / low power state
2. VFIO attempts reset
3. GPU fails to recover
4. Eventually becomes unresponsive: setpci → ffff ffff
The only recovery is a full power cycle (PSU off).
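For anyone who wants to reproduce the healthy vs. dead check without setpci, the same vendor/device IDs can be read from the first four bytes of the device's config space in sysfs. This is only a minimal sketch, assuming the GPU sits at 0000:02:00.0 as in my config; the helper names are made up for illustration.

```python
from pathlib import Path


def decode_ids(header: bytes) -> tuple[int, int]:
    """Return (vendor_id, device_id) from the first 4 bytes of PCI config space."""
    # Both fields are 16-bit little-endian: vendor at offset 0x00, device at 0x02.
    vendor = int.from_bytes(header[0:2], "little")
    device = int.from_bytes(header[2:4], "little")
    return vendor, device


def is_responsive(header: bytes) -> bool:
    """All-ones reads (ffff ffff) mean the device no longer answers config reads."""
    return decode_ids(header) != (0xFFFF, 0xFFFF)


def read_header(bdf: str = "0000:02:00.0") -> bytes:
    """Read the first 4 config-space bytes from sysfs (run on the Proxmox host)."""
    with open(Path("/sys/bus/pci/devices") / bdf / "config", "rb") as f:
        return f.read(4)
```

On the host, `is_responsive(read_header())` should be True after a cold boot (IDs 10de 2c05) and False once the card has dropped to ffff ffff.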
dmesg logs (key errors)
pcieport 0000:00:06.0: Data Link Layer Link Active not set in 100 msec
vfio-pci 0000:02:00.0: timed out waiting for pending transaction; performing function level reset anyway
vfio-pci 0000:02:00.0: not ready 1023ms after FLR; waiting
vfio-pci 0000:02:00.0: not ready 2047ms after FLR; waiting
vfio-pci 0000:02:00.0: not ready 4095ms after FLR; waiting
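To compare runs, I find it easier to pull the "not ready ... after FLR" waits out of the log than to read them by eye. A rough sketch, assuming the message format shown above; the function name is hypothetical and the input is just saved dmesg text.

```python
import re

# Matches kernel messages such as:
#   vfio-pci 0000:02:00.0: not ready 1023ms after FLR; waiting
NOT_READY = re.compile(r"(\S+): not ready (\d+)ms after (\w+)")


def longest_wait_ms(log_text: str, bdf: str) -> int:
    """Return the largest 'not ready' wait logged for one device, or 0 if none."""
    waits = [
        int(match.group(2))
        for match in NOT_READY.finditer(log_text)
        if match.group(1) == bdf
    ]
    return max(waits, default=0)
```

Fed the three lines above, this reports 4095 for 0000:02:00.0, which is where the log excerpt ends.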
The GPU exposes only 2 reset methods: cat /sys/bus/pci/devices/0000:02:00.0/reset_method → flr bus
Result by reset type:
- FLR → hangs (“not ready after FLR”)
- Bus reset (bridge) → GPU becomes unresponsive (ffff)
Attempting echo none > reset_method fails with: Invalid reset method ‘none’
PCIe link status
lspci -vv -s 00:06.0
lspci -vv -s 02:00.0
- Root port max: Gen3 x16 (8GT/s)
- GPU capable: Gen5
- Running: Gen3 x16 (downgraded)
Link appears stable when idle.
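The same link information is also exposed in sysfs (current_link_speed, max_link_speed, current_link_width), which is easier to poll than lspci output. A small sketch, assuming the speed strings look like "8.0 GT/s PCIe"; I have not checked the exact format on every kernel, and the helper names are my own.

```python
from pathlib import Path


def parse_speed(text: str) -> float:
    """Turn a sysfs link-speed string such as '8.0 GT/s PCIe' into 8.0."""
    return float(text.split()[0])


def is_downgraded(current: str, maximum: str) -> bool:
    """True when the negotiated speed is below what the device advertises."""
    return parse_speed(current) < parse_speed(maximum)


def link_report(bdf: str) -> str:
    """Summarise the negotiated link for one device (run on the Proxmox host)."""
    dev = Path("/sys/bus/pci/devices") / bdf
    current = (dev / "current_link_speed").read_text().strip()
    maximum = (dev / "max_link_speed").read_text().strip()
    width = (dev / "current_link_width").read_text().strip()
    state = "downgraded" if is_downgraded(current, maximum) else "at maximum"
    return f"{bdf}: {current} x{width} ({state}, device maximum {maximum})"
```

Calling `link_report("0000:02:00.0")` before and after starting the VM should show whether the link drops at the moment the reset is attempted.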
BIOS:
VT-d enabled
Above 4G decoding enabled
ASPM disabled
PCIe forced to Gen3
Secure Boot disabled (kernel lockdown = none)
Kernel parameters:
intel_iommu=on iommu=pt
pcie_aspm=off
pcie_port_pm=off
vfio-pci.disable_idle_d3=1
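To rule out a bootloader mistake, the running kernel's command line can be compared against the list above. A minimal sketch; it does an exact string match per parameter, so a different spelling of the same option (for example underscores instead of dashes in a module name) would be reported as missing.

```python
EXPECTED = [
    "intel_iommu=on",
    "iommu=pt",
    "pcie_aspm=off",
    "pcie_port_pm=off",
    "vfio-pci.disable_idle_d3=1",
]


def missing_params(cmdline: str, expected=EXPECTED) -> list[str]:
    """Return the expected parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


def read_cmdline() -> str:
    """Read the command line the running kernel was booted with."""
    with open("/proc/cmdline") as f:
        return f.read()
```

An empty list from `missing_params(read_cmdline())` means all five parameters reached the running kernel.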
VFIO:
- Devices bound correctly
- Verified via lspci -nnk
Reset attempts:
- Secondary bus reset → breaks GPU
- FLR → hangs
- reset_method=none → not supported
Other attempts:
- Different VM configs
- Fresh Windows install
- NVIDIA drivers (fail to initialize GPU)
- Verified GPU visible only once (after reinstall)
This appears to be a broken GPU reset path (FLR) combined with an unsafe bus reset on this root port (00:06.0), which leads to:
- VFIO forcing FLR → failure
- Bus reset → hardware becomes inaccessible
- No valid reset fallback available
1. Is this a known issue with RTX 50-series GPUs in VFIO passthrough?
2. Is there any way to disable FLR in vfio-pci or QEMU in newer kernels?
3. Could this be:
- BIOS / PCIe firmware issue?
- Root port (00:06.0) limitations?
4. Are there known workarounds besides:
- different motherboard / slot
- or avoiding passthrough entirely?
I would really appreciate any help at this point.