RTX 5070 Ti GPU passthrough fails in Proxmox – stuck in D3 / FLR reset loop → device becomes unresponsive (ffff)

Hi everyone, I’m trying to get GPU passthrough working in Proxmox VE, but I’m hitting a consistent reset failure / power state issue with my NVIDIA RTX 5070 Ti.

I’ve done extensive troubleshooting and would really appreciate help confirming whether this is a known limitation, firmware bug, or if there’s anything else I can try.

System details

  • GPU: NVIDIA RTX 5070 Ti (GB203)

  • Motherboard: MSI MAG B860 Tomahawk

  • CPU: (Intel, VT-d enabled)

  • Proxmox: 9.1.1

  • Kernel: 6.17.2-1-pve

  • QEMU: 10.1.2

  • VM: Windows 11 (OVMF / Q35)

Current VM config (relevant parts)

hostpci0: 0000:02:00.0,pcie=1,x-vga=1

hostpci1: 0000:02:00.1,pcie=1

cpu: host (with kvm=off set via QEMU)

After a cold boot, the GPU is healthy: setpci -s 0000:02:00.0 0x00.w 0x02.w returns 10de 2c05

On a fresh Windows install, I once saw the NVIDIA GPU in Device Manager

As soon as I start the VM:

1. GPU goes into D3 / low power state
2. VFIO attempts reset
3. GPU fails to recover
4. Eventually becomes unresponsive: setpci → ffff ffff

The only recovery is a full power cycle (PSU off).
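For anyone who wants to watch for the same transition, here is a minimal sketch that reads the vendor and device ID from the device's sysfs `config` file instead of calling setpci. The helper names (`decode_ids`, `is_responsive`, `read_config`) are made up for illustration; the only assumption is the standard `/sys/bus/pci/devices/<BDF>/config` layout, where the first four bytes are the little-endian vendor and device IDs.

```python
import struct
from pathlib import Path


def decode_ids(config: bytes):
    """Return (vendor_id, device_id) from the first 4 bytes of PCI config space."""
    # Both fields are 16-bit little-endian values at offsets 0x00 and 0x02.
    vendor, device = struct.unpack_from("<HH", config, 0)
    return vendor, device


def is_responsive(config: bytes) -> bool:
    """A vendor ID of 0xffff means config reads are no longer being answered."""
    vendor, _ = decode_ids(config)
    return vendor != 0xFFFF


def read_config(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> bytes:
    """Read the first 4 bytes of config space for a device such as 0000:02:00.0."""
    return (Path(sysfs_root) / bdf / "config").read_bytes()[:4]
```

On the host, `decode_ids(read_config("0000:02:00.0"))` should give `(0x10de, 0x2c05)` after a cold boot, and `is_responsive(...)` should turn false once the card has dropped off the bus.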

dmesg logs (key errors)

pcieport 0000:00:06.0: Data Link Layer Link Active not set in 100 msec

vfio-pci 0000:02:00.0: timed out waiting for pending transaction; performing function level reset anyway

vfio-pci 0000:02:00.0: not ready 1023ms after FLR; waiting

vfio-pci 0000:02:00.0: not ready 2047ms after FLR; waiting

vfio-pci 0000:02:00.0: not ready 4095ms after FLR; waiting
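The wait times in these messages roughly double on each retry. A small sketch for pulling them out of a saved dmesg dump, assuming the message format is exactly as quoted above (the `flr_waits` helper and its regex are illustrative, not part of any tool):

```python
import re

# Matches lines such as:
#   vfio-pci 0000:02:00.0: not ready 1023ms after FLR; waiting
FLR_WAIT = re.compile(r"vfio-pci (\S+): not ready (\d+)ms after FLR")


def flr_waits(log_text: str):
    """Map each PCI address to the list of FLR wait times (ms) seen in the log."""
    waits = {}
    for line in log_text.splitlines():
        match = FLR_WAIT.search(line)
        if match:
            waits.setdefault(match.group(1), []).append(int(match.group(2)))
    return waits
```

Feeding it the three lines above yields a single entry for 0000:02:00.0 with the waits 1023, 2047 and 4095 ms, which makes it easy to compare runs after changing a setting.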

The GPU exposes only 2 reset methods: cat /sys/bus/pci/devices/0000:02:00.0/reset_method → flr bus

Reset type → Result

  • FLR → hangs (“not ready after FLR”)

  • Bus reset (bridge) → GPU becomes unresponsive (ffff)

Attempting echo none > reset_method fails with: Invalid reset method ‘none’
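As far as I can tell from the kernel's sysfs ABI documentation, reset_method takes a space-separated list of method names, an empty string to disable resets for the device, or `default` to restore the full list, which would explain why the literal word `none` is rejected. A minimal sketch under that assumption; the helper names are hypothetical, writing requires root, and I have not confirmed that restricting or disabling the methods changes the outcome here:

```python
from pathlib import Path


def get_reset_methods(bdf: str, sysfs_root: str = "/sys/bus/pci/devices"):
    """Return the currently enabled reset methods, e.g. ['flr', 'bus']."""
    return (Path(sysfs_root) / bdf / "reset_method").read_text().split()


def set_reset_methods(bdf: str, methods, sysfs_root: str = "/sys/bus/pci/devices"):
    """Write a space-separated method list; an empty list writes an empty line.

    Assumption: per the sysfs ABI docs, an empty string disables resets and
    the single value 'default' restores the kernel's default ordering.
    """
    path = Path(sysfs_root) / bdf / "reset_method"
    path.write_text(" ".join(methods) + "\n")
```

For example, `set_reset_methods("0000:02:00.0", ["bus"])` would try bus reset only, and `set_reset_methods("0000:02:00.0", [])` would write the empty string in place of `none`.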

PCIe link status

lspci -vv -s 00:06.0

lspci -vv -s 02:00.0

  • Root port max: Gen3 x16 (8GT/s)

  • GPU capable: Gen5

  • Running: Gen3 x16 (downgraded)

Link appears stable when idle.

BIOS:

VT-d enabled

Above 4G decoding enabled

ASPM disabled

PCIe forced to Gen3

Secure Boot disabled (kernel lockdown = none)

Kernel parameters:

intel_iommu=on iommu=pt

pcie_aspm=off

pcie_port_pm=off

vfio-pci.disable_idle_d3=1
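To rule out a bootloader that never picked these up, the running command line can be compared against the list above. A minimal sketch; `missing_params` and `EXPECTED` are illustrative names, and it only does exact token matching against the contents of /proc/cmdline:

```python
# Parameters listed in this post; adjust to match your own setup.
EXPECTED = [
    "intel_iommu=on",
    "iommu=pt",
    "pcie_aspm=off",
    "pcie_port_pm=off",
    "vfio-pci.disable_idle_d3=1",
]


def missing_params(cmdline: str, expected=EXPECTED):
    """Return the expected parameters that do not appear in the command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]
```

On the host this would be called as `missing_params(open("/proc/cmdline").read())`; an empty list means every parameter is active in the running kernel.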

VFIO:

  • Devices bound correctly

  • Verified via lspci -nnk

Reset attempts:

  • Secondary bus reset → breaks GPU

  • FLR → hangs

  • reset_method=none → not supported

Other attempts:

  • Different VM configs

  • Fresh Windows install

  • NVIDIA drivers (fail to initialize GPU)

  • Verified GPU visible only once (after reinstall)

This appears to be a broken GPU reset path (FLR) combined with an unsafe bus reset on this root port (00:06.0), which leads to:

  • VFIO forcing FLR → failure

  • Bus reset → hardware becomes inaccessible

  • No valid reset fallback available

1. Is this a known issue with RTX 50-series GPUs in VFIO passthrough?

2. Is there any way to disable FLR in vfio-pci or QEMU in newer kernels?

3. Could this be:

  • a BIOS / PCIe firmware issue?

  • a root port (00:06.0) limitation?

4. Are there known workarounds besides:

  • a different motherboard / slot

  • or avoiding passthrough entirely?

I would really appreciate any help at this point.