I’m trying to pass through an RTX 3060 to a virtual machine under KVM (QEMU).
The host is Ubuntu 20.04. The VM is Windows 10.
The host kernel command-line parameters are:
intel_iommu=on iommu=pt vfio-pci.ids=10de:2487,10de:228b vfio-pci.disable_idle_d3=1
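To confirm that these parameters actually reached the running kernel, they can be compared against /proc/cmdline. A minimal sketch; `has_boot_param` and `CMDLINE_FILE` are hypothetical names used only for illustration:

```shell
#!/bin/sh
# has_boot_param FILE PARAM
# Succeeds if PARAM appears as a whole space-separated word in FILE.
has_boot_param() {
    tr ' ' '\n' < "$1" | grep -qxF -- "$2"
}

# CMDLINE_FILE is a made-up override so the check can be pointed at a saved copy.
cmdline=${CMDLINE_FILE:-/proc/cmdline}
if [ -r "$cmdline" ]; then
    for p in intel_iommu=on iommu=pt \
             vfio-pci.ids=10de:2487,10de:228b vfio-pci.disable_idle_d3=1; do
        if has_boot_param "$cmdline" "$p"; then
            echo "present: $p"
        else
            echo "MISSING: $p"
        fi
    done
fi
```

Matching whole words avoids a false hit where, for example, iommu=on would match inside intel_iommu=on.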
The relevant part of the QEMU command line is:
-device vfio-pci,host=0000:18:00.0,id=hostdev0,bus=pci.0,addr=0x9
-device vfio-pci,host=0000:18:00.1,id=hostdev1,bus=pci.0,addr=0xa
At first it worked well, and the VM could use the RTX 3060 normally.
At that point, the RTX 3060 looked like this on the host:
# lspci | grep NVIDIA
18:00.0 VGA compatible controller: NVIDIA Corporation Device 2487 (rev a1)
18:00.1 Audio device: NVIDIA Corporation Device 228b (rev a1)
# lspci -s 18:00.0 -vv
18:00.0 VGA compatible controller: NVIDIA Corporation Device 2487 (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device 1530
Physical Slot: 6
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 11
……
……
# lspci -t
……
 +-[0000:17]-+-00.0
 |           +-00.1
 |           +-00.2
 |           +-00.4
 |           \-02.0-[18]--+-00.0
 |                        \-00.1
……
Then, after restarting the VM many times, I lost the RTX 3060.
At that point, checking the RTX 3060 on the host shows:
# lspci | grep NVIDIA
18:00.0 VGA compatible controller: NVIDIA Corporation Device 2487 (rev ff)
18:00.1 Audio device: NVIDIA Corporation Device 228b (rev ff)
# lspci -s 18:00.0 -vv
18:00.0 VGA compatible controller: NVIDIA Corporation Device 2487 (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
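The "rev ff" and "Unknown header type 7f" output suggests that config-space reads are coming back as all ones, which usually means the device is no longer responding on the bus. A rough sketch for checking this directly from sysfs; the helper names `config_byte` and `pci_looks_dead` are made up for illustration, and the default path assumes the 0000:18:00.0 address above:

```shell
#!/bin/sh
# config_byte FILE OFFSET
# Prints one byte of a PCI config-space file as two hex digits.
config_byte() {
    od -An -tx1 -j"$2" -N1 "$1" | tr -d ' \n'
}

# pci_looks_dead FILE
# Succeeds if the revision ID (offset 0x08) and header type (offset 0x0e)
# both read 0xff, matching the "rev ff" / "header type 7f" symptom.
pci_looks_dead() {
    rev=$(config_byte "$1" 8)
    hdr=$(config_byte "$1" 14)
    [ "$rev" = "ff" ] && [ "$hdr" = "ff" ]
}

# Assumed default path; pass another config file as the first argument.
cfg=${1:-/sys/bus/pci/devices/0000:18:00.0/config}
if [ -r "$cfg" ]; then
    if pci_looks_dead "$cfg"; then
        echo "$cfg: reads as all ones, device not responding"
    else
        echo "$cfg: device answers config reads"
    fi
fi
```

Running this between VM restarts could show at which restart the card stops answering.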
# dmesg | grep vfio
[ 1523.197552] vfio-pci 0000:51:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 1523.197564] vfio-pci 0000:51:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1523.197568] vfio-pci 0000:51:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
[ 1523.197569] vfio-pci 0000:51:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
[ 1523.197570] vfio-pci 0000:51:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
[ 1523.198897] vfio-pci 0000:51:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
[ 1524.421723] vfio-pci 0000:51:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1524.421771] vfio-pci 0000:51:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1525.165440] vfio-pci 0000:51:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 1526.413440] vfio-pci 0000:51:00.0: not ready 1023ms after FLR; waiting
[ 1527.469443] vfio-pci 0000:51:00.0: not ready 2047ms after FLR; waiting
[ 1529.581440] vfio-pci 0000:51:00.0: not ready 4095ms after FLR; waiting
[ 1533.933439] vfio-pci 0000:51:00.0: not ready 8191ms after FLR; waiting
[ 1542.381440] vfio-pci 0000:51:00.0: not ready 16383ms after FLR; waiting
[ 1559.789426] vfio-pci 0000:51:00.0: not ready 32767ms after FLR; waiting
[ 1567.229554] vfio-pci 0000:18:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[ 1567.229567] vfio-pci 0000:18:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1567.229572] vfio-pci 0000:18:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
[ 1567.229573] vfio-pci 0000:18:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
[ 1567.229574] vfio-pci 0000:18:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
[ 1567.231029] vfio-pci 0000:18:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
[ 1594.605404] vfio-pci 0000:51:00.0: not ready 65535ms after FLR; giving up
[ 1595.006376] vfio-pci 0000:51:00.0: vfio_bar_restore: reset recovery - restoring BARs
[ 1595.014162] vfio-pci 0000:51:00.1: vfio_bar_restore: reset recovery - restoring BARs
[ 1821.133200] vfio-pci 0000:51:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 1822.381198] vfio-pci 0000:51:00.0: not ready 1023ms after FLR; waiting
[ 1823.437196] vfio-pci 0000:51:00.0: not ready 2047ms after FLR; waiting
[ 1825.517195] vfio-pci 0000:51:00.0: not ready 4095ms after FLR; waiting
[ 1829.869194] vfio-pci 0000:51:00.0: not ready 8191ms after FLR; waiting
[ 1838.317188] vfio-pci 0000:51:00.0: not ready 16383ms after FLR; waiting
[ 1856.749168] vfio-pci 0000:51:00.0: not ready 32767ms after FLR; waiting
[ 1891.565131] vfio-pci 0000:51:00.0: not ready 65535ms after FLR; giving up
I tried to remove and rescan the PCI device manually, but after that the device disappeared completely.
# file /sys/devices/pci0000:17/0000:17:02.0/0000:18:00.0
/sys/devices/pci0000:17/0000:17:02.0/0000:18:00.0: directory
# echo 1 > /sys/devices/pci0000:17/0000:17:02.0/0000:18:00.0/remove
# echo 1 > /sys/devices/pci0000:17/0000:17:02.0/0000:18:00.1/remove
# echo 1 > /sys/devices/pci0000:17/0000:17:02.0/rescan
# file /sys/devices/pci0000:17/0000:17:02.0/0000:18:00.0
/sys/devices/pci0000:17/0000:17:02.0/0000:19:00.0: cannot open `/sys/devices/pci0000:17/0000:17:02.0/0000:19:00.0' (No such file or directory)
# lspci | grep NVIDIA
(no output)
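The remove/rescan steps above can be wrapped in a small script that also reports whether the functions came back. This is only a sketch of the same sysfs writes, not a fix; `remove_and_rescan` and `SETTLE_SECONDS` are hypothetical names, and it needs root on a real system:

```shell
#!/bin/sh
# remove_and_rescan BRIDGE_DIR FUNCTION...
# Removes each listed function below the bridge, rescans the bridge,
# then checks that every function directory exists again.
remove_and_rescan() {
    bridge=$1
    shift
    for fn in "$@"; do
        if [ -e "$bridge/$fn/remove" ]; then
            echo 1 > "$bridge/$fn/remove"
        fi
    done
    echo 1 > "$bridge/rescan"
    # Give the kernel a moment to re-enumerate (the delay is an assumption).
    sleep "${SETTLE_SECONDS:-1}"
    rc=0
    for fn in "$@"; do
        if [ ! -d "$bridge/$fn" ]; then
            echo "missing after rescan: $fn" >&2
            rc=1
        fi
    done
    return $rc
}

# Example call, using the paths from the commands above:
# remove_and_rescan /sys/devices/pci0000:17/0000:17:02.0 0000:18:00.0 0000:18:00.1
```

A non-zero exit status here corresponds to the situation described in this post, where the device does not reappear after the rescan.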
The only way to get the device back is to reboot the host.
Any ideas are appreciated.