VFIO VGA arbitration lock

mbroemme · May 29, 2013, 6:29pm

Hi,

I recently tried to use the NVIDIA driver for my primary GeForce GT 520
while using the VFIO (Virtual Function I/O) framework to pass my
secondary GPU through to a KVM virtual machine. It failed because the
‘nvidia’ kernel module locks the legacy VGA I/O and memory regions, and
as a result ‘qemu’ deadlocks. The same setup works fine with the
open-source ‘nouveau’ driver. Attached is a fix for the binary driver,
version 319.23, though I don’t know whether it is the proper way to do
it. After some testing (Xorg, DRI, VDPAU, switching between X and the
text console) it looks stable. The patch was also posted to the
‘qemu-devel’ and ‘kvm’ mailing lists.

http://lists.gnu.org/archive/html/qemu-devel/2013-05/msg04300.html

Basically the vga_tryget code doesn’t make much sense, because the
driver never calls vga_put, so vga_tryget ends up holding the VGA
arbitration lock permanently. This leaves multi-GPU systems with GPUs
from different vendors mostly non-working.

diff -Nur NVIDIA-Linux-x86_64-319.23/kernel/nv-linux.h NVIDIA-Linux-x86_64-319.23-vfio-vgaarb-fix/kernel/nv-linux.h
--- NVIDIA-Linux-x86_64-319.23/kernel/nv-linux.h 2013-05-17 04:00:02.000000000 +0200
+++ NVIDIA-Linux-x86_64-319.23-vfio-vgaarb-fix/kernel/nv-linux.h 2013-05-29 18:09:42.382925622 +0200
@@ -151,9 +151,9 @@
 #error "struct file_operations compile test likely failed!"
 #endif
 
-#if defined(CONFIG_VGA_ARB)
-#include <linux/vgaarb.h>
-#endif
+//#if defined(CONFIG_VGA_ARB)
+//#include <linux/vgaarb.h>
+//#endif
 
 #if defined(NV_VM_INSERT_PAGE_PRESENT)
 #include <linux/pagemap.h>
diff -Nur NVIDIA-Linux-x86_64-319.23/kernel/nv.c NVIDIA-Linux-x86_64-319.23-vfio-vgaarb-fix/kernel/nv.c
--- NVIDIA-Linux-x86_64-319.23/kernel/nv.c 2013-05-17 04:00:02.000000000 +0200
+++ NVIDIA-Linux-x86_64-319.23-vfio-vgaarb-fix/kernel/nv.c 2013-05-29 18:10:01.494277314 +0200
@@ -2914,12 +2914,12 @@
 
     pci_set_master(dev);
 
-#if defined(CONFIG_VGA_ARB)
-#if defined(VGA_DEFAULT_DEVICE)
-    vga_tryget(VGA_DEFAULT_DEVICE, VGA_RSRC_LEGACY_MASK);
-#endif
-    vga_set_legacy_decoding(dev, VGA_RSRC_NONE);
-#endif
+//#if defined(CONFIG_VGA_ARB)
+//#if defined(VGA_DEFAULT_DEVICE)
+//    vga_tryget(VGA_DEFAULT_DEVICE, VGA_RSRC_LEGACY_MASK);
+//#endif
+//    vga_set_legacy_decoding(dev, VGA_RSRC_NONE);
+//#endif
 
     if (NV_IS_GVI_DEVICE(nv))
     {

Kind regards
Maik

ajs124 · October 4, 2013, 9:44pm

Hi, I’ve been using this patch for a few months now.

I’d also like to know whether this is the right way to do it, and would appreciate an official response.

Ideally I want to be able to run an unpatched driver, so this should be solved somehow.

sandipt · October 14, 2013, 9:48am

Hey guys, thanks for reporting this issue. Please provide step-by-step reproduction steps so I can reproduce it in house. Please also attach an nvidia-bug-report.log.

ajs124 · January 10, 2014, 9:29pm

Thanks for your response, it seems as if I didn’t get a notification for this or something.

I run into problems when using vfio to pass another GPU to a guest and this patch fixed those problems.

The best step-by-step reproduction is probably this forum post which I followed to set this up.

Basically you need a kernel newer than 3.9 with specific options enabled and a very recent version of qemu and seabios. Qemu is then called with parameters that specify the devices to pass through:

/usr/bin/qemu-system-x86_64 -M q35 -enable-kvm -vga none -nographic \
  -bios /usr/share/qemu/bios.bin -cpu host \
  -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
  -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1 \
  -device vfio-pci,host=06:00.1,addr=0.1,bus=root.1
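
Before running a command like the one above, it can help to confirm which kernel driver currently owns the two functions of the guest GPU, since qemu's vfio-pci device expects the host device to be bound to the vfio-pci driver. The following is a minimal sketch, not part of the original setup: pci_bound_driver is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a test directory instead of /sys.

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:06:00.0
# $2: sysfs root (assumption for testing; defaults to /sys)
pci_bound_driver() {
    addr=$1
    sysfs_root=${2:-/sys}
    link="$sysfs_root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# The two functions of the GPU used in the qemu command in this thread.
for dev in 0000:06:00.0 0000:06:00.1; do
    echo "$dev: $(pci_bound_driver "$dev")"
done
```

If either function reports something other than vfio-pci, the device is still claimed by a host driver (or by none at all) and would need rebinding before the guest is started.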

The nvidia-bug-report.log is (hopefully) attached.

nvidia-bug-report.log.gz (268 KB)

mbroemme · February 5, 2014, 4:08pm

If you need any further help please tell me. I’m also highly interested in seeing this issue fixed officially.

teekay78 · February 5, 2014, 8:13pm

I too have been using this patch for some months already with a GT430 (host) and an elderly 8500GT (guest) in a KVM VFIO VGA passthrough setup. Works just fine, no side effects whatsoever.

Actually, because this setup works so well, I’m thinking of buying a more powerful nVidia card for the Windows virtual machine. Seeing this “feature” officially supported (or that bug fixed, whatever you may call it) would reassure me that this is a good investment.

mbroemme · February 18, 2014, 1:49am

Just bumping this. There is already a very long discussion in the Arch Linux forum about VFIO primary GPU passthrough, and several people there are using this patch.

https://bbs.archlinux.org/viewtopic.php?id=162768

Fixing this becomes more critical as a wider audience starts using it. Moreover, for now NVIDIA’s is the only binary driver that will work on the host node with VFIO primary/secondary GPU passthrough.

sandipt · February 19, 2014, 6:03am

mbroemme, could you please provide an nvidia bug report from your system?

I’ve tried to use the NVIDIA driver for my primary GeForce GT
520 and use VFIO (Virtual Function I/O) framework to passthrough my
secondary GPU to KVM virtual machine.

What is the make and model of the secondary GPU you are passing through to the KVM virtual machine?
Also, when does the issue occur: as soon as you run the command and launch the VM?
What errors did you observe in dmesg or the logs?

sandipt · February 19, 2014, 6:17am

Filed internal bug 1463972 to track this issue.

sandipt · February 22, 2014, 2:08pm

We followed the procedure from the Arch Linux Forums thread “KVM VGA-Passthrough using the new vfio-vga support in kernel =>3.9” (Kernel & Hardware).

While launching the VM with AMD GPU passthrough we see the following error messages:

qemu-system-x86_64: -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1: vfio: error no iommu_group for device
qemu-system-x86_64: -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1: Device initialization failed.
qemu-system-x86_64: -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1: Device 'vfio-pci' could not be initialized

Command we used:

/usr/bin/qemu-system-x86_64 -M q35 -enable-kvm -vga none -nographic -bios /usr/share/qemu/bios.bin -cpu host -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1 -device vfio-pci,host=06:00.1,addr=0.1,bus=root.1

mostlyharmless · April 16, 2014, 11:02pm

Just a suggestion, as I see there’s been no activity on this for 2 months: a lot of people get the error message you have, depending on their exact hardware setup. I would recommend posting in the Arch Linux Forums thread “KVM VGA-Passthrough using the new vfio-vga support in kernel =>3.9” (Kernel & Hardware) to help resolve it and move forward. There is a lot of activity on that thread, and the instructions on page 1 are kept current. A hardware compatibility list is posted there as well, later in the thread, around pages 50-60 I believe.

Kinslayer · October 13, 2014, 11:46pm

sandipt:

We followed the procedure from the Arch Linux Forums thread “KVM VGA-Passthrough using the new vfio-vga support in kernel =>3.9” (Kernel & Hardware).

While launching the VM with AMD GPU passthrough we see the following error messages:

qemu-system-x86_64: -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1: vfio: error no iommu_group for device
qemu-system-x86_64: -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1: Device initialization failed.
qemu-system-x86_64: -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1: Device 'vfio-pci' could not be initialized

Command we used:

/usr/bin/qemu-system-x86_64 -M q35 -enable-kvm -vga none -nographic -bios /usr/share/qemu/bios.bin -cpu host -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 -device vfio-pci,host=06:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1 -device vfio-pci,host=06:00.1,addr=0.1,bus=root.1

I was getting this error when I had forgotten to boot with intel_iommu=on in my kernel parameters.

ILMostro · January 5, 2015, 10:11pm

So…any updates/news/announcements regarding this development?

As @Kinslayer already mentioned, on a system with Intel VT-d capabilities the intel_iommu=on kernel parameter is required to enable PCI device assignment (passthrough) to KVM.

On AMD systems, the kernel parameter is amd_iommu=on.

Once those preliminary steps are done and the system boots successfully, proceed with the steps you outlined previously, following the Arch Linux Forums thread “KVM VGA-Passthrough using the new vfio-vga support in kernel =>3.9” (Kernel & Hardware).
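
For anyone hitting the “no iommu_group for device” error quoted earlier in the thread, here is a rough sketch of how the two preconditions could be checked from a shell on the host. It assumes the standard sysfs layout, where a PCI device that sits behind an active IOMMU has an iommu_group symlink. The helper names cmdline_enables_iommu and iommu_group_of are made up for illustration, and the optional sysfs root argument is there only for testing. The command-line check looks for the exact parameters named in this thread and would miss variants with extra suffixes.

```shell
#!/bin/sh
# Succeed if the given kernel command line contains one of the two
# IOMMU parameters mentioned in this thread.
cmdline_enables_iommu() {
    case " $1 " in
        *" intel_iommu=on "*|*" amd_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the IOMMU group number of a PCI device; fail if it has none.
# $1: full PCI address, e.g. 0000:06:00.0
# $2: sysfs root (assumption for testing; defaults to /sys)
iommu_group_of() {
    addr=$1
    sysfs_root=${2:-/sys}
    link="$sysfs_root/bus/pci/devices/$addr/iommu_group"
    [ -L "$link" ] || return 1
    # The symlink's last path component is the group number.
    basename "$(readlink "$link")"
}

if cmdline_enables_iommu "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "IOMMU parameter found on the kernel command line"
else
    echo "no intel_iommu=on or amd_iommu=on on the kernel command line"
fi

# 0000:06:00.0 is the GPU address used in the qemu command in this thread.
if group=$(iommu_group_of 0000:06:00.0); then
    echo "0000:06:00.0 is in IOMMU group $group"
else
    echo "0000:06:00.0 has no IOMMU group"
fi
```

A device with no IOMMU group after a reboot with the parameter set usually points at firmware settings (VT-d or AMD-Vi disabled) rather than at qemu or the driver.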

riczxc · March 20, 2016, 1:26pm

I’ve been using the patch for at least two years and so far everything’s alright.

I wonder if you might consider fixing this for the mainline driver?
