Error when allocating multiple vGPUs in a single VM with Ubuntu KVM hypervisor

Although the vGPU release notes for Ubuntu KVM state that multiple vGPUs per VM are supported (in limited configurations), starting a VM with multiple vGPUs (I have tried two and three) returns an error.
Here is the link to the document: https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-ubuntu/index.html#multiple-vgpu-support
I have succeeded in starting the VM with a single vGPU, and nvidia-smi inside the VM shows the vGPU fine.
However, with multiple vGPUs the VM fails to start, with both time-sliced and MIG-backed vGPUs.

I am using Ubuntu 18.04.5 with kernel 5.4.0-90-generic and the 13.x vGPU software.
nvidia-smi reports driver version 470.82, so I believe this is vGPU software v13.1.
The hypervisor is Ubuntu KVM with QEMU v4.0, and the guest VMs run Ubuntu 18.04 images.
The GPU is an A100-PCIE-40GB.

Here are the error messages when I run # virsh create vm.xml:

error: Failed to create domain from vm.xml
error: internal error: qemu unexpectedly closed the monitor: 2021-12-15T02:29:57.628152Z qemu-system-x86_64: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/10106e78-7703-4099-ac5d-ac49ebdd2cbc,bus=pci.0,addr=0x9: warning: vfio 10106e78-7703-4099-ac5d-ac49ebdd2cbc: Could not enable error recovery for the device
2021-12-15T02:29:57.661640Z qemu-system-x86_64: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f3751074-bca9-4602-b621-7285cf5d5b2f,bus=pci.0,addr=0xa: vfio f3751074-bca9-4602-b621-7285cf5d5b2f: error getting device from group 182: Input/output error
Verify all devices in group 182 are bound to vfio-<bus> or pci-stub and not already in use
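
For reference, the two vGPUs are attached in vm.xml as mdev hostdev entries following the documented pattern; it looks roughly like this (the UUIDs are the mdev devices from the error output above):

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='10106e78-7703-4099-ac5d-ac49ebdd2cbc'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='f3751074-bca9-4602-b621-7285cf5d5b2f'/>
  </source>
</hostdev>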

Also, my journalctl log shows that the first vGPU was initialized, but the second vGPU fails with the error "multiple vGPUs in a VM not supported":

Dec 15 11:30:56 mango1 nvidia-vgpu-mgr[7293]: error: vmiop_log: (0x1): init_device_instance failed for inst 1 with error 1 (multiple vGPUs in a VM not supported)
Dec 15 11:30:56 mango1 nvidia-vgpu-mgr[7293]: error: vmiop_log: (0x1): Initialization: init_device_instance failed error 1
Dec 15 11:30:56 mango1 nvidia-vgpu-mgr[7293]: error: vmiop_log: display_init failed for inst: 1
Dec 15 11:30:56 mango1 nvidia-vgpu-mgr[7293]: error: vmiop_env_log: (0x1): vmiope_process_configuration failed with 0x1f
Dec 15 11:30:56 mango1 nvidia-vgpu-mgr[7293]: error: vmiop_env_log: (0x1): plugin_initialize failed  with error:0x1f
Dec 15 11:31:01 mango1 nvidia-vgpu-mgr[7293]: notice: vmiop_log: (0x0): Srubbing completed but notification missed
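
Since the journalctl error suggests that the vGPU type in use does not permit multiple vGPUs per VM, the vGPU types the card exposes (and how many instances of each are still available) can be listed via the standard mdev sysfs interface, e.g. as below. The address 0000:3b:00.4 is a placeholder; on the A100, if I understand the docs correctly, the mdev devices live on the SR-IOV virtual functions rather than on the physical function:

for t in /sys/class/mdev_bus/0000:3b:00.4/mdev_supported_types/*; do
  echo "$(basename "$t"): $(cat "$t"/name), available_instances=$(cat "$t"/available_instances)"
done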

The vGPU documentation also provides steps for adding multiple vGPUs to a single KVM-backed VM.
(Link: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#adding-vgpu-to-red-hat-el-kvm-vm)
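
Following those steps, the mdev devices are created through the sysfs interface before the VM is started. A minimal sketch, reusing the UUIDs from the errors above (the PCI addresses and the nvidia-XXX type name are placeholders for the actual device addresses and the chosen vGPU profile):

# one mdev device per vGPU, both of the same vGPU type; on the A100 each
# mdev is created on its own SR-IOV virtual function (addresses are placeholders)
echo "10106e78-7703-4099-ac5d-ac49ebdd2cbc" > /sys/class/mdev_bus/0000:3b:00.4/mdev_supported_types/nvidia-XXX/create
echo "f3751074-bca9-4602-b621-7285cf5d5b2f" > /sys/class/mdev_bus/0000:3b:00.5/mdev_supported_types/nvidia-XXX/create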

How can I use multiple vGPUs in a single VM on Ubuntu KVM, as the documentation suggests?

Any help would be greatly appreciated; thanks in advance.

Hi,
I don’t have first-hand experience with this setup, but I found a similar issue with Ubuntu as the guest where switching the guest from BIOS to UEFI did the trick. Maybe worth a try?
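
If you want to try it, switching the guest to UEFI on a libvirt/KVM host should just be a matter of pointing the <os> section of the domain XML at the OVMF firmware, something like the sketch below. The firmware paths are the ones from the Ubuntu ovmf package and may differ on your system, and note that an existing guest usually has to be reinstalled or converted when its firmware changes:

<os>
  <type arch='x86_64'>hvm</type>
  <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
  <nvram>/var/lib/libvirt/qemu/nvram/vm_VARS.fd</nvram>
</os>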

Regards, Simon

Hi,
I have exactly the same problem, but I am using Red Hat Enterprise Linux 8.4 with KVM. Switching the guest's boot option from BIOS to UEFI did not work. I hope someone can provide a workaround for this; we need more than one vGPU in the guest VM.

Regards
Mevludin