vGPU on CentOS 7.4 VM with RHEV 4.2

Hi,

I’m having trouble getting my Nvidia vGPU to work on a VM.

Setup: RHEV 4.2 on RHEL 7.5, Tesla M60 (switched to graphics mode). I’m using the NVIDIA-GRID-RHEL-7.5-410.92-410.91-412.16.zip package from Nvidia.

On the hypervisor, I’ve installed the NVIDIA-vGPU-rhel-7.5-410.91.x86_64 rpm. The vfio kernel modules are loaded, nvidia-smi shows the card, and I can see all the vGPU types via vdsm-client.

I’ve created a CentOS 7.4 VM and added a ‘B’ type vGPU instance in ‘custom properties’. In the VM, I installed the driver via the .run file (NVIDIA-Linux-x86_64-410.92-grid.run) and configured gridd.conf to point to the license server; /var/log/messages shows a license being picked up. The nvidia kernel module is loaded, but so is the ‘qxl’ paravirtual driver.

lspci reports:

00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:07.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)

The Xorg.0.log reports:

[ 1622.212] (--) PCI:*(0:0:2:0) 1b36:0100:1af4:1100 rev 4, Mem @ 0xf0000000/134217728, 0xfb000000/8388608, 0xfb870000/8192, I/O @ 0x0000c100/32, BIOS @ 0x???/65536
[ 1622.212] (--) PCI: (0:0:7:0) 10de:13f2:10de:1177 rev 161, Mem @ 0xfa000000/16777216, 0xd0000000/268435456, 0xf8000000/33554432, I/O @ 0x0000c000/128, BIOS @ 0x???/131072
[ 1622.212] (II) LoadModule: "glx"
[ 1622.212] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[ 1622.213] (II) Module glx: vendor="X.Org Foundation"
[ 1622.213] compiled for 1.19.3, module version = 1.0.0
[ 1622.213] ABI class: X.Org Server Extension, version 10.0
[ 1622.213] (II) LoadModule: "nvidia"
[ 1622.214] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[ 1622.214] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 1622.214] compiled for 4.0.2, module version = 1.0.0
[ 1622.214] Module class: X.Org Video Driver
[ 1622.214] (II) NVIDIA dlloader X Driver 410.92 Thu Dec 20 04:48:17 CST 2018
[ 1622.214] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 1622.214] (++) using VT number 1
[ 1622.214] (EE) No devices detected.
[ 1622.214] (EE)
Fatal server error:
[ 1622.214] (EE) no screens found(EE)
[ 1622.214] (EE)

I’ve tried blacklisting the qxl module on the VM in case it was blocking the nvidia driver, but that made no difference. I suspect the problem is something on the hypervisor.

I’ve tried running nvidia-xconfig to generate an xorg.conf, and I’ve also tried a custom xorg.conf that specifies the BusID of the card, but neither works (same error in both cases).
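
For reference on the BusID: lspci prints the bus, device and function in hexadecimal, while the BusID line in xorg.conf expects decimal values in the form PCI:bus:device:function. A minimal Python sketch of that conversion follows; the helper name lspci_to_xorg_busid is hypothetical (it is not part of any NVIDIA or Xorg tool), and it assumes the usual lspci address layout with an optional four-digit domain prefix.

```python
import re

# Matches lspci addresses such as "00:07.0" or "0000:00:07.0".
_LSPCI_ADDRESS = re.compile(
    r"(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def lspci_to_xorg_busid(address: str) -> str:
    """Convert an lspci slot address to an xorg.conf BusID string.

    Hypothetical helper for illustration only.
    """
    match = _LSPCI_ADDRESS.fullmatch(address.strip())
    if match is None:
        raise ValueError(f"not an lspci address: {address!r}")

    domain, bus, device, function = match.groups()

    # lspci prints hexadecimal; xorg.conf expects decimal.
    bus_dec = int(bus, 16)
    device_dec = int(device, 16)
    function_dec = int(function, 16)
    domain_dec = int(domain, 16) if domain else 0

    # A non-zero PCI domain is appended to the bus as "bus@domain".
    bus_part = f"{bus_dec}@{domain_dec}" if domain_dec else str(bus_dec)
    return f"PCI:{bus_part}:{device_dec}:{function_dec}"


# The vGPU shows up at 00:07.0 in the lspci output above.
print(lspci_to_xorg_busid("00:07.0"))  # PCI:0:7:0
```

For the vGPU at 00:07.0 the hex and decimal values happen to be the same, so the result is PCI:0:7:0. The conversion only changes the digits when a component is 0a or higher.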

Thanks in advance for any help.

Cam