[418.92] vGPU VFIO module not loading

I am setting up a KVM hypervisor in evaluation mode to demonstrate multi-monitor vGPU with SPICE.
KVM is set up and working perfectly.
The NVIDIA License Manager is set up and working perfectly.

I installed the NVIDIA Linux driver included in the vGPU download (for my Quadro RTX 8000 card) and it seems to be working perfectly. Here is the output of ‘nvidia-smi’:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.92       Driver Version: 418.92       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     On   | 00000000:02:00.0 Off |                  Off |
| 33%   38C    P8    29W / 260W |    133MiB / 48573MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2218      G   /usr/libexec/Xorg                             63MiB |
|    0      2335      G   /usr/bin/gnome-shell                          69MiB |
+-----------------------------------------------------------------------------+

My problem is that the NVIDIA vGPU modules (specifically the VFIO module) are not loaded.
Here is the relevant output from ‘dmesg | grep vfio’:
[ 5.433000] nvidia_vgpu_vfio: Unknown symbol nvidia_vgpu_vfio_get_ops (err 0)
[ 5.433109] nvidia_vgpu_vfio: Unknown symbol nvidia_vgpu_vfio_set_ops (err 0)

The issue already shows up during installation via RPM:
$ sudo rpm -Uhv NVIDIA-vGPU-rhel-8.0-418.92.x86_64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:NVIDIA-vGPU-rhel-1:8.0-418.92    ################################# [100%]
chcon: can’t apply partial context to unlabeled file ‘/lib/modules/4.18.0-74.el8.x86_64/extra/nvidia/nvidia.ko’
chcon: can’t apply partial context to unlabeled file ‘/lib/modules/4.18.0-74.el8.x86_64/extra/nvidia/nvidia-vgpu-vfio.ko’
depmod: WARNING: /lib/modules/4.18.0-80.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko needs unknown symbol nvidia_vgpu_vfio_get_ops
depmod: WARNING: /lib/modules/4.18.0-80.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko needs unknown symbol nvidia_vgpu_vfio_set_ops
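
One detail in this output may be worth checking: the chcon lines place the modules under 4.18.0-74.el8.x86_64, while the depmod warnings come from the weak-updates directory of 4.18.0-80.el8.x86_64. Below is a minimal sketch that pulls the kernel release out of both paths and compares them. The helper name kver_from_path is hypothetical, and the two paths are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: extract the kernel release from a
# /lib/modules/<release>/... path.
kver_from_path() {
    printf '%s\n' "$1" | sed -n 's|^/lib/modules/\([^/]*\)/.*|\1|p'
}

# Paths taken verbatim from the rpm output in this post.
built_for=$(kver_from_path /lib/modules/4.18.0-74.el8.x86_64/extra/nvidia/nvidia-vgpu-vfio.ko)
linked_into=$(kver_from_path /lib/modules/4.18.0-80.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko)

# Report when the module was installed for one kernel but linked into another.
if [ "$built_for" != "$linked_into" ]; then
    echo "module installed for $built_for but weak-linked into $linked_into"
fi
```

On the host itself, `uname -r` shows the running kernel and `modinfo -F vermagic <path-to-.ko>` shows the kernel a module was built against. Whether this mismatch is the actual cause of the unknown symbols is an assumption, not something the log confirms.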

Things I’ve tried:

  • The latest NVIDIA GRID driver sets [430.27/430.30/431.02]
  • depmod -a before and after install
  • dracut --force before and after install
  • Installing the graphics driver in both ‘multi-user.target’ and ‘graphics.target’
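
For reference, the module-related steps in this list amount to roughly the following. This is only a sketch: the run and rebuild_and_load helpers and the RUN variable are hypothetical names introduced here, and the explicit modprobe at the end is an added check that is not part of the list above. By default the script only prints the commands; with RUN=1 it executes them, which needs root.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

rebuild_and_load() {
    run depmod -a                    # regenerate module dependency data
    run dracut --force               # rebuild the initramfs for the running kernel
    run modprobe nvidia_vgpu_vfio    # added check: try loading the module explicitly
}

rebuild_and_load
```

If the modprobe step fails, `dmesg | grep vfio` should show the same "Unknown symbol" lines quoted earlier.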

I'm not sure what I’m missing. Does anybody have any ideas?
I’ve attached the nvidia-bug-report.log.gz file.

nvidia-bug-report.log.gz (1.03 MB)

Hello,

I have the same issue on Red Hat 7. When installing NVIDIA-vGPU-rhel-7.7-430.46.x86_64.rpm, I get these warnings:

depmod: WARNING: /lib/modules/3.10.0-1062.1.1.el7.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko needs unknown symbol nvidia_vgpu_vfio_get_ops
depmod: WARNING: /lib/modules/3.10.0-1062.1.1.el7.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko needs unknown symbol nvidia_vgpu_vfio_set_ops

Any hint?

Regards,
M.

Maybe this?
Oracle® Linux 7: Administrator's Guide