[418.92] vGPU VFIO module not loading

I am setting up a KVM hypervisor in evaluation mode to demonstrate multi-monitor vGPU with SPICE.
KVM is set up and working perfectly.
The nVidia License Manager is set up and working perfectly.

I installed the nVidia Linux Driver included in the vGPU download (for my RTX8000 card) and it seems to be working perfectly. Here is the output of ‘nvidia-smi’:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.92       Driver Version: 418.92       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     On   | 00000000:02:00.0 Off |                  Off |
| 33%   38C    P8    29W / 260W |    133MiB / 48573MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2218      G   /usr/libexec/Xorg                             63MiB |
|    0      2335      G   /usr/bin/gnome-shell                          69MiB |
+-----------------------------------------------------------------------------+

My problem is that the nVidia vGPU modules (specifically the VFIO module) are not loading.
Here is the output of ‘dmesg | grep vfio’:
[ 5.433000] nvidia_vgpu_vfio: Unknown symbol nvidia_vgpu_vfio_get_ops (err 0)
[ 5.433109] nvidia_vgpu_vfio: Unknown symbol nvidia_vgpu_vfio_set_ops (err 0)

The issue is already apparent when installing via RPM:
$ sudo rpm -Uhv NVIDIA-vGPU-rhel-8.0-418.92.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:NVIDIA-vGPU-rhel-1:8.0-418.92 ################################# [100%]
chcon: can't apply partial context to unlabeled file '/lib/modules/4.18.0-74.el8.x86_64/extra/nvidia/nvidia.ko'
chcon: can't apply partial context to unlabeled file '/lib/modules/4.18.0-74.el8.x86_64/extra/nvidia/nvidia-vgpu-vfio.ko'
depmod: WARNING: /lib/modules/4.18.0-80.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko needs unknown symbol nvidia_vgpu_vfio_get_ops
depmod: WARNING: /lib/modules/4.18.0-80.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko needs unknown symbol nvidia_vgpu_vfio_set_ops
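
Note that the paths in this output name two different kernel releases: the chcon lines point at 4.18.0-74.el8.x86_64, while the depmod warnings point at weak-updates under 4.18.0-80.el8.x86_64. Whether that is the cause is not confirmed here, but it is easy to check in a saved install log. A minimal sketch, assuming the log is piped in on standard input (the function name kernel_releases_in_log is hypothetical):

```shell
# Hypothetical helper: list the distinct kernel releases named in
# /lib/modules/<release>/... paths read from standard input.
kernel_releases_in_log() {
    grep -o '/lib/modules/[^/]*' | sed 's|^/lib/modules/||' | sort -u
}

# Example using two paths taken from the rpm output above.
printf '%s\n' \
    '/lib/modules/4.18.0-74.el8.x86_64/extra/nvidia/nvidia-vgpu-vfio.ko' \
    '/lib/modules/4.18.0-80.el8.x86_64/weak-updates/nvidia/nvidia-vgpu-vfio.ko' \
    | kernel_releases_in_log
```

If more than one release is listed, it may be worth comparing the list against ‘uname -r’ to see which of them is the running kernel.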

Things I’ve tried:

  • The latest nVidia GRID driver set [430.27/430.30/431.02]
  • depmod -a before and after install
  • dracut --force before and after
  • Installing the graphics driver under both ‘multi-user.target’ and ‘graphics.target’

Not sure what I’m missing. Anybody have any ideas?
I’ve attached the nvidia-bug-report.log.gz file
nvidia-bug-report.log.gz (1.03 MB)

Hi,

Have you managed to resolve this issue? I am having the same problem when installing the driver.

Hi, I am having the same problem when installing the driver with a Quadro RTX 6000 on RHEL 8.2.
Have you managed to resolve this issue?

Seeing a very similar thing with the NVIDIA-GRID-RHEL-8.5-510.47.03-511.65 driver package.
nvidia-vgpu-vfio doesn’t load, and dmesg | grep vfio shows:

[    3.778830] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko

Both modules report the same vermagic (the kernel they were built for):

$ modinfo nvidia-vgpu-vfio | grep vermagic
vermagic:       4.18.0-348.el8.x86_64 SMP mod_unload modversions
$ modinfo nvidia | grep vermagic
vermagic:       4.18.0-348.el8.x86_64 SMP mod_unload modversions
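
The vermagic output above shows that the two modules were built for the same kernel, but it does not by itself say whether that is the kernel currently running. A minimal sketch of that extra comparison, assuming the first field of the vermagic string is the kernel release (the helper names vermagic_release and check_module are hypothetical):

```shell
# Hypothetical helper: print the kernel release, i.e. the first
# whitespace-separated field of a vermagic string.
vermagic_release() {
    printf '%s\n' "$1" | awk '{print $1}'
}

# Hypothetical helper: compare a module's vermagic with a kernel release.
#   $1 = module name, $2 = vermagic string, $3 = kernel release
check_module() {
    built_for="$(vermagic_release "$2")"
    if [ "$built_for" = "$3" ]; then
        echo "$1: built for $3"
    else
        echo "$1: built for $built_for, running $3"
    fi
}

# On a live host with the modules installed, this could be driven by:
#   for m in nvidia nvidia-vgpu-vfio; do
#       check_module "$m" "$(modinfo -F vermagic "$m")" "$(uname -r)"
#   done
```

A matching vermagic only rules out one class of problem; the missing nvidia_vgpu_vfio_get_ops symbol can still have another cause.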