510.47.03, RHEL 8.5/KVM, RTX A5000: [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko

I have an RTX A5000 in a host running RHEL 8.5 with KVM, and I’m unable to get the nvidia-vgpu-vfio driver to load. I’m using the NVIDIA-GRID-RHEL-8.5-510.47.03-511.65 driver package.

The NVIDIA driver seems to install OK:

$ nvidia-smi
Fri Jun 10 10:04:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:41:00.0 Off |                    0 |
| 30%   28C    P8     4W / 230W |      4MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2483      G   /usr/libexec/Xorg                   4MiB |
+-----------------------------------------------------------------------------+

After installing the vGPU Manager (NVIDIA-vGPU-rhel-8.5-510.47.03.x86_64.rpm) and rebooting, the nvidia-vgpu-vfio driver is not loaded, and dmesg shows this:

$ dmesg | grep nvidia
[    2.654630] nvidia: loading out-of-tree module taints kernel.
[    2.654641] nvidia: module license 'NVIDIA' taints kernel.
[    2.664201] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    2.674685] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[    2.675826] nvidia 0000:41:00.0: enabling device (0000 -> 0002)
[    2.755637] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  510.47.03  Mon Jan 24 22:51:43 UTC 2022
[    2.759610] [drm] [nvidia-drm] [GPU ID 0x00004100] Loading driver
[    2.759612] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:41:00.0 on minor 1
[    3.778830] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[    5.598480] NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.

Note in particular the line: [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko.

Some more details for context:

$ uname -r
4.18.0-348.2.1.el8_5.x86_64
$ modinfo nvidia | grep vermagic
vermagic:       4.18.0-348.el8.x86_64 SMP mod_unload modversions
$ modinfo nvidia-vgpu-vfio | grep vermagic
vermagic:       4.18.0-348.el8.x86_64 SMP mod_unload modversions
$ lsmod | grep 'nvidia\|vfio'
vfio_mdev              16384  0
mdev                   20480  1 vfio_mdev
vfio_iommu_type1       36864  0
vfio                   36864  2 vfio_mdev,vfio_iommu_type1
nvidia_drm             69632  2
nvidia_modeset       1159168  2 nvidia_drm
nvidia              39055360  116 nvidia_modeset
drm_kms_helper        253952  4 drm_vram_helper,ast,nvidia_drm
drm                   573440  11 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,nvidia_drm,ttm
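
One thing that stands out in the output above: the running kernel (4.18.0-348.2.1.el8_5.x86_64) is not the exact release the modules were built for (4.18.0-348.el8.x86_64). I don't know whether that matters here, since the modules are built with modversions and nvidia.ko itself does load. Below is a minimal sketch of the comparison, using the values pasted above; the helper names vermagic_release and same_release are made up for illustration.

```shell
# Hypothetical helper: pull the kernel release out of a "vermagic:" line
# as printed by `modinfo <module> | grep vermagic`.
vermagic_release() {
    printf '%s\n' "$1" | awk '{print $2}'
}

# Hypothetical helper: succeed only if the two release strings are identical.
same_release() {
    [ "$1" = "$2" ]
}

# Values copied from the output above; on a live host these would come from
# `uname -r` and `modinfo nvidia | grep vermagic`.
running="4.18.0-348.2.1.el8_5.x86_64"
line="vermagic:       4.18.0-348.el8.x86_64 SMP mod_unload modversions"

built_for=$(vermagic_release "$line")
if same_release "$running" "$built_for"; then
    echo "match"
else
    echo "mismatch: running $running, module built for $built_for"
fi
```

With the values above this reports a mismatch. A related check I can run, assuming module symbols are listed in /proc/kallsyms on this host, is `grep nvidia_vgpu_vfio_get_ops /proc/kallsyms` to see whether the loaded nvidia.ko provides that symbol at all.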

Could someone help me out with this? I’m at a loss…

Thanks!