I have an RTX A5000 on RHEL 8.5 with KVM, and I can't get the nvidia-vgpu-vfio driver to load. I'm using the NVIDIA-GRID-RHEL-8.5-510.47.03-511.65 driver package.
The NVIDIA driver seems to install OK:
$ nvidia-smi
Fri Jun 10 10:04:35 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:41:00.0 Off |                    0 |
| 30%   28C    P8     4W / 230W |      4MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2483      G   /usr/libexec/Xorg                   4MiB |
+-----------------------------------------------------------------------------+
After installing the vGPU manager (NVIDIA-vGPU-rhel-8.5-510.47.03.x86_64.rpm) and restarting, the nvidia-vgpu-vfio driver is not loaded, and I get this:
$ dmesg | grep nvidia
[ 2.654630] nvidia: loading out-of-tree module taints kernel.
[ 2.654641] nvidia: module license 'NVIDIA' taints kernel.
[ 2.664201] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 2.674685] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 2.675826] nvidia 0000:41:00.0: enabling device (0000 -> 0002)
[ 2.755637] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 510.47.03 Mon Jan 24 22:51:43 UTC 2022
[ 2.759610] [drm] [nvidia-drm] [GPU ID 0x00004100] Loading driver
[ 2.759612] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:41:00.0 on minor 1
[ 3.778830] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[ 5.598480] NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.
Note in particular this line: [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
Some more details for context:
$ uname -r
4.18.0-348.2.1.el8_5.x86_64
$ modinfo nvidia | grep vermagic
vermagic: 4.18.0-348.el8.x86_64 SMP mod_unload modversions
$ modinfo nvidia-vgpu-vfio | grep vermagic
vermagic: 4.18.0-348.el8.x86_64 SMP mod_unload modversions
$ lsmod | grep 'nvidia\|vfio'
vfio_mdev 16384 0
mdev 20480 1 vfio_mdev
vfio_iommu_type1 36864 0
vfio 36864 2 vfio_mdev,vfio_iommu_type1
nvidia_drm 69632 2
nvidia_modeset 1159168 2 nvidia_drm
nvidia 39055360 116 nvidia_modeset
drm_kms_helper 253952 4 drm_vram_helper,ast,nvidia_drm
drm 573440 11 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,nvidia_drm,ttm
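One thing visible in the output above is that uname -r reports 4.18.0-348.2.1.el8_5.x86_64 while both modules carry a vermagic of 4.18.0-348.el8.x86_64. I don't know whether that difference matters here. In case it helps anyone repeat the comparison, this is a minimal sketch of the check; the function names are my own, and it only compares strings, it does not diagnose the missing symbol.

```shell
#!/bin/sh
# Sketch: compare the running kernel release with the release recorded
# in each module's vermagic. Function names are illustrative only.

# The first space-separated field of a vermagic string is the kernel
# release the module was built for.
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Print "match" or "mismatch" for a running release ($1) and a vermagic string ($2).
compare_release() {
    built="$(vermagic_release "$2")"
    if [ "$1" = "$built" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Query the live system for the two modules mentioned in this post.
report_modules() {
    running="$(uname -r)"
    for mod in nvidia nvidia-vgpu-vfio; do
        if vm="$(modinfo -F vermagic "$mod" 2>/dev/null)" && [ -n "$vm" ]; then
            echo "$mod: built for $(vermagic_release "$vm"), running $running: $(compare_release "$running" "$vm")"
        else
            echo "$mod: modinfo returned nothing"
        fi
    done
}

report_modules
```

With the values from this post, compare_release reports a mismatch for both modules. A mismatch in the release string alone does not prove the modules are incompatible, so treat the result as a hint rather than a conclusion.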
Could someone help me out on this? I’m at a loss…
Thanks!