I have a RHEL 8.7 physical server with a Tesla P40 installed.
It shows up in lspci output:
…
04:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
…
However, after installing the drivers downloaded from the licensing portal, nvidia-smi fails:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
and dmesg shows:
$ dmesg | grep vfio
[ 3.464768] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[ 13.558106] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
I have blacklisted nouveau as suggested in the Red Hat documentation, but the nvidia_vgpu_vfio driver module will not load.
How can I troubleshoot this to find out why it will not load? What is the correct installation procedure? It seems like I've missed a step somewhere. My server supports SR-IOV, and it is enabled in the BIOS. There are no other NVIDIA cards in the system, just the onboard graphics.
I used dnf to install the rpm provided in the download. I'm not sure what else to do with it. It seems that others who have had this issue didn't have nouveau blacklisted, as I do:
$ cat /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
blacklist nouveau
options nouveau modeset=0
and
$ lsmod | grep nouveau
$ – no output –
But I do see nouveau listed in lspci -k:
$ lspci -k | grep nvidia
Kernel modules: nouveau, nvidia_vgpu_vfio, nvidia
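As far as I understand it, the "Kernel modules:" line only lists modules that declare support for the device, while a separate "Kernel driver in use:" line appears only when a driver is actually bound to it. The output above has no such line, which would be consistent with nothing being bound to the card. Here is a minimal sketch I could use to check that for one device. The function name driver_in_use is my own (hypothetical), and it only assumes the usual lspci -k layout of an unindented device line followed by indented detail lines:

```shell
# Hypothetical helper: print the driver bound to a PCI device, or "none".
# Reads `lspci -k` style text on stdin; $1 is the PCI address, e.g. 04:00.0.
driver_in_use() {
    awk -v addr="$1" '
        # An unindented line starts a new device block; track whether it is ours.
        /^[^ \t]/ { in_dev = (index($0, addr) == 1) }
        # Inside our device block, report the bound driver if there is one.
        in_dev && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
            found = 1
            exit
        }
        # No bound driver was seen for this device.
        END { if (!found) print "none" }
    '
}
```

It would be run as: lspci -k | driver_in_use 04:00.0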
How is this driver supposed to work?
This is the file I downloaded:
NVIDIA-GRID-RHEL-8.7-525.85.07-525.85.05-528.24.zip