vGPU vfio issues.

Kernel: RHEL 7.5, 3.10.0-862.el7.x86_64
GPU: Tesla M60
Drivers: 390.57
I’m trying to install the vGPU software to use with RHEL KVM.
I recently did a full update to 7.5. I’ve installed the GPU drivers and the license manager, as well as a license.
For some reason, when I run “rpm -ivh NVIDIA-vGPU-rhel-7.5-390.57.x86_64.rpm”, the kernel module gets installed to “/usr/lib/modules/3.10.0-851.el7.x86_64/extra/nvidia/nvidia-vgpu-vfio.ko” (a directory for kernel 3.10.0-851, while I am running 3.10.0-862), and I cannot load the module using modprobe.

I created a symlink to try and load the module and this did not work either.

What am I doing wrong, given that I followed the manual? Please assist. The priority for this is fairly high, as I’m trying to set this up for a customer (support sent me here).

Regards,

Chris Ramirez

Hi,
Did you resolve this?
I have the same issue!

No,

I’m still waiting for a response.

This is a problem with NVIDIA not using DKMS when they built the RPM. Basically, it creates a hard dependency on the kernel that they were using on their development machine at the time they created the RPM.

Instead, they should have allowed it to install to whichever kernel was current, or soft-linked it somehow.
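
If you want to confirm the mismatch yourself, here is a rough sketch that pulls the kernel release out of a module path and compares it with the kernel you are running. The function names (module_kernel_release, check_module_kernel) are just ones I made up for illustration, and the values passed in at the bottom are the ones from the original post:

```shell
#!/bin/sh
# Extract the kernel release from a module path of the form
# /usr/lib/modules/<release>/... (or /lib/modules/<release>/...).
module_kernel_release() {
    printf '%s\n' "$1" | sed -n 's|^.*/lib/modules/\([^/]*\)/.*$|\1|p'
}

# Report whether a module path belongs to the given kernel release.
# Returns non-zero on a mismatch.
check_module_kernel() {
    running="$1"
    built_for=$(module_kernel_release "$2")
    if [ "$running" = "$built_for" ]; then
        echo "match: $built_for"
    else
        echo "mismatch: module is for $built_for, running kernel is $running"
        return 1
    fi
}

# Values taken from the original post. On a live system, pass "$(uname -r)"
# as the first argument instead of the hard-coded release.
check_module_kernel "3.10.0-862.el7.x86_64" \
    "/usr/lib/modules/3.10.0-851.el7.x86_64/extra/nvidia/nvidia-vgpu-vfio.ko" || true
```

With the values from the post this reports a mismatch (851 vs. 862). You can also run modinfo -F vermagic against the .ko file to see which kernel the module itself says it was built for.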

To get this to work, you need to install/downgrade to the kernel version that matches the one they had when they built the RPM - which is next to impossible now unless you downloaded those RPMs already (the matching kernel-devel and kernel-headers should also be downloaded and installed).
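
To find out which kernel a given RPM expects before you install it, something along these lines should work. It is only a sketch: RPM_FILE is a placeholder for wherever you saved the package, and kernel_releases_in_list is a helper name I invented. It relies on rpm -qlp, which lists the files inside a package file without installing it.

```shell
#!/bin/sh
# Read a file list on stdin and print each distinct kernel release that
# appears in a .../lib/modules/<release>/... path.
kernel_releases_in_list() {
    sed -n 's|^.*/lib/modules/\([^/]*\)/.*$|\1|p' | sort -u
}

# Placeholder location of the downloaded package; adjust as needed.
RPM_FILE="${RPM_FILE:-./NVIDIA-vGPU-rhel-7.5-390.57.x86_64.rpm}"

if command -v rpm >/dev/null 2>&1 && [ -f "$RPM_FILE" ]; then
    # List the package contents and reduce them to kernel releases.
    echo "Package installs modules for:"
    rpm -qlp "$RPM_FILE" | kernel_releases_in_list
    echo "Running kernel: $(uname -r)"
else
    echo "rpm or $RPM_FILE not available; skipping check"
fi
```

If the release printed for the package differs from the running kernel, that is the kernel version you would have to install to use that RPM as shipped.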

Alternatively, I recommend that you look at the newer implementation they used in vGPU version 7.x, where they are using a better, more dynamic approach with shell-script installers (.run).

They still screw this up though (the .run file references a missing header that is required to build the DKMS kernel module), so you might need to reinstall after every kernel upgrade, but it doesn’t stick you with an old, defunct kernel. You’ll be prompted for DKMS, but on a clean install this will still fail - at least on CentOS 7.6 (which maps closely to RHEL of the same version).
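
Before running the .run installer after a kernel upgrade, it may be worth checking that the build tree for the running kernel is actually there. This will not fix the missing-header problem I mentioned, but it rules out the more common case where the matching kernel-devel package is simply not installed. A minimal sketch, with has_build_tree being a name I made up:

```shell
#!/bin/sh
# Succeeds if <modules_root>/<release>/build resolves to a directory,
# which is where out-of-tree module builds normally look for kernel headers.
has_build_tree() {
    [ -d "$1/$2/build" ]
}

release=$(uname -r)
if has_build_tree /lib/modules "$release"; then
    echo "kernel build tree present for $release"
else
    echo "no build tree for $release; install the matching kernel-devel package first"
fi
```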

I hope this helps.