Unable to install 3D driver on a VM running Red Hat 7.9

Hi All,

I am trying to install the 3D driver (390.115) on a VM running Red Hat 7.9, but I am unable to complete the installation because of a DRM issue.

The PCI device has been added to the VM:
02:01.0 VGA compatible controller: NVIDIA Corporation GP104GL [Tesla P6]

Here is the log output:

tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-helper.c:152:17: error: implicit declaration of function ‘drm_framebuffer_reference’ [-Werror=implicit-function-declaration]
/tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-helper.c:157:17: error: implicit declaration of function ‘drm_framebuffer_unreference’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[3]: *** [/tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-helper.o] Error 1
/tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-gem.h:82:5: error: implicit declaration of function ‘drm_gem_object_unreference_unlocked’ [-Werror=implicit-function-declaration]
/tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-gem.h:82:5: error: implicit declaration of function ‘drm_gem_object_unreference_unlocked’ [-Werror=implicit-function-declaration]
/tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-gem.h:162:5: error: implicit declaration of function ‘drm_gem_object_unreference’ [-Werror=implicit-function-declaration]
/tmp/selfgz6915/NVIDIA-Linux-x86_64-390.115-grid/kernel/nvidia-drm/nvidia-drm-gem.h:162:5: error: implicit declaration of function ‘drm_gem_object_unreference’ [-Werror=implicit-function-declaration]

The driver is too old for the running kernel, please try with a more recent/latest version.
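
The "implicit declaration" errors in the build log suggest that the DRM helpers the driver calls are not declared in the headers of the running kernel. A minimal sketch for checking that, assuming the kernel headers are reachable under /lib/modules/$(uname -r)/build (the function name has_drm_symbol is only illustrative):

```shell
# Report whether a DRM helper is declared in the kernel headers.
# $1: symbol name
# $2: header directory (assumed default: DRM headers of the running kernel)
has_drm_symbol() {
    symbol=$1
    include_dir=${2:-/lib/modules/$(uname -r)/build/include/drm}
    grep -rqw -- "$symbol" "$include_dir" 2>/dev/null
}

# Check two of the helpers named in the build log
for sym in drm_framebuffer_reference drm_gem_object_unreference_unlocked; do
    if has_drm_symbol "$sym"; then
        echo "$sym: declared"
    else
        echo "$sym: missing"
    fi
done
```

If the helpers show up as missing, the driver source and the kernel headers disagree, which is consistent with the driver being too old for this kernel.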

Thanks for the reply. I also tried NVIDIA-Linux-x86_64-410.104.run and am getting the same error.

CONFTEST: pci_dma_mapping_error
ERROR: Unable to load the kernel module ‘nvidia.ko’. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA GPU(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Both drivers you tried are from 02/2019, really old. Please try a recent one like:
https://http.download.nvidia.com/XFree86/Linux-x86_64/470.74/

I am getting the same error:

→ Skipping GLVND file: “libEGL.so”
Will install libEGL vendor library config file to /usr/share/glvnd/egl_vendor.d
→ Searching for conflicting files:
→ done.
→ Installing ‘NVIDIA Accelerated Graphics Driver for Linux-x86_64’ (470.74):
executing: ‘/usr/sbin/ldconfig’…
executing: ‘/usr/bin/systemctl daemon-reload’…
→ done.
→ Driver file installation is complete.
→ Installing DKMS kernel module:
→ done.
ERROR: Unable to load the ‘nvidia-drm’ kernel module.

Please attach the installer log.

nv.txt (2.2 KB)

Adding the installer log.

You chose to use DKMS to compile the driver. Do you have dkms installed at all? Please post the output of
dkms status

dkms status
nvidia, 470.57.02, 3.10.0-1160.36.2.el7.x86_64, x86_64: built
nvidia, 470.74, 3.10.0-1160.36.2.el7.x86_64, x86_64: installed

Even without the DKMS option I get the same result.

The driver is properly installed. If it doesn’t load, either another driver is already loaded or Secure Boot is enabled. Did you blacklist nouveau? Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
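
A blacklist file only helps if nouveau is really not loaded at runtime. A rough sketch for checking that, assuming the standard /proc/modules listing (the helper name module_loaded is made up for illustration):

```shell
# Return success if the named module appears in a /proc/modules-style listing.
# $1: module name
# $2: listing file (defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if module_loaded nouveau; then
    echo "nouveau is loaded"
else
    echo "nouveau is not loaded"
fi
```

For the Secure Boot part, `mokutil --sb-state` should print the current state, assuming the mokutil package is installed on the VM.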

nvidia-bug-report.log.gz (64.8 KB)

cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

There’s also an incompatible 418 kernel driver installed which is blocking the newer drivers from loading. Please check if it has been installed using rpm packages and remove it.
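
One way to spot a stray kernel module is to list every NVIDIA module file under the running kernel's module tree together with its version. This is only a sketch; the default path and the helper name list_nvidia_modules are assumptions, not something taken from the bug report:

```shell
# List NVIDIA kernel module files under a module tree.
# $1: module directory (defaults to the tree of the running kernel)
list_nvidia_modules() {
    module_dir=${1:-/lib/modules/$(uname -r)}
    [ -d "$module_dir" ] || return 0
    find "$module_dir" -type f -name 'nvidia*.ko*' | sort
}

# Print each module file with the version recorded in its metadata
list_nvidia_modules | while read -r ko; do
    echo "$ko $(modinfo -F version "$ko" 2>/dev/null)"
done
```

Any file reporting a version other than the one just installed is a candidate for the conflicting driver.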

rpm -qa | grep -i nvidia
nvidia-modprobe-latest-dkms-470.57.02-1.el7.x86_64
nvidia-xconfig-latest-dkms-470.57.02-1.el7.x86_64
nvidia-driver-latest-dkms-libs-470.57.02-1.el7.x86_64
nvidia-driver-latest-dkms-devel-470.57.02-1.el7.x86_64
yum-plugin-nvidia-0.5-1.el7.noarch
nvidia-driver-latest-dkms-cuda-470.57.02-1.el7.x86_64
nvidia-driver-latest-dkms-NVML-470.57.02-1.el7.x86_64
nvidia-driver-latest-dkms-NvFBCOpenGL-470.57.02-1.el7.x86_64
kmod-nvidia-latest-dkms-470.57.02-1.el7.x86_64
nvidia-persistenced-latest-dkms-470.57.02-1.el7.x86_64
nvidia-driver-latest-dkms-cuda-libs-470.57.02-1.el7.x86_64
nvidia-driver-latest-dkms-470.57.02-1.el7.x86_64

I removed the incompatible 418 kernel driver, but I am still getting the same error.

Did you also clean it from the initrd (dracut -f) and reboot? If so, please attach a new nvidia-bug-report.log.
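
To see whether an old module is still packed into the initrd, the initramfs file listing can be filtered for GPU modules. A small sketch, assuming the listing is piped in on stdin (the helper name count_gpu_modules is hypothetical):

```shell
# Count NVIDIA/nouveau kernel modules in an initramfs file listing read from stdin.
# Prints the number of matching lines; prints 0 when nothing matches.
count_gpu_modules() {
    grep -Eci '(nvidia|nouveau)[^/]*\.ko' || true
}
```

On the VM the listing would come from lsinitrd, which inspects the initramfs of the running kernel when called without arguments: `lsinitrd | count_gpu_modules`. Comparing the count before and after `dracut -f` shows whether the rebuild changed anything.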

Yes, I ran dracut -f and rebooted.
nvidia-bug-report.log.gz (74.4 KB)

The 470 driver is now active, but

NVRM: installed in this system is not supported by the
NVRM: NVIDIA 470.74 driver release.
NVRM: Please see ‘Appendix A - Supported NVIDIA GPU Products’
NVRM: in this release’s README, available on the operating system
NVRM: specific graphics driver download page at www.nvidia.com.

This doesn’t mean that your GPU is unsupported, but that you’re running an unsupported VM config (unsupported hypervisor, consumer card, VM boot). Since VMware + Tesla is supported, I suspect your VM and/or host BIOS is incorrectly set up. Looking at the boot messages, you’re running a BIOS boot with 32-bit resources; please see this on how to correctly set up a VM for passthrough:
https://blogs.vmware.com/apps/2018/09/using-gpus-with-virtual-machines-on-vsphere-part-2-vmdirectpath-i-o.html
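
The boot mode can also be confirmed from inside the guest. A quick sketch, assuming the usual Linux convention that /sys/firmware/efi exists only when the system was booted via UEFI (the function name boot_mode is illustrative):

```shell
# Report the firmware boot mode of the running system.
# $1: EFI directory to test (defaults to /sys/firmware/efi)
boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

boot_mode
```

A result of BIOS would match what the boot messages show.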

I verified the VM config, and we are using shared PCI. I am able to install the driver on RHEL 7.6, but on 7.9 I get the same error.

nvidia-smi
Tue Nov 9 08:34:23 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.115                Driver Version: 390.115                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID P6-2B          On   | 00000000:02:01.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |    144MiB /  2048MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@awv344802 nxf43580]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

I only now noticed that you’re running a vGPU setup (GRID P6-2B), please try using the grid driver.
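
The hint here was the GPU name in the nvidia-smi output. A small sketch of that check, under the assumption (taken only from the "GRID P6-2B" name above) that vGPU profiles report a name starting with "GRID "; the helper name is_vgpu_name is made up:

```shell
# Return success if a GPU name looks like a vGPU profile.
# Assumption: vGPU profile names start with "GRID ", as in "GRID P6-2B".
is_vgpu_name() {
    case $1 in
        "GRID "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a system where some driver already loads, the name could be fed in with `is_vgpu_name "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"`.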

Could you please share the GRID driver version that supports RHEL 7.9?

Those should be available at the licensing portal:
https://www.nvidia.com/en-us/drivers/vgpu-software-driver/