I installed the NVIDIA drivers on RHEL 8.8 following the procedure in the "Installation Guide for Linux 12.3" documentation (section 1, Introduction). After the installation, running nvidia-smi gives this error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
This is my config:
lspci | grep -i nvidia
02:00.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
02:01.0 VGA compatible controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
/lib/modprobe.d/dist-blacklist.conf:blacklist nvidiafb
/lib/modprobe.d/nvidia.conf:# Make a soft dependency for nvidia-uvm as adding the module loading to
/lib/modprobe.d/nvidia.conf:# /usr/lib/modules-load.d/nvidia-uvm.conf for systemd consumption, makes the
/lib/modprobe.d/nvidia.conf:softdep nvidia post: nvidia-uvm
/lib/modprobe.d/nvidia.conf:options nvidia NVreg_DynamicPowerManagement=0x02
/lib/modprobe.d/nvidia.conf:# options nvidia-drm mod
dkms status
nvidia/545.23.06: added
sudo dkms install nvidia/545.23.06
Error! Your kernel headers for kernel 4.18.0-477.27.1.el8_8.x86_64 cannot be found at /lib/modules/4.18.0-477.27.1.el8_8.x86_64/build or /lib/modules/4.18.0-477.27.1.el8_8.x86_64/source.
Please install the linux-headers-4.18.0-477.27.1.el8_8.x86_64 package or use the --kernelsourcedir option to tell DKMS where it's located.
When I try to install the matching kernel-devel and kernel-headers packages, dnf cannot find them:
Repository epel is listed more than once in the configuration
Last metadata expiration check: 2:24:12 ago on Tue 14 Nov 2023 12:02:20 PM CET.
No match for argument: kernel-devel-4.18.0-477.27.1.el8_8.x86_64
No match for argument: kernel-headers-4.18.0-477.27.1.el8_8.x86_64
Error: Unable to find a match: kernel-devel-4.18.0-477.27.1.el8_8.x86_64 kernel-headers-4.18.0-477.27.1.el8_8.x86_64
Then there’s something wrong with your RHEL repos. 4.18.0-477.27.1 should be the latest kernel for RHEL 8.8, so the matching kernel-headers and kernel-devel packages should be available. The initial 4.18.0-477.10.1 kernel is complete, though.
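DKMS builds against the tree under /lib/modules/<kernel>/build, which on RHEL is normally supplied by the kernel-devel package, so a missing package blocks the install. A minimal sketch of that check follows. check_kernel_build_dir is a hypothetical helper name, and its second argument exists only so the function can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Sketch only: check_kernel_build_dir is a hypothetical helper, not part of DKMS.
# It looks for the two paths named in the DKMS error for a given kernel version.
check_kernel_build_dir() {
    local kver="$1"
    local modroot="${2:-/lib/modules}"   # override only for testing
    local dir
    for dir in "$modroot/$kver/build" "$modroot/$kver/source"; do
        # -e follows symlinks, so a dangling "build" link counts as missing
        if [ -e "$dir" ]; then
            echo "found: $dir"
            return 0
        fi
    done
    echo "missing: kernel-devel-$kver is probably not installed"
    return 1
}
```

On the affected machine this would be called as check_kernel_build_dir "$(uname -r)".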
But the dkms command is still not working. The error is different now:
dkms install nvidia/545.23.06
Sign command: /lib/modules/4.18.0-477.27.1.el8_8.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Error! Could not find module source directory.
Directory: /usr/src/nvidia-545.23.06 does not exist.
Odd. Please try reinstalling the driver:
sudo dnf module reinstall nvidia-driver:latest-dkms
Post any errors, and afterwards the output of:
dkms status
ls -l /usr/src
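The "Could not find module source directory" error refers to a path that DKMS derives from the module name and version. The sketch below shows that mapping, which can help when comparing the /usr/src listing against dkms status. dkms_source_dir and dkms_source_present are hypothetical helper names, and the /usr/src default is taken from the error message.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not DKMS subcommands.
# dkms_source_dir turns the "name/version" part of a dkms status line
# into the source path DKMS reported as missing.
dkms_source_dir() {
    local spec="$1"
    local srcroot="${2:-/usr/src}"   # default taken from the error message
    local name="${spec%%/*}"         # text before the first slash
    local version="${spec#*/}"       # text after the first slash
    echo "$srcroot/$name-$version"
}

# Report whether the source tree for a spec is present on disk.
dkms_source_present() {
    local dir
    dir="$(dkms_source_dir "$@")"
    if [ -d "$dir" ]; then
        echo "present: $dir"
    else
        echo "absent: $dir"
        return 1
    fi
}
```

For example, dkms_source_present nvidia/545.23.06 checks for /usr/src/nvidia-545.23.06.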
dnf module install nvidia-driver:latest-dkms
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered with an entitlement server. You can use subscription-manager to register.
Repository epel is listed more than once in the configuration
Last metadata expiration check: 0:10:11 ago on Wed 15 Nov 2023 10:14:28 AM CET.
NOTE: Skipping kernel installation since no kernel module package kmod-nvidia-545.23.06-4.18.0-513.5.1 for kernel version 4.18.0-513.5.1.el8_9 and NVIDIA driver 545.23.06 could be found
Error:
Problem: problem with installed package kmod-nvidia-545.23.06-4.18.0-477.27.1-3:545.23.06-3.el8_8.x86_64
- package kmod-nvidia-545.23.06-4.18.0-477.27.1-3:545.23.06-3.el8_8.x86_64 conflicts with kmod-nvidia-latest-dkms provided by kmod-nvidia-latest-dkms-3:545.23.06-1.el8.x86_64
- conflicting requests
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
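The conflict above is between a precompiled kmod-nvidia package built for one specific kernel and the DKMS variant from the latest-dkms stream. As a rough sketch, the two kinds of package name can be told apart like this. classify_kmod is a hypothetical helper, and the naming patterns are inferred only from the names shown in this log.

```shell
#!/usr/bin/env bash
# Sketch only: classify_kmod is a hypothetical helper. The patterns are
# assumptions based on the two package names in the dnf error above.
classify_kmod() {
    case "$1" in
        kmod-nvidia-*dkms*)  echo "dkms" ;;         # e.g. kmod-nvidia-latest-dkms
        kmod-nvidia-[0-9]*)  echo "precompiled" ;;  # driver version follows directly
        *)                   echo "other" ;;
    esac
}

# Possible use on the affected machine:
#   rpm -qa 'kmod-nvidia*' | while read -r pkg; do
#       echo "$pkg: $(classify_kmod "$pkg")"
#   done
```

If both kinds show up as installed, dnf has to erase one of them, which is what --allowerasing permits.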
I added --allowerasing, and after that I get this:
dkms install nvidia/545.23.06
Module nvidia/545.23.06 already installed on kernel 4.18.0-477.27.1.el8_8.x86_64 (x86_64), skip. You may override by specifying --force.
ll /usr/src/
total 4
drwxr-xr-x 2 root root 35 Nov 9 09:52 annobin
drwxr-xr-x. 2 root root 6 Jun 21 2021 debug
drwxr-xr-x. 4 root root 78 Nov 15 08:07 kernels
drwxr-xr-x 8 root root 4096 Nov 15 10:25 nvidia-545.23.06
Incompatible driver. It seems you’re inside a VM on a vGPU system. Please use the GRID driver for your vGPU version instead of the normal NVIDIA driver. Please uninstall any NVIDIA packages first.
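Before switching drivers, it helps to see which NVIDIA packages are still installed. A minimal sketch, assuming the package list comes from rpm -qa. filter_nvidia_packages is a hypothetical helper that only filters names and removes nothing.

```shell
#!/usr/bin/env bash
# Sketch only: filter_nvidia_packages is a hypothetical helper.
# It reads package names on stdin and prints those containing "nvidia".
filter_nvidia_packages() {
    # Case-insensitive match, sorted for a stable listing
    grep -i 'nvidia' | sort
}

# Possible use on the affected machine:
#   rpm -qa | filter_nvidia_packages
```

Running systemd-detect-virt on the same machine should also show whether it is a VM, since it prints the detected virtualization type or "none".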