Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams

Hi @routenull
Okay I see you have the latest (precompiled) stream enabled, so I won’t ask about kernel-headers and kernel-devel packages. Also that should mean you have installed/upgraded to the 510.47.03 driver.

Please check a few things

  • Was this a fresh install or an upgrade?

    • Were only RPMs installed, or has an NVIDIA .run installer been used in the past?
  • Did you reboot before / after installation? (uptime)

  • Running kernel version?
    uname -a

    • Did you install a kernel update recently? Maybe around the same time as installing the NVIDIA driver?
      There was a new RHEL kernel (4.18.0-348.20.1) released on March 10th. [1]
  • Which packages are installed? I’m most interested in the kmod package(s).
    rpm -qa | grep nvidia | sort

  • Were the .ko files generated?
    ls -1 /usr/lib/modules/$(uname -r)/extra/nvidia*

  • Does this entry exist in /proc and does the driver version match?
    cat /proc/driver/nvidia/version

EDIT: one more thing

  • Do you see anything interesting in dmesg ?
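
For the dmesg check, a small filter can cut down the noise. This is only a sketch: the function name is made up for illustration, and the pattern (NVRM, nvidia, nouveau) is an assumption about which messages tend to be relevant, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kernel log lines that are likely related
# to the NVIDIA driver. The pattern is an assumption, not a complete list.
filter_gpu_messages() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -iE 'nvrm|nvidia|nouveau' || true
}

# Usage (dmesg may need root on some systems):
#   sudo dmesg | filter_gpu_messages
```

An empty result after a reboot would suggest the module was never even attempted, which is a different problem from a module that loads and then fails.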

@kmittman

Thanks for the quick reply!

Was this a fresh install or an upgrade?
    This was a fresh RHEL8.5 install.

Did you reboot before / after installation? (uptime)
    I did reboot.

Running kernel version?
uname -a
    Did you install a kernel update recently? Maybe around the same time as installing the NVIDIA driver?
    There was a new RHEL kernel (4.18.0-348.20.1) released on March 10th. [1]

    uname -a
    Linux chronos01 4.18.0-348.20.1.el8_5.x86_64 #1 SMP Tue Mar 8 12:56:54 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

Which packages are installed? I’m most interested in the kmod package(s).
rpm -qa | grep nvidia | sort

rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64
nvidia-driver-510.47.03-1.el8.x86_64
nvidia-driver-cuda-510.47.03-1.el8.x86_64
nvidia-driver-cuda-libs-510.47.03-1.el8.x86_64
nvidia-driver-devel-510.47.03-1.el8.x86_64
nvidia-driver-libs-510.47.03-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-510.47.03-1.el8.x86_64
nvidia-driver-NVML-510.47.03-1.el8.x86_64
nvidia-kmod-common-510.47.03-1.el8.noarch
nvidia-libXNVCtrl-510.47.03-1.el8.x86_64
nvidia-libXNVCtrl-devel-510.47.03-1.el8.x86_64
nvidia-modprobe-510.47.03-1.el8.x86_64
nvidia-persistenced-510.47.03-1.el8.x86_64
nvidia-settings-510.47.03-1.el8.x86_64
nvidia-xconfig-510.47.03-1.el8.x86_64

Were the .ko files generated?
ls -1 /usr/lib/modules/$(uname -r)/extra/nvidia*

ls -1 /usr/lib/modules/$(uname -r)/extra/nvidia*
ls: cannot access '/usr/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/nvidia*': No such file or directory

Does this entry exist in /proc and does the driver version match?
cat /proc/driver/nvidia/version

cat /proc/driver/nvidia/version
cat: /proc/driver/nvidia/version: No such file or directory

It appears the driver didn’t get installed?

@kmittman

Doing some further testing on another fresh install of RHEL8, I found that the .ko files are at the following path:
ls -alh /usr/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/
total 48M
drwxr-xr-x 2 root root 115 Mar 23 11:34 .
drwxr-xr-x. 3 root root 20 Mar 23 11:34 ..
-rw-r--r-- 1 root root 239K Mar 23 11:34 nvidia-drm.ko
-rw-r--r-- 1 root root 44M Mar 23 11:34 nvidia.ko
-rw-r--r-- 1 root root 1.6M Mar 23 11:34 nvidia-modeset.ko
-rw-r--r-- 1 root root 115K Mar 23 11:34 nvidia-peermem.ko
-rw-r--r-- 1 root root 2.2M Mar 23 11:34 nvidia-uvm.ko

And there is no nvidia/ under /proc/driver/.
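
Since the modules landed in extra/drivers/video/nvidia/ rather than directly under extra/, a recursive search is probably more reliable than a fixed glob. A minimal sketch; the function name and the optional module-root argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list NVIDIA kernel modules for one kernel release,
# wherever they sit below the module tree (plain .ko or compressed .ko.xz).
#   $1 = kernel release (defaults to the running kernel)
#   $2 = module root (defaults to /lib/modules)
list_nvidia_modules() {
    local release="${1:-$(uname -r)}"
    local root="${2:-/lib/modules}"
    find "$root/$release" -name 'nvidia*ko*' 2>/dev/null | sort
}

# Usage:
#   list_nvidia_modules
```

No output means no module files exist for that kernel release at all, which narrows things down to a packaging or build problem rather than a load problem.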

Okay, @routenull I finally got access to a RHEL8 machine with a Quadro P2000 GPU. The installation was successful on bare-metal. Is it possible that there is a configuration issue with the GPU pass-through to the VM?

Anyway, I am providing step-by-step instructions with output so you can follow along and let me know where your machine diverges. Note: for precompiled, the following are optional: gcc, EPEL repo (dkms), kernel-devel and kernel-headers packages.

Pre-installation actions

$ lspci | grep -i nvidia
65:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
65:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
$ uname -m && cat /etc/*release
x86_64
NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Ootpa)"
...
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
...
$ uname -r
4.18.0-348.20.1.el8_5.x86_64

Verify matching versions

$ sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
$ rpm -qa | grep kernel | sort | grep $(uname -r)
kernel-4.18.0-348.20.1.el8_5.x86_64
kernel-core-4.18.0-348.20.1.el8_5.x86_64
kernel-devel-4.18.0-348.20.1.el8_5.x86_64
kernel-headers-4.18.0-348.20.1.el8_5.x86_64
kernel-modules-4.18.0-348.20.1.el8_5.x86_64
kernel-tools-4.18.0-348.20.1.el8_5.x86_64
kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64
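
The version match above can also be checked mechanically. The sketch below reads an `rpm -qa` style package list and prints whichever of kernel-devel / kernel-headers is missing for a given kernel release. The function name is hypothetical, and it assumes package names have the form name-$(uname -r), as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a package list (one name per line, as printed
# by `rpm -qa`) on stdin and print the kernel-devel / kernel-headers
# packages that are missing for the given kernel release.
#   $1 = kernel release (defaults to the running kernel)
missing_kernel_build_pkgs() {
    local release="${1:-$(uname -r)}"
    local pkgs name
    pkgs=$(cat)
    for name in kernel-devel kernel-headers; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$pkgs" | grep -qxF "$name-$release" \
            || printf '%s\n' "$name-$release"
    done
}

# Usage:
#   rpm -qa | missing_kernel_build_pkgs
```

Empty output means both packages are present for that release. As noted, this only matters for the DKMS streams; the precompiled streams do not need them.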

CUDA Download Page instructions

$ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
$ sudo dnf clean all
$ sudo dnf repolist
repo id                            repo name
cuda-rhel8-x86_64                  cuda-rhel8-x86_64
epel                               Extra Packages for Enterprise Linux 8 - x86_64
epel-modular                       Extra Packages for Enterprise Linux Modular 8 - x86_64
rhel-8-for-x86_64-appstream-rpms   Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
rhel-8-for-x86_64-baseos-rpms      Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)

Install the latest precompiled kernel module stream

$ sudo dnf module install nvidia-driver:latest
Updating Subscription Management repositories.
Last metadata expiration check: 0:00:24 ago on Thu 31 Mar 2022 02:15:56 PM PDT.
Dependencies resolved.
=====================================================================================================================================
 Package                                      Architecture  Version                    Repository                               Size
=====================================================================================================================================
Installing group/module packages:
 cuda-drivers                                 x86_64        510.47.03-1                cuda-rhel8-x86_64                       7.0 k
 nvidia-driver                                x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        22 M
 nvidia-driver-NVML                           x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       516 k
 nvidia-driver-NvFBCOpenGL                    x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        52 k
 nvidia-driver-cuda                           x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       591 k
 nvidia-driver-cuda-libs                      x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        63 M
 nvidia-driver-devel                          x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        12 k
 nvidia-driver-libs                           x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       168 M
 nvidia-kmod-common                           noarch        3:510.47.03-1.el8          cuda-rhel8-x86_64                        12 k
 nvidia-libXNVCtrl                            x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        25 k
 nvidia-libXNVCtrl-devel                      x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        55 k
 nvidia-modprobe                              x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        36 k
 nvidia-persistenced                          x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        42 k
 nvidia-settings                              x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       832 k
 nvidia-xconfig                               x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       105 k
Installing dependencies:
 dnf-plugin-nvidia                            noarch        2.0-1.el8                  cuda-rhel8-x86_64                        12 k
 egl-wayland                                  x86_64        1.1.7-1.el8                rhel-8-for-x86_64-appstream-rpms         34 k
 kmod-nvidia-510.47.03-4.18.0-348.20.1        x86_64        3:510.47.03-3.el8_5        cuda-rhel8-x86_64                        29 M
 libX11-devel                                 x86_64        1.6.8-5.el8                rhel-8-for-x86_64-appstream-rpms        976 k
 libXau-devel                                 x86_64        1.0.9-3.el8                rhel-8-for-x86_64-appstream-rpms         21 k
 libglvnd-opengl                              x86_64        1:1.3.2-1.el8              rhel-8-for-x86_64-appstream-rpms         47 k
 libvdpau                                     x86_64        1.4-2.el8                  rhel-8-for-x86_64-appstream-rpms         41 k
 libxcb-devel                                 x86_64        1.13.1-1.el8               rhel-8-for-x86_64-appstream-rpms        1.1 M
 mesa-vulkan-drivers                          x86_64        21.1.5-1.el8               rhel-8-for-x86_64-appstream-rpms        6.1 M
 ocl-icd                                      x86_64        2.2.12-1.el8               rhel-8-for-x86_64-appstream-rpms         51 k
 opencl-filesystem                            noarch        1.0-6.el8                  rhel-8-for-x86_64-appstream-rpms        8.5 k
 vulkan-loader                                x86_64        1.2.198.0-2.el8_5          rhel-8-for-x86_64-appstream-rpms        123 k
 xorg-x11-proto-devel                         noarch        2020.1-3.el8               rhel-8-for-x86_64-appstream-rpms        280 k
Installing module profiles:
 nvidia-driver/default                                                                                                              
Enabling module streams:
 nvidia-driver                                              latest                                                                  

Transaction Summary
=====================================================================================================================================
Install  28 Packages

Total download size: 292 M
Installed size: 697 M
Is this ok [y/N]: y
$ rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64
nvidia-driver-510.47.03-1.el8.x86_64
nvidia-driver-cuda-510.47.03-1.el8.x86_64
nvidia-driver-cuda-libs-510.47.03-1.el8.x86_64
nvidia-driver-devel-510.47.03-1.el8.x86_64
nvidia-driver-libs-510.47.03-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-510.47.03-1.el8.x86_64
nvidia-driver-NVML-510.47.03-1.el8.x86_64
nvidia-kmod-common-510.47.03-1.el8.noarch
nvidia-libXNVCtrl-510.47.03-1.el8.x86_64
nvidia-libXNVCtrl-devel-510.47.03-1.el8.x86_64
nvidia-modprobe-510.47.03-1.el8.x86_64
nvidia-persistenced-510.47.03-1.el8.x86_64
nvidia-settings-510.47.03-1.el8.x86_64
nvidia-xconfig-510.47.03-1.el8.x86_64
$ find /lib/modules -name "nvidia*ko*" | sort
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-drm.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-modeset.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-peermem.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-uvm.ko
$ sudo reboot

After rebooting

$ lsmod | grep nvidia | sort
drm                   573440  12 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,nvidia_drm,ttm
drm_kms_helper        253952  5 drm_vram_helper,ast,nvidia_drm
nvidia              38502400  321 nvidia_uvm,nvidia_modeset
nvidia_drm             61440  3
nvidia_modeset       1118208  6 nvidia_drm
nvidia_uvm           1085440  0
$ nvidia-smi 
Thu Mar 31 14:24:17 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:65:00.0  On |                  N/A |
| 45%   33C    P8     5W /  75W |     65MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2278      G   /usr/libexec/Xorg                  40MiB |
|    0   N/A  N/A      3393      G   /usr/bin/gnome-shell               22MiB |
+-----------------------------------------------------------------------------+
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.47.03  Mon Jan 24 22:58:54 UTC 2022
GCC version:  gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC) 

@kmittman

Thank you for the detailed response, and my apologies for the lengthy delay in responding. I hadn’t set the notifications correctly for this post.

Is it possible that there is a configuration issue with the GPU pass-through to VM?

It is entirely possible. Right now I have the VMware passthrough configured as: DirectPath IO
https://i.imgur.com/QFNjGty.png

All my steps match your steps line by line, until after the reboot. There is no driver loaded, but the hardware is seen by the OS install.

[chronos01 ~] # lspci | grep -i nvidia
0b:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
0b:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)

[chronos01 ~] # rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64
nvidia-driver-510.47.03-1.el8.x86_64
nvidia-driver-cuda-510.47.03-1.el8.x86_64
nvidia-driver-cuda-libs-510.47.03-1.el8.x86_64
nvidia-driver-devel-510.47.03-1.el8.x86_64
nvidia-driver-libs-510.47.03-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-510.47.03-1.el8.x86_64
nvidia-driver-NVML-510.47.03-1.el8.x86_64
nvidia-kmod-common-510.47.03-1.el8.noarch
nvidia-libXNVCtrl-510.47.03-1.el8.x86_64
nvidia-libXNVCtrl-devel-510.47.03-1.el8.x86_64
nvidia-modprobe-510.47.03-1.el8.x86_64
nvidia-persistenced-510.47.03-1.el8.x86_64
nvidia-settings-510.47.03-1.el8.x86_64
nvidia-xconfig-510.47.03-1.el8.x86_64

[chronos01 ~] # find /lib/modules -name "nvidia*ko*" | sort
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-drm.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-modeset.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-peermem.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-uvm.ko

I will try to see if changing any pass-through options makes a difference and report back.
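
When the hardware is visible but no driver is loaded, it may help to see which kernel driver (if any) is bound to each NVIDIA device. `lspci -k` prints a "Kernel driver in use:" line for bound devices; the sketch below just summarizes that output. The function name is made up, and the parsing assumes the usual `lspci -k` layout (a device line followed by indented detail lines).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lspci -k` output on stdin and print, for each
# NVIDIA device, its slot and the kernel driver bound to it ("none" if the
# device has no "Kernel driver in use" line).
nvidia_bound_drivers() {
    awk '
        /^[0-9a-fA-F]/ {
            # A new device line starts here: flush the previous NVIDIA device
            if (slot != "") print slot, driver
            slot = ""; driver = "none"
            if (tolower($0) ~ /nvidia/) slot = $1
        }
        /Kernel driver in use:/ && slot != "" { driver = $NF }
        END { if (slot != "") print slot, driver }
    '
}

# Usage:
#   lspci -k | nvidia_bound_drivers
```

A result of "none" for the VGA device would be consistent with the module never loading; a different driver name (for example nouveau) would point to a conflict instead.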

Does this work on RHEL 9 yet? If not, what is the best way to install the NVIDIA driver on the new RHEL 9 release?

Hi @rafjaimes
Official NVIDIA driver RPM packages for RHEL9 (both DKMS and precompiled streams) are not yet available but will be coming soon. The latest CUDA 11.7.0 / 515 driver was released on May 11th; Red Hat released RHEL 9.0 on May 18th.


Does RHEL 8 currently support dual NVIDIA® RTX™ A4000 (16 GB GDDR6) cards with the drivers above?

Thanks!

Hi @rafjaimes
The first precompiled kmod package for RHEL9 is now available (NVIDIA driver 515.48.07 @ 5.14.0-70.13.1 kernel):
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/precompiled/

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf module install nvidia-driver:latest

I’m currently experiencing issues with the module. The computer doesn’t boot after installing the module, but I don’t have this issue when installing the driver downloaded from the site. It doesn’t seem to pick up the RAID card after rebooting. I’m currently using RHEL8, a Broadcom RAID card, and an RTX 4000.

Hi @stso
Can you please provide more information to help reproduce the issue?

  • Exact CLI commands followed
  • Which modularity stream and profile is enabled
  • RHEL8 kernel version
  • NVIDIA driver version and packages installed
  • Please double-check the GPU SKU

Sorry for the late response.

Commands:

subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
dnf module install nvidia-driver:latest

Modularity stream/Profile: latest/default
Kernel: 4.18.0-240.el8.x86_64
NVIDIA Driver: latest (at the time) and 470.42.01; same result for both
SKU: NVIDIA Corporation TU104GL [Quadro RTX 4000]

Last week (Oct 2022), Fedora announced that EPEL 8 Modularity will be discontinued. How will this affect this process to install NVIDIA drivers? Will it become obsolete?

Hi @rafjaimes
That deprecation should not affect NVIDIA driver installation on those distros

  • Precompiled streams have no dependency on the EPEL repositories
  • DKMS streams only have a dependency on the dkms package (EPEL8 “everything” repo)

The NVIDIA driver modularity YAML is hosted on the CUDA repository.

More information in the docs: CUDA Installation Guide for Linux


NVIDIA RPMs have been a gray area in the past for non-CUDA installations. Is there any reason you can’t or shouldn’t use the CUDA repos for just the GPU display driver?

Hi @dereksybau8
What do you mean by “gray area” ? Installation via the package manager is the recommended method.
Or do you mean the “CUDA repo” versus 3rd party repositories (RPM Fusion, negativo17, ELRepo, etc.) ?

This blog post and forum thread is specifically about precompiled kmod RPM streams, which are aligned to official “stable” RHEL kernels, excluding: RHEL EUS kernels, RHCOS/OpenShift kernels. That also excludes RockyLinux and other RHEL-like distros, in which case DKMS streams are the supported installation method.
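
As a rough illustration of that rule, the sketch below suggests a stream from the distro ID in os-release. The function name is hypothetical, and the check is deliberately crude: it only looks at ID=, so it cannot tell a stable RHEL kernel from an EUS or RHCOS one. Treat the result as a hint, not a support statement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a modularity stream from the distro ID in
# an os-release file. Only ID= is inspected (an assumption of this sketch),
# so EUS kernels on RHEL are not detected.
#   $1 = os-release file (defaults to /etc/os-release)
suggest_nvidia_stream() {
    local file="${1:-/etc/os-release}"
    local id
    # os-release is a shell-compatible KEY=value file; source it in a
    # subshell so its variables do not leak into the caller.
    id=$( . "$file" && printf '%s' "$ID" )
    if [ "$id" = "rhel" ]; then
        echo "nvidia-driver:latest"        # precompiled stream
    else
        echo "nvidia-driver:latest-dkms"   # DKMS stream
    fi
}

# Usage:
#   sudo dnf module install "$(suggest_nvidia_stream)"
```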

Yes, using the cuda repo vs third-party repos that are also prebuilt kmods is what I mean.

I remember asking this a while ago (2018?) when the cuda repos were relatively new, and it was recommended not to use the cuda repo for the display driver but rather the .run installer (or third-party RPMs). I believe the claim was that the cuda repos at the time were not advertised as display drivers (even though it worked), and that version x.y.z from the cuda repo and x.y.z from the .run installer were not “the same”.

What’s the exact incompatibility that excludes the non-RHEL EL distros (Rocky, Oracle, Alma)? If these are built the same as RHEL, they should just work. I think the only issue is when RHEL releases are a little ahead of the non-RHEL ones, because the rebuilt releases haven’t been fully published yet due to build times/QA.

I’ve installed the nvidia-driver:latest-dkms for RHEL 9.1, but when I enter the command: ‘nvidia-smi’
I get the error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure the latest NVIDIA driver is installed and running.
If I perform ‘rpm -qa | grep nvidia-driver’ I see that nvidia-driver-530.30.02-1.el9.x86_64 is installed.
There are ko.xz files in the /usr/lib/modules/$(uname -r)/extra/ directory.
But there is no /proc/driver/nvidia directory.
How do I get to the next step and have the Nvidia driver ‘installed and running’?

What are the kernel package signing key’s dates?
NVIDIA2019-public_key.der

Has a similar key been created for RHEL 9?

(replying to thread)

@dereksybau8

What’s the exact incompatibility when excluding non-RHEL EL distros (Rocky, Oracle, Alma)?

I did some investigation:
https://github.com/NVIDIA/yum-packaging-precompiled-kmod/issues/43
Please direct any follow-up questions on Github.


@gerald.trummer.ctr
As per the blog post at the beginning of this thread, the precompiled streams are recommended to avoid such issues.

When using DKMS streams, it is important to verify that the kernel-devel and kernel-headers packages matching your running kernel are installed. If the kernel is upgraded before or during the driver installation without an explicit reboot, DKMS may build against the previous kernel, in which case the modules do not match and cannot load.

Another thing to check is dmesg and that both the kernel string and driver string are the expected versions:

for ko in $(find /usr/lib -name "nvidia*ko*"); do echo "==> $ko"; xzcat "$ko" | strings | grep "^version="; done
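
An alternative sketch uses modinfo, which reads both plain and compressed modules and reports the vermagic field (the kernel release the module was built for) as well as the driver version. The function names below are made up for illustration; `modinfo -F vermagic` and `modinfo -F version` are the only external calls assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a vermagic string was built for the given
# kernel release. The first word of vermagic is the kernel release.
#   $1 = vermagic string, e.g. from `modinfo -F vermagic <module file>`
#   $2 = kernel release (defaults to the running kernel)
vermagic_matches() {
    local built_for="${1%% *}"
    local release="${2:-$(uname -r)}"
    [ "$built_for" = "$release" ]
}

# Hypothetical wrapper: report every NVIDIA module file, its driver version,
# and whether it was built for the running kernel.
check_nvidia_modules() {
    local ko
    find /usr/lib/modules -name 'nvidia*ko*' 2>/dev/null |
    while IFS= read -r ko; do
        if vermagic_matches "$(modinfo -F vermagic "$ko")"; then
            echo "match    $ko ($(modinfo -F version "$ko"))"
        else
            echo "mismatch $ko ($(modinfo -F version "$ko"))"
        fi
    done
}

# Usage:
#   check_nvidia_modules
```

If every line says "mismatch", the modules were most likely built for a kernel other than the one currently running.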

@gsnead
It is the same X.509 certificate for RHEL8 and RHEL9 precompiled kmod packages, 2019 to present.
More info in this guide: https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/main/UEFI.md