Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams

jwitsoe · October 9, 2020, 8:49pm

Originally published at: https://developer.nvidia.com/blog/streamlining-nvidia-driver-deployment-on-rhel-8-with-modularity-streams/

NVIDIA GPUs have become mainstream for accelerating a variety of workloads, from machine learning and high-performance computing (HPC) to content creation workflows and data center applications. For these enterprise use cases, NVIDIA provides a software stack powered by the CUDA platform: drivers, CUDA-X acceleration libraries, CUDA-optimized applications, and frameworks. Deploying the NVIDIA driver is one of the…

kmittman · October 9, 2020, 9:22pm

Hi, this is Kevin. I hope you’ve enjoyed reading my blog post. Be sure to check out my presentations on this subject at NVIDIA GTC Fall 2020 and Red Hat Summit 2020. On a related note, the yum-packaging-precompiled-kmod repository on GitHub has a detailed README, and pull requests are welcome. Shout out to our friends at Red Hat who collaborated on this project. Finally, if you have any questions or comments, please let us know.

ilkka.tengvall_nv · January 11, 2021, 9:18am

Hi,

Thanks for the guide. However, are you aware that these drivers have not worked for a few months, since RHEL 8.3 came out? Any plans to fix the build?

kmittman · January 11, 2021, 6:21pm

Hi @ilkka.tengvall_nv could you expand on “drivers have not worked”? Please fill in the blanks:

NVIDIA driver version:
RHEL kernel version:
modularity stream:
modularity profile: <default, ks, fm>
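
In case it helps with filling in the last two blanks, here is a minimal sketch that reads the enabled stream and installed profile out of `dnf module list nvidia-driver` output. The function names are hypothetical, and the parsing assumes the column layout shown elsewhere in this thread (stream marked `[e]`, profile marked `[i]`); other dnf versions may format the markers differently.

```shell
#!/bin/sh
# Hypothetical helpers: both read `dnf module list nvidia-driver` output on stdin.
# On a live system (not run here):
#   uname -r                                        # RHEL kernel version
#   modinfo -F version nvidia                       # driver version, if the module is installed
#   dnf -q module list nvidia-driver | enabled_stream

# Print the stream marked [e] (enabled).
enabled_stream() {
  awk '$1 == "nvidia-driver" && ($3 ~ /\[e\]/ || $4 ~ /\[e\]/) { print $2; exit }'
}

# Print the profile marked [i] (installed).
installed_profile() {
  awk '$1 == "nvidia-driver" {
    for (i = 4; i <= NF; i++) {
      if ($i ~ /\[i\]/) {
        j = i - 1
        while (j > 3 && $j ~ /^\[/) j--   # step back over markers such as [d]
        print $j
        exit
      }
    }
  }'
}

# Demo on a sample line in the same layout as the dnf output quoted in this thread.
sample='nvidia-driver latest [e] default [d] [i], fm, ks, src Nvidia driver for latest branch'
printf 'modularity stream: %s\n' "$(printf '%s\n' "$sample" | enabled_stream)"
printf 'modularity profile: %s\n' "$(printf '%s\n' "$sample" | installed_profile)"
```

For the sample line, this prints `latest` as the stream and `default` as the profile.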

I have confirmed that the precompiled driver packages are functional on RHEL 8.3, with two exceptions:

418 driver (a fix is coming soon)
kernel 4.18.0-240.1.1 and newer (4.18.0-240 from the RHEL 8.3 ISO image was released on the same day and thus skipped over)

Also feel free to report such issues here: Issues · NVIDIA/yum-packaging-precompiled-kmod · GitHub

ilkka.tengvall_nv · January 11, 2021, 8:31pm

Thanks, I’m not at the computer now, but I have some older posts here describing the problem:

I’ve had trouble with it for a long time.

kmittman · March 6, 2021, 1:37am

By the way, I’ve added this page to the repository: https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/ It includes a table of available NVIDIA kernel module packages.

alvinting · June 28, 2021, 9:26pm

Hi Sir

We have a problem running 2 X displays on RHEL 8, CentOS, or Stream. Yes, we can create 2 X displays using nvidia-settings, but the 2nd screen keeps crashing under Xfce, GNOME, and KDE.

Example: setenv DISPLAY :0.0 and DISPLAY :0.1

The 2nd monitor is plain black, and we are unable to right-click or do anything else.

Previously, on RHEL 7, we could use 2 X displays. How can we achieve this on RHEL 8?

joachima · September 21, 2021, 1:34am

Does this streamlined approach for NVIDIA driver installation work on Secure Boot systems? Upgrades are especially painful on such systems: one can accidentally upgrade the kernel, and the system won’t boot if the required NVIDIA kernel modules are not available or not signed.

kmittman · September 21, 2021, 2:00am

Hi @joachima
The precompiled modules are signed, and I would like UEFI Secure Boot to work out of the box, but a couple of pieces are missing. The big one is that the bootloader shim does not trust the NVIDIA certificate, so the public key has to be manually enrolled as a Machine Owner Key (MOK).

As for “modules are not available”, the NVIDIA dnf plugin will block new kernel updates until a new compatible precompiled kmod package is available (usually within 24 hours).

Regarding system upgrades, one problem that precompiled packages solve today is a mismatch between the kernel-devel and kernel-headers packages and the target kernel, which causes the kernel modules to fail to build and subsequently fail to load on reboot (especially when booting into a new kernel). This is not an issue for precompiled streams. Moreover, prior to shipping, they are tested against that specific kernel version, so they are known to work.

joachima · September 21, 2021, 6:17pm

Thanks for the quick reply. This sounds good. Adding a certificate via the MOK is needed either way, even with the old way to install the driver. I’m inclined to give this a try. I’ll need to review the installation instructions to see what I need to do to get the certificate registered.

joachima · September 21, 2021, 11:37pm

Okay, going through the instructions on your web page, I don’t see any mention of which key I need to register with the MOK utility. In the instructions for building one’s own custom drivers, one must generate a new key, which makes sense, but where is NVIDIA’s public key?

kmittman · September 22, 2021, 5:29am

Yes, I just realized we don’t have instructions written for this, nor is the public key certificate hosted externally. So congratulations on being the first person to ask! I went through the procedure tonight from a clean bare-metal install, and tomorrow I will work on writing up all the steps and making the public key available. Thank you for your patience.

kmittman · September 23, 2021, 3:36am

Hi @joachima okay here are the instructions: yum-packaging-precompiled-kmod/UEFI.md at main · NVIDIA/yum-packaging-precompiled-kmod · GitHub

check out the asciinema screencast
import the public key with mokutil
follow the MOK manager instructions on reboot

If you run into any problems please file a GitHub Issue on that repository. Also please note the key is subject to change.
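
For orientation, the enrollment flow above can be sketched roughly as follows. This is a dry-run outline, not a substitute for UEFI.md: the certificate file name is a placeholder (the thread does not name it), the `run` and `enroll_mok_key` helpers are hypothetical, and nothing is executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Placeholder path: take the real certificate location from UEFI.md.
CERT="${CERT:-nvidia-public-key.der}"   # hypothetical file name

# Print each command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Queue the public key for enrollment.
enroll_mok_key() {
  run mokutil --sb-state            # report whether Secure Boot is enabled
  run sudo mokutil --import "$1"    # asks for a one-time password used at next boot
}

# After the reboot and the MOK manager prompts, confirm the key is listed.
verify_mok_key() {
  run mokutil --list-enrolled
}

enroll_mok_key "$CERT"
# Reboot, follow the MOK manager prompts, then call: verify_mok_key
```

Keeping the default as a dry run makes it possible to review the commands before anything touches the MOK list.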

mabarkdoll · September 23, 2021, 2:49pm

Thank you!!! This should also work with Rocky Linux 8 (except for secure boot until they enable it)? Going to give it a try now.

joachima · September 28, 2021, 8:31pm

This worked. Thank You!

user112167 · December 30, 2021, 10:20am

Hi Kmittman,

I followed the steps to install the latest NVIDIA driver on RHEL 8.4, but after the installation completed,
nvidia-smi didn’t work. The message is:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The RHEL 8.4 system is a VM on an ESXi host.

mabarkdoll · January 5, 2022, 10:16pm

Hi @kmittman

This guide works on most of my systems, but on an older NVIDIA GeForce GTX 745 we’re experiencing:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The driver is version 495.29.05 and the kernel is 4.18.0-348.7.1, which should have been supported since 12/21/2021. Is this graphics card just too old? Any advice is appreciated, thanks.

kmittman · January 6, 2022, 12:09am

Hi @mabarkdoll
Is this in a laptop (GT 745M) or a desktop machine? This is right on the border for GeForce cards, between the Kepler (470.xx / CUDA 10.2) and Maxwell architectures; the GT 745M is Kepler and the GTX 745 is Maxwell.

For Kepler GPUs, the last supported CUDA version is 10.2.x and the last NVIDIA driver branch is 470.
However, if this is indeed a Maxwell GPU then it should be working.

Are you using a DKMS or a precompiled stream? ($ dnf module list nvidia-driver)
Does it work with 470 driver or below?
Can you provide a bug report using $ nvidia-bug-report.sh ?
Anything unusual about your system (like is it a VM or container)?
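
One quick way to tell which side of the Kepler/Maxwell border a card sits on is the chip codename in the `lspci` output: GK-prefixed chips are Kepler and GM-prefixed chips are Maxwell. A minimal sketch, with a hypothetical `gpu_arch` helper that only distinguishes these two families:

```shell
#!/bin/sh
# Hypothetical helper: classify an lspci line by its NVIDIA chip codename.
# Only Kepler (GK*) and Maxwell (GM*) are distinguished; anything else is "other".
gpu_arch() {
  # Pull out the first codename-like token, e.g. GK107, GM107, GP106GL.
  chip=$(printf '%s\n' "$1" | grep -oE 'G[A-Z][0-9]{3}[A-Z]*' | head -n 1)
  case "$chip" in
    GK*) echo "Kepler" ;;    # last driver branch is 470, per the post above
    GM*) echo "Maxwell" ;;
    *)   echo "other" ;;
  esac
}

# On a live system: gpu_arch "$(lspci | grep -i nvidia | head -n 1)"
gpu_arch '01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 745] (rev a2)'
# prints: Maxwell
```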

mabarkdoll · January 6, 2022, 5:20pm

Thanks, I think this Dell Optiplex 9020 is just too old. I’m not seeing the NVIDIA MOK in the output of mokutil --list-enrolled after a reboot, despite having manually enrolled the NVIDIA MOK key during boot. Also, Rocky Linux 8 supports Secure Boot (it boots), but the Rocky Linux key doesn’t show up either. I’m not sure whether the following will resolve it or whether my hardware is just too old (https://bugs.rockylinux.org/show_bug.cgi?id=174).

This is a desktop GPU so I believe it is the Maxwell that you noted.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 745] (rev a2)

Are you using a DKMS or a precompiled stream? ( $ dnf module list nvidia-driver )
I was using the precompiled latest, but I tried 470-dkms as well.
Does it work with 470 driver (https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/) or below?
No, I tried all the precompiled versions that were available.
Can you provide a bug report using $ nvidia-bug-report.sh ?
Yes, attached, but I don’t think this issue is on your end.
Anything unusual about your system (like is it a VM or container)?
Dell Optiplex 9020 running Rocky Linux 8 with secure boot off.
I found that after turning off Secure Boot, nvidia-smi works on this machine, so I believe the issue is that the older hardware does not support Secure Boot properly with RHEL 8.

Anyway, I can use this with Secure Boot off and still have the hard disk encrypted, which is probably good enough for this older machine’s use case. Thanks for your help developing this driver; it is much appreciated.

nvidia-bug-report.log.gz (81.5 KB)

routenull · March 21, 2022, 5:46pm

I followed the steps to install the latest NVIDIA driver on RHEL 8.5, but after the installation completed,
nvidia-smi didn’t work. The message is:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The RHEL 8.5 system is a VM on an ESXi host.
GPU: Nvidia Quadro P2000 (Configured for Passthru to the VM)

lspci | grep Quadro
0b:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)

dnf module list nvidia-driver
Updating Subscription Management repositories.
Last metadata expiration check: 0:40:57 ago on Mon 21 Mar 2022 01:05:09 PM EDT.
cuda-rhel8-x86_64
Name Stream Profiles Summary
nvidia-driver latest [e] default [d] [i], fm, ks, src Nvidia driver for latest branch
