Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams

Originally published at: https://developer.nvidia.com/blog/streamlining-nvidia-driver-deployment-on-rhel-8-with-modularity-streams/

NVIDIA GPUs have become mainstream for accelerating a variety of workloads, including machine learning, high-performance computing (HPC), content creation workflows, and data center applications. For these enterprise use cases, NVIDIA provides a software stack powered by the CUDA platform: drivers, CUDA-X acceleration libraries, CUDA-optimized applications, and frameworks. Deploying the NVIDIA driver is one of the…

Hi, this is Kevin. I hope you’ve enjoyed reading my blog post. Be sure to check out my presentations on this subject at NVIDIA GTC Fall 2020 and Red Hat Summit 2020. On a related note, the yum-packaging-precompiled-kmod repository on GitHub has a detailed README, and pull requests are welcome. Shout out to our friends at Red Hat who collaborated on this project. Finally, if you have any questions or comments, please let us know.

Hi,

Thanks for the guide. However, are you aware that these drivers have not worked for a few months, ever since RHEL 8.3 came out? Are there any plans to fix the build?

Hi @ilkka.tengvall_nv, could you expand on “drivers have not worked”? Please fill in the blanks:

  1. NVIDIA driver version:
  2. RHEL kernel version:
  3. modularity stream:
  4. modularity profile: <default, ks, fm>
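Gathering these four details can be scripted. The following is a minimal bash sketch, assuming a RHEL 8 host with dnf; the function names are hypothetical, and each probe prints "unknown" when its tool or file is missing (/proc/driver/nvidia/version, for example, exists only while the nvidia kernel module is loaded).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: gather the details requested above in one go.

# Driver version: prefer the loaded module's /proc entry, else ask modinfo.
nvidia_driver_version() {
  local v=""
  if [ -r /proc/driver/nvidia/version ]; then
    v=$(head -n 1 /proc/driver/nvidia/version)
  elif command -v modinfo >/dev/null 2>&1; then
    v=$(modinfo -F version nvidia 2>/dev/null || true)
  fi
  echo "${v:-unknown}"
}

# Module table for nvidia-driver: the stream flag [e] (enabled) and the
# profile flags [d] (default) / [i] (installed) answer items 3 and 4.
nvidia_module_table() {
  local t=""
  if command -v dnf >/dev/null 2>&1; then
    t=$(dnf -q module list nvidia-driver 2>/dev/null || true)
  fi
  echo "${t:-unknown}"
}

collect_nvidia_report() {
  echo "1. NVIDIA driver version: $(nvidia_driver_version)"
  echo "2. RHEL kernel version: $(uname -r)"
  echo "3./4. modularity stream and profile:"
  nvidia_module_table
}
```

Running collect_nvidia_report prints items 1 and 2 directly; the dnf table that follows it can be pasted as-is for items 3 and 4.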

I have confirmed that the precompiled driver packages are functional on RHEL 8.3, with two exceptions:

  • 418 driver (a fix is coming soon)
  • kernel 4.18.0-240.1.1 and newer (4.18.0-240 from the RHEL 8.3 ISO image was released on the same day and thus skipped over)
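To tell whether a particular kernel falls under the second exception, a version comparison is enough. Below is a minimal bash sketch, assuming GNU sort -V and RHEL-style release strings such as 4.18.0-240.1.1.el8_3.x86_64; the function name is hypothetical. It strips the ".el8" dist suffix first, because sort -V would otherwise order 4.18.0-240.el8 after 4.18.0-240.1.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if kernel release $1 is the same as or newer than $2.
# The ".el8..." dist/arch suffix is stripped first, because sort -V would
# otherwise sort "240.el8" after "240.1.1" (letters sort after digits there).
kernel_at_least() {
  local have="${1%%.el*}" want="${2%%.el*}"
  # sort -V lists the lower version first; if that is "want", then have >= want
  [ "$(printf '%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example: is the running kernel in the "4.18.0-240.1.1 and newer" range?
#   kernel_at_least "$(uname -r)" 4.18.0-240.1.1 && echo "in the affected range"
```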

Also feel free to report such issues here: Issues · NVIDIA/yum-packaging-precompiled-kmod · GitHub

Thanks. I’m not at my computer right now, but I have some older posts here describing the problem:

I’ve had trouble with it for a long time.

By the way, I’ve added a page to the repository that includes a table of the available NVIDIA kernel module packages: https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/

Hi Sir,

We have a problem running two X displays on RHEL 8, CentOS, or CentOS Stream. We can create two X displays using nvidia-settings, but the second screen keeps crashing with Xfce, GNOME, or KDE.

For example: setenv DISPLAY :0.0 and setenv DISPLAY :0.1

The second monitor is plain black, and we are unable to right-click or do anything else.

Previously, on RHEL 7, we could use two X displays. How can we achieve this on RHEL 8?

Does this streamlined approach to NVIDIA driver installation work on Secure Boot systems? Upgrades in particular are a pain on such systems: one can accidentally upgrade the kernel, and the system won’t boot if the required NVIDIA kernel modules are not available or not signed.

Hi @joachima,
The precompiled modules are signed, and I would like UEFI Secure Boot to work out of the box, but a couple of pieces are missing. The big one is that the bootloader shim does not trust the NVIDIA certificate, so the public key has to be manually enrolled in the MOK (Machine Owner Key) list.

As for “modules are not available”, the nvidia dnf plugin will block new kernel updates until a new compatible precompiled kmod package is available (usually within 24 hours).
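The blocking rule described here can be illustrated with a small bash sketch. This is a hypothetical illustration, not the nvidia dnf plugin's actual code: it takes a file of candidate kernel releases and a file of kernel releases for which a precompiled kmod package exists, and reports which candidates would be held back.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the "hold back kernels without a precompiled kmod" rule.
# Not the plugin's implementation; both inputs are plain lists of kernel
# releases, one per line, in exact `uname -r` form.
#   $1: file with candidate kernel updates
#   $2: file with kernels that already have a matching precompiled kmod package

allowed_kernels() {
  # whole-line, fixed-string matches against the kmod list;
  # "|| true" treats "no match" (and, in this sketch, any grep error) as an empty list
  grep -xF -f "$2" -- "$1" || true
}

blocked_kernels() {
  # candidates with no matching kmod yet: these updates would be held back
  grep -vxF -f "$2" -- "$1" || true
}
```

Matching whole lines exactly matters here: a precompiled kmod is built for one specific kernel build, so a prefix or partial match would wrongly let a newer kernel through.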

Regarding system upgrades, one of the problems that precompiled streams solve today is kernel-devel and kernel-headers packages that do not match the target kernel, which causes the kernel modules to fail to build and, subsequently, to fail to load on reboot (especially when booting into a new kernel). This is not an issue for precompiled streams; in addition, prior to shipping, each package is tested against that specific kernel version, so it is known to work.
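For DKMS streams, the failure mode described here can be spotted before rebooting by checking whether a kernel-devel build exists for the kernel you are about to boot. A minimal bash sketch, assuming rpm's query format yields strings in the same form as uname -r (VERSION-RELEASE.ARCH, which holds for RHEL 8 kernel packages); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a kernel-devel build exist for kernel release $1?
#   stdin: installed kernel-devel builds, one per line, e.g. from
#          rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel
#   $1:    kernel release to check (the running kernel, "$(uname -r)",
#          or the newly installed kernel you are about to boot)
check_devel_match() {
  local target="$1"
  if grep -qxF -- "$target"; then
    echo "OK: kernel-devel present for $target"
  else
    echo "MISMATCH: no kernel-devel for $target; a DKMS build would fail"
    return 1
  fi
}

# Usage:
#   rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel-devel | check_devel_match "$(uname -r)"
```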

Thanks for the quick reply. This sounds good. Adding a certificate via the MOK is needed either way, even with the old way of installing the driver. I’m inclined to give this a try. I’ll need to review the installation instructions to see what I need to do to get the certificate registered.

Okay, going through the instructions on your web page, I don’t see any mention of which key I need to register with the MOK utility. The instructions for building your own custom drivers say to generate a new key, which makes sense, but where is NVIDIA’s public key?

Yes, I just realized we don’t have instructions written for this, nor is the public key certificate hosted externally. So congratulations on being the first person to ask! I went through the procedure tonight from a clean bare-metal install, and tomorrow I will work on writing up all the steps and making the public key available. Thank you for your patience.

Hi @joachima, okay, here are the instructions: yum-packaging-precompiled-kmod/UEFI.md at main · NVIDIA/yum-packaging-precompiled-kmod · GitHub

  • check out the asciinema screencast
  • import the public key with mokutil
  • follow the MOK manager instructions on reboot
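The import step and the Secure Boot check can be sketched in bash as follows. This is a minimal sketch, not a replacement for UEFI.md: the key file path is a placeholder for the public key that document links to, and the function names are hypothetical. Note that mokutil --import only queues the key; enrollment completes in MOK Manager on the next reboot, using the one-time password you set when importing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the UEFI Secure Boot steps above.

# Reads `mokutil --sb-state` output on stdin ("SecureBoot enabled" or
# "SecureBoot disabled") and succeeds only if Secure Boot is on.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}

# Queue a DER-encoded public key for enrollment. mokutil asks for a one-time
# password, which MOK Manager requests again on the next reboot.
enroll_key() {
  local key="$1"   # placeholder: path to the public key linked from UEFI.md
  sudo mokutil --import "$key"
}

# Usage (the key path is a placeholder):
#   mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on"
#   enroll_key /path/to/public_key.der   # then reboot and follow MOK Manager
```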

If you run into any problems, please file a GitHub issue on that repository. Also, please note that the key is subject to change.


Thank you!!! Should this also work with Rocky Linux 8 (except for Secure Boot, until they enable it)? Going to give it a try now.

This worked. Thank You!

Hi Kmittman,

I followed the steps to install the latest NVIDIA driver on RHEL 8.4, but after the installation completed, nvidia-smi didn’t work. The message is:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The RHEL 8.4 system is a VM on an ESXi host.

Hi @kmittman

This guide works on most of my systems, but on an older NVIDIA GeForce GTX 745 we’re experiencing:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The driver version is 495.29.05 and the kernel is 4.18.0-348.7.1, which should have been supported since 12/21/2021. Is this graphics card just too old? Any advice would be appreciated, thanks.

Hi @mabarkdoll,
Is this in a laptop (GT 745M) or a desktop machine? This is right on the border for GeForce cards between the Kepler (470.xx / CUDA 10.2) and Maxwell architectures: the GT 745M is Kepler, and the GTX 745 is Maxwell.

For Kepler GPUs, the last supported CUDA version is 10.2.x and the last NVIDIA driver branch is 470.
However, if this is indeed a Maxwell GPU then it should be working.
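To see which side of that border a card falls on, the chip code in the lspci output is a reliable hint: GK chips are Kepler, GM chips are Maxwell, GP chips are Pascal, and so on. A minimal bash sketch; the function names are hypothetical, and the 470 note simply restates the Kepler limit given above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map the NVIDIA chip code reported by lspci to an architecture.

# Extract the chip code from lspci output on stdin, e.g. "GM107" from
# "01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 745] (rev a2)"
gpu_chip() {
  sed -n 's/.*NVIDIA Corporation \([A-Z][A-Z0-9]*\) .*/\1/p'
}

gpu_arch() {
  case "$1" in
    GF*) echo "Fermi" ;;
    GK*) echo "Kepler (last driver branch: 470)" ;;
    GM*) echo "Maxwell" ;;
    GP*) echo "Pascal" ;;
    GV*) echo "Volta" ;;
    TU*) echo "Turing" ;;
    GA*) echo "Ampere" ;;
    *)   echo "unknown" ;;
  esac
}

# Usage (sort -u drops duplicates such as the GPU's HDMI audio function):
#   for chip in $(lspci | gpu_chip | sort -u); do echo "$chip: $(gpu_arch "$chip")"; done
```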

  • Are you using a DKMS or a precompiled stream? ($ dnf module list nvidia-driver)
  • Does it work with 470 driver or below?
  • Can you provide a bug report using $ nvidia-bug-report.sh ?
  • Anything unusual about your system (like is it a VM or container)?
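The last question can be answered from the shell with systemd-detect-virt, which ships with systemd on RHEL 8 and prints the detected hypervisor or container type (for example vmware or kvm), or none on bare metal. A minimal bash sketch; the function name is hypothetical. nvidia-bug-report.sh needs to run as root and writes nvidia-bug-report.log.gz to the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "VM or container?" question.
virt_kind() {
  local v=""
  if command -v systemd-detect-virt >/dev/null 2>&1; then
    # Exits non-zero when nothing is detected, but still prints "none".
    v=$(systemd-detect-virt 2>/dev/null || true)
  fi
  echo "${v:-unknown}"
}

# Usage:
#   echo "virtualization: $(virt_kind)"
#   dnf module list nvidia-driver     # DKMS or precompiled stream?
#   sudo nvidia-bug-report.sh         # writes ./nvidia-bug-report.log.gz
```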

Thanks, I think this Dell OptiPlex 9020 is just too old. I’m not seeing the NVIDIA MOK in mokutil --list-enrolled after a reboot, despite having manually enrolled the NVIDIA MOK key during boot. Also, Rocky Linux 8 supports Secure Boot (it boots), but the Rocky Linux key doesn’t show up either. I’m not sure whether the following will resolve it or whether it is just my hardware being too old (https://bugs.rockylinux.org/show_bug.cgi?id=174).
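One way to check what actually landed in the MOK list is to look at the certificate subjects that mokutil --list-enrolled prints. A minimal bash sketch; the function names are hypothetical, and matching on the string "nvidia" assumes the enrolled certificate's subject names NVIDIA, so compare against the key you actually imported.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: inspect `mokutil --list-enrolled` output read on stdin.
# Each enrolled certificate is printed in X.509 text form with a "Subject:" line.

enrolled_subjects() {
  # "Subject:" does not match "Subject Public Key Info:" or "Subject Key Identifier:"
  grep 'Subject:' | sed 's/^[[:space:]]*Subject:[[:space:]]*//'
}

# Succeeds if any enrolled certificate subject contains $1 (case-insensitive).
has_enrolled_subject() {
  enrolled_subjects | grep -qi -- "$1"
}

# Usage:
#   mokutil --list-enrolled | enrolled_subjects
#   mokutil --list-enrolled | has_enrolled_subject nvidia || echo "NVIDIA key not enrolled"
```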

This is a desktop GPU, so I believe it is the Maxwell one you noted.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 745] (rev a2)

  • Are you using a DKMS or a precompiled stream? ( $ dnf module list nvidia-driver )
    I was using the precompiled latest stream, but I tried 470-dkms as well.
  • Does it work with 470 driver (https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/) or below?
    No, I tried all the precompiled versions that were available.
  • Can you provide a bug report using $ nvidia-bug-report.sh ?
    Yes, attached, but I don’t think this issue is on your end.
  • Anything unusual about your system (like is it a VM or container)?
    Dell OptiPlex 9020 running Rocky Linux 8 with Secure Boot off.
    I found that after turning off Secure Boot, nvidia-smi works on this machine, so I believe the issue is the older hardware not supporting Secure Boot properly with RHEL 8.

Anyway, I can use this with Secure Boot off and still have the hard disk encrypted, which is probably good enough for this older machine’s use case. Thanks for your help developing this driver; it is much appreciated.

nvidia-bug-report.log.gz (81.5 KB)

I followed the steps to install the latest NVIDIA driver on RHEL 8.5, but after the installation completed, nvidia-smi didn’t work. The message is:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The RHEL 8.5 system is a VM on an ESXi host.
GPU: NVIDIA Quadro P2000 (configured for passthrough to the VM)

$ lspci | grep Quadro
0b:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)

$ dnf module list nvidia-driver
Updating Subscription Management repositories.
Last metadata expiration check: 0:40:57 ago on Mon 21 Mar 2022 01:05:09 PM EDT.
cuda-rhel8-x86_64
Name            Stream       Profiles                       Summary
nvidia-driver   latest [e]   default [d] [i], fm, ks, src   Nvidia driver for latest branch
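For items 3 and 4 of the checklist earlier in the thread, the enabled stream and installed profile can be read straight from this table: [e] marks the enabled stream and [i] the installed profile. A minimal bash/awk sketch; the function name is hypothetical, and it assumes the column layout shown in this output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dnf module list nvidia-driver` output on stdin and
# print "<stream> <installed profile>" for the enabled ([e]) stream.
# Assumes the layout shown above: Name, Stream + flags, Profiles (comma-separated,
# each with optional [d]/[i] flags), then the Summary.
nvidia_module_state() {
  awk '
    $1 == "nvidia-driver" {
      stream = $2
      rest = $0
      # drop the name and stream columns
      sub(/^[ \t]*nvidia-driver[ \t]+[^ \t]+[ \t]+/, "", rest)
      # split off the stream flags, e.g. "[e]" or "[d][e]"
      flags = ""
      if (match(rest, /^(\[[a-z]\][ \t]*)+/)) {
        flags = substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
      }
      if (flags !~ /\[e\]/) next
      # the installed profile is the comma-separated item carrying [i]
      profile = "none"
      n = split(rest, items, ",")
      for (i = 1; i <= n; i++) {
        if (items[i] ~ /\[i\]/) { split(items[i], w, " "); profile = w[1]; break }
      }
      print stream, profile
    }'
}

# Usage (for the table above this prints "latest default"):
#   dnf -q module list nvidia-driver | nvidia_module_state
```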