Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams

Hi @routenull
Okay, I see you have the latest (precompiled) stream enabled, so I won't ask about the kernel-headers and kernel-devel packages. That should also mean you have installed or upgraded to the 510.47.03 driver.

Please check a few things:

  • Was this a fresh install or an upgrade?

    • Only installed RPMs, or has an NVIDIA .run installer been used in the past?
  • Did you reboot before / after installation? (uptime)

  • Running kernel version?
    uname -a

    • Did you install a kernel update recently? Maybe around the same time as installing the NVIDIA driver?
      There was a new RHEL kernel (4.18.0-348.20.1) released on March 10th. [1]
  • Which packages are installed? I’m most interested in the kmod package(s).
    rpm -qa | grep nvidia | sort

  • Were the .ko files generated?
    ls -1 /usr/lib/modules/$(uname -r)/extra/nvidia*

  • Does this entry exist in /proc and does the driver version match?
    cat /proc/driver/nvidia/version

EDIT: one more thing

  • Do you see anything interesting in dmesg?
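The checklist above can be gathered in one pass. This is a minimal sketch, not an NVIDIA tool: the function name `collect_nvidia_diagnostics` is hypothetical, and it only wraps the commands already listed (plus `uptime` and `dmesg`), tolerating missing tools or files so the report always completes. It uses a recursive `find` instead of the `extra/nvidia*` glob, because the precompiled modules turn up in a subdirectory later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the answers to the checklist as one report.
# Every check tolerates a missing tool or file so the report always finishes.
section() {
    printf '\n== %s ==\n' "$1"
}

collect_nvidia_diagnostics() {
    kver=$(uname -r)

    section "Uptime"
    uptime 2>/dev/null || echo "uptime not available"

    section "Kernel"
    uname -a

    section "Installed NVIDIA packages"
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa | grep -i nvidia | sort
    else
        echo "rpm not available"
    fi

    section "NVIDIA kernel modules for $kver"
    # Search the whole tree: modules may sit in a subdirectory of extra/.
    find "/usr/lib/modules/$kver" -name 'nvidia*.ko*' 2>/dev/null | sort

    section "/proc/driver/nvidia/version"
    cat /proc/driver/nvidia/version 2>/dev/null || echo "not present (module not loaded?)"

    section "Kernel log (nvidia/NVRM lines)"
    # dmesg may need root when kernel.dmesg_restrict=1
    dmesg 2>/dev/null | grep -i -e nvidia -e nvrm || echo "no matching lines (or dmesg not permitted)"
}
```

Source the file and run `collect_nvidia_diagnostics > nvidia-diag.txt`, ideally as root so `dmesg` is readable, then paste the result into the thread.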

@kmittman

Thanks for the quick reply!

Was this a fresh install or an upgrade?
    This was a fresh RHEL 8.5 install.

Did you reboot before / after installation? (uptime)
    I did reboot.

Running kernel version?
    Did you install a kernel update recently? Maybe around the same time as installing the NVIDIA driver?
    There was a new RHEL kernel (4.18.0-348.20.1) released on March 10th. [1]

    uname -a
    Linux chronos01 4.18.0-348.20.1.el8_5.x86_64 #1 SMP Tue Mar 8 12:56:54 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

Which packages are installed? I’m most interested in the kmod package(s).
rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64
nvidia-driver-510.47.03-1.el8.x86_64
nvidia-driver-cuda-510.47.03-1.el8.x86_64
nvidia-driver-cuda-libs-510.47.03-1.el8.x86_64
nvidia-driver-devel-510.47.03-1.el8.x86_64
nvidia-driver-libs-510.47.03-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-510.47.03-1.el8.x86_64
nvidia-driver-NVML-510.47.03-1.el8.x86_64
nvidia-kmod-common-510.47.03-1.el8.noarch
nvidia-libXNVCtrl-510.47.03-1.el8.x86_64
nvidia-libXNVCtrl-devel-510.47.03-1.el8.x86_64
nvidia-modprobe-510.47.03-1.el8.x86_64
nvidia-persistenced-510.47.03-1.el8.x86_64
nvidia-settings-510.47.03-1.el8.x86_64
nvidia-xconfig-510.47.03-1.el8.x86_64
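One thing worth checking in a listing like this is whether the precompiled kmod package was built for the kernel that is actually running. Judging by the package names in this thread, the kmod package name appears to embed its target kernel (`kmod-nvidia-<driver>-<kernel>-...`). The sketch below assumes that pattern; both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Assumed naming pattern (inferred from this thread):
#   kmod-nvidia-<driver>-<kernel>-<driver>-<release>.<dist>.<arch>
# e.g. kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64

# Print the kernel version (e.g. 4.18.0-348.20.1) a kmod package targets.
kmod_target_kernel() {
    # Drop the shortest "kmod-nvidia-<driver>-" prefix ...
    rest=${1#kmod-nvidia-*-}
    # ... then keep the two dash-separated fields that form the kernel version.
    printf '%s\n' "$rest" | cut -d- -f1,2
}

# Succeed if the kmod package targets kernel $2 (default: running kernel).
kmod_matches_kernel() {
    pkg=$1
    running=${2:-$(uname -r)}
    target=$(kmod_target_kernel "$pkg")
    # uname -r looks like 4.18.0-348.20.1.el8_5.x86_64, so require "<target>."
    case "$running" in
        "$target".*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `rpm -qa 'kmod-nvidia-*' | while read -r pkg; do kmod_matches_kernel "$pkg" && echo "OK $pkg" || echo "MISMATCH $pkg"; done`. For the package list above, the kmod package targets 4.18.0-348.20.1 and the running kernel is 4.18.0-348.20.1.el8_5.x86_64, so this reports a match and a kernel mismatch does not explain the missing module.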

Were the .ko files generated?
ls -1 /usr/lib/modules/$(uname -r)/extra/nvidia*
ls: cannot access '/usr/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/nvidia*': No such file or directory

Does this entry exist in /proc and does the driver version match?
cat /proc/driver/nvidia/version
cat: /proc/driver/nvidia/version: No such file or directory

It appears the driver didn’t get installed?

@kmittman

Doing some further testing on another fresh install of RHEL8, I found that the .ko files are at the following path:
ls -alh /usr/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/
total 48M
drwxr-xr-x 2 root root 115 Mar 23 11:34 .
drwxr-xr-x. 3 root root 20 Mar 23 11:34 …
-rw-r--r-- 1 root root 239K Mar 23 11:34 nvidia-drm.ko
-rw-r--r-- 1 root root 44M Mar 23 11:34 nvidia.ko
-rw-r--r-- 1 root root 1.6M Mar 23 11:34 nvidia-modeset.ko
-rw-r--r-- 1 root root 115K Mar 23 11:34 nvidia-peermem.ko
-rw-r--r-- 1 root root 2.2M Mar 23 11:34 nvidia-uvm.ko

And there is no nvidia/ under /proc/driver/.
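The earlier `ls` glob only looked directly under `extra/`, while these packages place the modules in `extra/drivers/video/nvidia/`. A recursive search avoids depending on the exact layout. A minimal sketch; the function name is hypothetical, and the optional second argument (modules root) exists only so it can be pointed at a test tree.

```shell
#!/bin/sh
# Hypothetical helper: list NVIDIA kernel module files for a kernel, wherever
# they sit under <root>/<kver>/ (e.g. extra/ or extra/drivers/video/nvidia/).
# $1: kernel version (default: running kernel)
# $2: modules root   (default: /usr/lib/modules)
find_nvidia_modules() {
    kver=${1:-$(uname -r)}
    root=${2:-/usr/lib/modules}
    if [ ! -d "$root/$kver" ]; then
        echo "no module directory for $kver under $root" >&2
        return 1
    fi
    # The trailing * also matches compressed variants such as .ko.xz
    find "$root/$kver" -type f -name 'nvidia*.ko*' | sort
}
```

If files are found but `/proc/driver/nvidia/` is absent, the modules exist on disk but are not loaded; running `sudo modprobe nvidia` and then checking `dmesg` would typically show why.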

Okay, @routenull, I finally got access to a RHEL 8 machine with a Quadro P2000 GPU. The installation was successful on bare metal. Is it possible that there is a configuration issue with the GPU pass-through to the VM?

Anyway, here is the step-by-step process with output, so you can follow along and let me know where your machine diverges. Note: for the precompiled stream, the following are optional: gcc, the EPEL repo (for dkms), and the kernel-devel and kernel-headers packages.

Pre-installation actions

$ lspci | grep -i nvidia
65:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
65:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
$ uname -m && cat /etc/*release
x86_64
NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Ootpa)"
...
$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
...
$ uname -r
4.18.0-348.20.1.el8_5.x86_64
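The `lspci` pre-installation check can be scripted. This sketch (hypothetical function names) reads `lspci` output on standard input, so it works on saved output as well as live, and succeeds only if an NVIDIA display-class device is listed; the card's audio function is ignored.

```shell
#!/bin/sh
# Hypothetical helpers operating on `lspci` output read from stdin.
# Display-class devices appear as "VGA compatible", "3D" or "Display" controller.
NVIDIA_DISPLAY_RE='(VGA compatible|3D|Display) controller: NVIDIA'

# Succeed if an NVIDIA display controller is listed.
has_nvidia_gpu() {
    grep -q -i -E "$NVIDIA_DISPLAY_RE"
}

# Print the PCI address of each NVIDIA display controller, e.g. 65:00.0
nvidia_gpu_addresses() {
    grep -i -E "$NVIDIA_DISPLAY_RE" | cut -d' ' -f1
}
```

Usage: `lspci | has_nvidia_gpu && echo "NVIDIA GPU visible"`. Inside a pass-through VM the address differs from the host (for example `0b:00.0` later in this thread), which is expected.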

Verify matching versions

$ sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
$ rpm -qa | grep kernel | sort | grep $(uname -r)
kernel-4.18.0-348.20.1.el8_5.x86_64
kernel-core-4.18.0-348.20.1.el8_5.x86_64
kernel-devel-4.18.0-348.20.1.el8_5.x86_64
kernel-headers-4.18.0-348.20.1.el8_5.x86_64
kernel-modules-4.18.0-348.20.1.el8_5.x86_64
kernel-tools-4.18.0-348.20.1.el8_5.x86_64
kernel-tools-libs-4.18.0-348.20.1.el8_5.x86_64
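The version-matching check can also be automated. A sketch with a hypothetical function name that reads `rpm -qa` output on stdin and prints whichever of `kernel-devel` and `kernel-headers` is missing for a given kernel release. As noted above, these packages matter for the DKMS path but are optional for the precompiled stream.

```shell
#!/bin/sh
# Hypothetical helper. Reads `rpm -qa` output on stdin and prints each kernel
# build package missing for release $1 (default: running kernel).
# Exit status is 0 only if nothing is missing.
missing_kernel_build_pkgs() {
    kver=${1:-$(uname -r)}
    installed=$(cat)
    missing=0
    for name in kernel-devel kernel-headers; do
        # -x: whole-line match, -F: fixed string (dots are not wildcards)
        if ! printf '%s\n' "$installed" | grep -qxF "$name-$kver"; then
            echo "$name-$kver"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `rpm -qa | missing_kernel_build_pkgs || echo "install the packages listed above"`.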

CUDA Download Page instructions

$ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
$ sudo dnf clean all
$ sudo dnf repolist
repo id                            repo name
cuda-rhel8-x86_64                  cuda-rhel8-x86_64
epel                               Extra Packages for Enterprise Linux 8 - x86_64
epel-modular                       Extra Packages for Enterprise Linux Modular 8 - x86_64
rhel-8-for-x86_64-appstream-rpms   Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
rhel-8-for-x86_64-baseos-rpms      Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)
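To confirm the CUDA repository is enabled without reading the table by eye, one could match the first column of `dnf repolist` output. A minimal sketch; the default repo id `cuda-rhel8-x86_64` is taken from the output above, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Succeed if `dnf repolist` output on stdin lists the
# repo id $1 (default: cuda-rhel8-x86_64, as in the output above).
repo_enabled() {
    # The repo id is the first whitespace-separated field; an exact comparison
    # also skips the "repo id / repo name" header and any banner lines.
    awk -v id="${1:-cuda-rhel8-x86_64}" '$1 == id { found = 1 } END { exit !found }'
}
```

Usage: `dnf repolist | repo_enabled || echo "CUDA repo not enabled"`.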

Install the latest precompiled kernel module stream

$ sudo dnf module install nvidia-driver:latest
Updating Subscription Management repositories.
Last metadata expiration check: 0:00:24 ago on Thu 31 Mar 2022 02:15:56 PM PDT.
Dependencies resolved.
=====================================================================================================================================
 Package                                      Architecture  Version                    Repository                               Size
=====================================================================================================================================
Installing group/module packages:
 cuda-drivers                                 x86_64        510.47.03-1                cuda-rhel8-x86_64                       7.0 k
 nvidia-driver                                x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        22 M
 nvidia-driver-NVML                           x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       516 k
 nvidia-driver-NvFBCOpenGL                    x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        52 k
 nvidia-driver-cuda                           x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       591 k
 nvidia-driver-cuda-libs                      x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        63 M
 nvidia-driver-devel                          x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        12 k
 nvidia-driver-libs                           x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       168 M
 nvidia-kmod-common                           noarch        3:510.47.03-1.el8          cuda-rhel8-x86_64                        12 k
 nvidia-libXNVCtrl                            x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        25 k
 nvidia-libXNVCtrl-devel                      x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        55 k
 nvidia-modprobe                              x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        36 k
 nvidia-persistenced                          x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                        42 k
 nvidia-settings                              x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       832 k
 nvidia-xconfig                               x86_64        3:510.47.03-1.el8          cuda-rhel8-x86_64                       105 k
Installing dependencies:
 dnf-plugin-nvidia                            noarch        2.0-1.el8                  cuda-rhel8-x86_64                        12 k
 egl-wayland                                  x86_64        1.1.7-1.el8                rhel-8-for-x86_64-appstream-rpms         34 k
 kmod-nvidia-510.47.03-4.18.0-348.20.1        x86_64        3:510.47.03-3.el8_5        cuda-rhel8-x86_64                        29 M
 libX11-devel                                 x86_64        1.6.8-5.el8                rhel-8-for-x86_64-appstream-rpms        976 k
 libXau-devel                                 x86_64        1.0.9-3.el8                rhel-8-for-x86_64-appstream-rpms         21 k
 libglvnd-opengl                              x86_64        1:1.3.2-1.el8              rhel-8-for-x86_64-appstream-rpms         47 k
 libvdpau                                     x86_64        1.4-2.el8                  rhel-8-for-x86_64-appstream-rpms         41 k
 libxcb-devel                                 x86_64        1.13.1-1.el8               rhel-8-for-x86_64-appstream-rpms        1.1 M
 mesa-vulkan-drivers                          x86_64        21.1.5-1.el8               rhel-8-for-x86_64-appstream-rpms        6.1 M
 ocl-icd                                      x86_64        2.2.12-1.el8               rhel-8-for-x86_64-appstream-rpms         51 k
 opencl-filesystem                            noarch        1.0-6.el8                  rhel-8-for-x86_64-appstream-rpms        8.5 k
 vulkan-loader                                x86_64        1.2.198.0-2.el8_5          rhel-8-for-x86_64-appstream-rpms        123 k
 xorg-x11-proto-devel                         noarch        2020.1-3.el8               rhel-8-for-x86_64-appstream-rpms        280 k
Installing module profiles:
 nvidia-driver/default                                                                                                              
Enabling module streams:
 nvidia-driver                                              latest                                                                  

Transaction Summary
=====================================================================================================================================
Install  28 Packages

Total download size: 292 M
Installed size: 697 M
Is this ok [y/N]: y
$ rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64
nvidia-driver-510.47.03-1.el8.x86_64
nvidia-driver-cuda-510.47.03-1.el8.x86_64
nvidia-driver-cuda-libs-510.47.03-1.el8.x86_64
nvidia-driver-devel-510.47.03-1.el8.x86_64
nvidia-driver-libs-510.47.03-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-510.47.03-1.el8.x86_64
nvidia-driver-NVML-510.47.03-1.el8.x86_64
nvidia-kmod-common-510.47.03-1.el8.noarch
nvidia-libXNVCtrl-510.47.03-1.el8.x86_64
nvidia-libXNVCtrl-devel-510.47.03-1.el8.x86_64
nvidia-modprobe-510.47.03-1.el8.x86_64
nvidia-persistenced-510.47.03-1.el8.x86_64
nvidia-settings-510.47.03-1.el8.x86_64
nvidia-xconfig-510.47.03-1.el8.x86_64
$ find /lib/modules -name "nvidia*ko*" | sort
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-drm.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-modeset.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-peermem.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-uvm.ko
$ sudo reboot
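Before rebooting, it may be worth confirming that every module file from the listing above is present. A sketch with a hypothetical function name that reads `find` output on stdin and prints any missing module. The expected set (`nvidia`, `nvidia-drm`, `nvidia-modeset`, `nvidia-peermem`, `nvidia-uvm`) is an assumption copied from this 510.47.03 listing and may differ for other driver versions.

```shell
#!/bin/sh
# Hypothetical helper. Reads module paths (one per line) on stdin and prints
# each expected NVIDIA module that is absent. Exit 0 only if all are present.
# The expected set is an assumption taken from the 510.47.03 listing.
missing_nvidia_modules() {
    paths=$(cat)
    status=0
    for mod in nvidia nvidia-drm nvidia-modeset nvidia-peermem nvidia-uvm; do
        # Compare basenames with the .ko (or .ko.xz etc.) suffix removed.
        if ! printf '%s\n' "$paths" | sed 's|.*/||; s|\.ko.*$||' | grep -qxF "$mod"; then
            echo "$mod"
            status=1
        fi
    done
    return "$status"
}
```

Usage: `find /lib/modules/$(uname -r) -name 'nvidia*.ko*' | missing_nvidia_modules && echo "all modules present"`.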

After rebooting

$ lsmod | grep nvidia | sort
drm                   573440  12 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,nvidia_drm,ttm
drm_kms_helper        253952  5 drm_vram_helper,ast,nvidia_drm
nvidia              38502400  321 nvidia_uvm,nvidia_modeset
nvidia_drm             61440  3
nvidia_modeset       1118208  6 nvidia_drm
nvidia_uvm           1085440  0
$ nvidia-smi 
Thu Mar 31 14:24:17 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2000        Off  | 00000000:65:00.0  On |                  N/A |
| 45%   33C    P8     5W /  75W |     65MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2278      G   /usr/libexec/Xorg                  40MiB |
|    0   N/A  N/A      3393      G   /usr/bin/gnome-shell               22MiB |
+-----------------------------------------------------------------------------+
$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.47.03  Mon Jan 24 22:58:54 UTC 2022
GCC version:  gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC) 
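The post-reboot checks reduce to two parse steps: whether `nvidia` appears as a module name in `lsmod`, and which version `/proc/driver/nvidia/version` reports. A sketch with hypothetical function names; the version extraction assumes the `NVRM version:` line format shown above. Note that `lsmod` prints module names with underscores (`nvidia_uvm`), whereas the files use dashes.

```shell
#!/bin/sh
# Hypothetical helper. Succeed if `lsmod` output on stdin lists a module named
# exactly $1 (default: nvidia). Comparing only the first column avoids false
# hits such as "nvidia" appearing in the "Used by" column of the drm line.
module_loaded() {
    awk -v m="${1:-nvidia}" '$1 == m { found = 1 } END { exit !found }'
}

# Hypothetical helper. Print the driver version from an NVRM version file
# (default: /proc/driver/nvidia/version). Assumes the format shown above:
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  510.47.03  <build date>
nvrm_version() {
    file=${1:-/proc/driver/nvidia/version}
    [ -r "$file" ] || return 1
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' "$file"
}
```

Usage: `lsmod | module_loaded nvidia && echo "loaded: $(nvrm_version)"`.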

@kmittman

Thank you for the detailed response, and my apologies for the lengthy delay in responding; I hadn't set up notifications correctly for this post.

Is it possible that there is a configuration issue with the GPU pass-through to VM?

It is entirely possible. Right now I have the VMware passthrough configured as DirectPath I/O:
https://i.imgur.com/QFNjGty.png

My steps match yours line by line until after the reboot: no driver is loaded, even though the OS sees the hardware.

[chronos01 ~] # lspci | grep -i nvidia
0b:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
0b:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)

[chronos01 ~] # rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-510.47.03-4.18.0-348.20.1-510.47.03-3.el8_5.x86_64
nvidia-driver-510.47.03-1.el8.x86_64
nvidia-driver-cuda-510.47.03-1.el8.x86_64
nvidia-driver-cuda-libs-510.47.03-1.el8.x86_64
nvidia-driver-devel-510.47.03-1.el8.x86_64
nvidia-driver-libs-510.47.03-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-510.47.03-1.el8.x86_64
nvidia-driver-NVML-510.47.03-1.el8.x86_64
nvidia-kmod-common-510.47.03-1.el8.noarch
nvidia-libXNVCtrl-510.47.03-1.el8.x86_64
nvidia-libXNVCtrl-devel-510.47.03-1.el8.x86_64
nvidia-modprobe-510.47.03-1.el8.x86_64
nvidia-persistenced-510.47.03-1.el8.x86_64
nvidia-settings-510.47.03-1.el8.x86_64
nvidia-xconfig-510.47.03-1.el8.x86_64

[chronos01 ~] # find /lib/modules -name "nvidia*ko*" | sort
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-drm.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-modeset.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-peermem.ko
/lib/modules/4.18.0-348.20.1.el8_5.x86_64/extra/drivers/video/nvidia/nvidia-uvm.ko

I will try to see if changing any pass-through options makes a difference and report back.
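For this pass-through case, where `lspci` sees the card but no module is loaded, the kernel log is where a load failure would normally show up (the `dmesg` question earlier in the thread). A triage sketch over saved command output; the function name and its three file arguments are hypothetical, and matching on `nvidia`/`NVRM` assumes the prefixes the NVIDIA kernel module uses in its messages.

```shell
#!/bin/sh
# Hypothetical triage helper for "GPU visible, driver not loaded".
# $1: file with saved `lspci` output
# $2: file with saved `lsmod` output
# $3: file with saved kernel log (`dmesg` or `journalctl -k -b`)
# Returns 0 if loaded, 1 if no GPU is visible, 2 if visible but not loaded.
triage_nvidia_load() {
    if ! grep -q -i -E '(VGA compatible|3D|Display) controller: NVIDIA' "$1"; then
        echo "no NVIDIA display controller visible to the OS"
        return 1
    fi
    if awk '$1 == "nvidia" { found = 1 } END { exit !found }' "$2"; then
        echo "nvidia module is loaded"
        return 0
    fi
    echo "GPU visible but nvidia module not loaded; related kernel log lines:"
    # NVRM: is the prefix used by the NVIDIA kernel module's messages
    grep -i -E 'nvidia|NVRM' "$3" || echo "(none found)"
    return 2
}
```

Usage: `lspci > lspci.txt; lsmod > lsmod.txt; sudo dmesg > dmesg.txt; triage_nvidia_load lspci.txt lsmod.txt dmesg.txt`. An empty log result would suggest the module was never even attempted, which points at module loading configuration rather than the pass-through itself.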

Does this work on RHEL 9 yet? If not, what is the best way to install the NVIDIA driver on the new RHEL 9 release?

Hi @rafjaimes
Official NVIDIA driver RPM packages for RHEL9 (both DKMS and precompiled streams) are not yet available but will be coming soon. The latest CUDA 11.7.0 / 515 driver was released on May 11th; Red Hat released RHEL 9.0 on May 18th.

Does RHEL 8 currently support dual NVIDIA® RTX™ A4000 (16 GB GDDR6) cards with the drivers above?

Thanks!

Hi @rafjaimes
The first precompiled kmod package for RHEL9 is now available (NVIDIA driver 515.48.07 @ 5.14.0-70.13.1 kernel):
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/precompiled/

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf module install nvidia-driver:latest
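The RHEL 8 and RHEL 9 commands in this thread differ only in the distribution component of the repository URL. A sketch that derives the URL from `/etc/os-release`; the function name is hypothetical, and the `rhel<major>` mapping is an assumption that holds for the two URLs quoted here.

```shell
#!/bin/sh
# Hypothetical helper. Print the CUDA repo URL following the pattern of the
# rhel8 and rhel9 URLs quoted in this thread.
# $1: RHEL major version (default: major part of VERSION_ID in /etc/os-release)
# $2: architecture       (default: uname -m)
cuda_repo_url() {
    major=$1
    if [ -z "$major" ]; then
        # VERSION_ID is e.g. "8.5" or "9.0"; keep only the major part.
        major=$(. /etc/os-release && printf '%s\n' "${VERSION_ID%%.*}")
    fi
    arch=${2:-$(uname -m)}
    distro="rhel$major"
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-%s.repo\n' \
        "$distro" "$arch" "$distro"
}
```

Usage: `sudo dnf config-manager --add-repo "$(cuda_repo_url)"`, followed by `sudo dnf module install nvidia-driver:latest` as above.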