CUDA 11 dkms build errors, RHEL 8.8 Beta, 4.18.0-477.el8.x86_64

Hello

We’re preparing to upgrade to RHEL 8.8 and are unable to build the NVIDIA driver with the local RPM provided here: CUDA Toolkit 11.4 Update 3 Downloads | NVIDIA Developer

We need the 470.xx series of nvidia drivers for our older systems with K80 GPUs. I’m not seeing any errors installing the newer CUDA 12.0 local RPM, but that’s bundled with newer drivers that ignore the K80 GPUs. My goal is to get a CUDA RPM that’s at least 11.X with any driver that’s compatible with the K80s.

I know for sure the step that’s failing is the dkms build -m nvidia -v 470.82.01 -q || : part of the scriptlet from the kmod-nvidia-latest-dkms-470.82.01-1.el8.x86_64 RPM. I’m attaching the /var/lib/dkms/nvidia/470.82.01/build/make.log as make.log.1

Here’s the last 10 lines of make.log.1:

/var/lib/dkms/nvidia/470.82.01/build/nvidia-drm/nvidia-drm-drv.c: In function ‘nv_drm_init_mode_config’:
/var/lib/dkms/nvidia/470.82.01/build/nvidia-drm/nvidia-drm-drv.c:257:21: error: ‘struct drm_mode_config’ has no member named ‘allow_fb_modifiers’
     dev->mode_config.allow_fb_modifiers = true;
                     ^
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:317: /var/lib/dkms/nvidia/470.82.01/build/nvidia-drm/nvidia-drm-drv.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:1616: _module_/var/lib/dkms/nvidia/470.82.01/build] Error 2
make[1]: Leaving directory '/usr/src/kernels/4.18.0-477.el8.x86_64'
make: *** [Makefile:80: modules] Error 2

Our kernel for the RHEL 8.8 Beta is 4.18.0-477.el8.x86_64.
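
For anyone wanting to confirm that the member really is gone from the kernel headers (rather than something being wrong in the driver source), here is a minimal sketch. It assumes the kernel-devel tree lives under /usr/src/kernels/<version>, as in the log above, and that struct drm_mode_config is declared in include/drm/drm_mode_config.h. The function name has_drm_member is made up for illustration, and this is a plain text search, not a real parse of the struct.

```shell
#!/bin/sh
# has_drm_member KERNEL_SRC MEMBER   (hypothetical helper)
# Exit 0 if MEMBER appears as a whole word in the DRM mode-config header
# of the given kernel source tree, 1 if it does not, 2 if the header
# file itself is missing. Text search only, so treat the result as a hint.
has_drm_member() {
    header="$1/include/drm/drm_mode_config.h"
    [ -f "$header" ] || return 2
    grep -qw -- "$2" "$header"
}

# Example (path taken from the log above):
#   has_drm_member /usr/src/kernels/4.18.0-477.el8.x86_64 allow_fb_modifiers \
#       && echo "member present" || echo "member missing or header not found"
```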

Previously, in the update to RHEL 8.7, we had kernel version 4.18.0-425.3.1.el8.x86_64. With that kernel I got this same error, but was able to get around it by explicitly not building the nvidia-drm kernel module, since that was apparently the source of the problem and, AFAIK, it’s only important for X11-enabled programs.

I did that by manually removing nvidia-drm from dkms.conf, editing the Makefile, and then continuing with the dkms build and dkms install steps that the scriptlet runs. This is the output I got from running diff:

[root@server nvidia-470.82.01]# diff dkms.conf.bak dkms.conf
13c13
< BUILT_MODULE_NAME[2]="nvidia-drm"
---
> BUILT_MODULE_NAME[2]="nvidia-uvm"
16c16
< BUILT_MODULE_NAME[3]="nvidia-uvm"
---
> BUILT_MODULE_NAME[3]="nvidia-peermem"
18,20d17
<
< BUILT_MODULE_NAME[4]="nvidia-peermem"
< DEST_MODULE_LOCATION[4]="/extra"
[root@server nvidia-470.82.01]# diff Makefile.bak Makefile
61c61
<   NV_KERNEL_MODULES ?= $(wildcard nvidia nvidia-uvm nvidia-vgpu-vfio nvidia-modeset nvidia-drm nvidia-peermem)
---
>   NV_KERNEL_MODULES ?= $(wildcard nvidia nvidia-uvm nvidia-vgpu-vfio nvidia-modeset nvidia-peermem)
[root@server nvidia-470.82.01]#
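
The same edit can be scripted instead of done by hand. The following is only a sketch of that idea, not something from the RPM: the helper names drop_dkms_module and drop_make_module are made up, the dkms.conf handling assumes the per-module arrays are numbered the way the diff above shows, and the Makefile edit only works when the module name has a space on both sides in the NV_KERNEL_MODULES line. Both helpers print to stdout so nothing is changed until you redirect the output.

```shell
#!/bin/sh
# drop_dkms_module CONF MODULE   (hypothetical helper)
# Prints CONF without the per-module entries for MODULE and shifts the
# indices of later entries down by one so the numbering stays contiguous.
drop_dkms_module() {
    awk -v mod="$2" '
        BEGIN { drop = -1 }
        # Pass 1: find the index N of BUILT_MODULE_NAME[N]="MODULE".
        NR == FNR {
            if ($0 ~ /^BUILT_MODULE_NAME\[[0-9]+\]=/) {
                idx = $0; sub(/^[A-Z_]+\[/, "", idx); sub(/\].*/, "", idx)
                val = $0; sub(/^[^=]*=/, "", val); gsub(/"/, "", val)
                if (val == mod) drop = idx + 0
            }
            next
        }
        # Pass 2: skip entries with index N, renumber entries above N.
        drop >= 0 && /^(BUILT_MODULE_NAME|BUILT_MODULE_LOCATION|DEST_MODULE_NAME|DEST_MODULE_LOCATION|STRIP)\[[0-9]+\]=/ {
            name = $0; sub(/\[.*/, "", name)
            idx = $0; sub(/^[A-Z_]+\[/, "", idx); sub(/\].*/, "", idx); idx += 0
            rest = $0; sub(/^[^=]*=/, "", rest)
            if (idx == drop) next
            if (idx > drop) idx--
            print name "[" idx "]=" rest
            next
        }
        { print }
    ' "$1" "$1"
}

# drop_make_module MAKEFILE MODULE   (hypothetical helper)
# Prints MAKEFILE with MODULE removed from the NV_KERNEL_MODULES list.
drop_make_module() {
    sed "/NV_KERNEL_MODULES ?=/ s/ $2 / /" "$1"
}

# Example, run from the nvidia-470.82.01 source directory:
#   cp dkms.conf dkms.conf.bak && drop_dkms_module dkms.conf.bak nvidia-drm > dkms.conf
#   cp Makefile Makefile.bak   && drop_make_module Makefile.bak nvidia-drm > Makefile
#   dkms build -m nvidia -v 470.82.01 && dkms install -m nvidia -v 470.82.01
```

Running diff against the .bak files afterwards is a quick way to check that the result matches the manual edit shown above.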

For RHEL 8.8, I also tried this dkms edit with the following result, shown in make.log.2.

Here’s the last 10 lines of make.log.2:

  LD [M]  /var/lib/dkms/nvidia/470.82.01/build/nvidia-uvm.o
  LD [M]  /var/lib/dkms/nvidia/470.82.01/build/nvidia-modeset.o
  LD [M]  /var/lib/dkms/nvidia/470.82.01/build/nvidia-peermem.o
  Building modules, stage 2.
  MODPOST 4 modules
FATAL: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'cc_mkdec'
make[2]: *** [scripts/Makefile.modpost:91: __modpost] Error 1
make[1]: *** [Makefile:1620: modules] Error 2
make[1]: Leaving directory '/usr/src/kernels/4.18.0-477.el8.x86_64'
make: *** [Makefile:80: modules] Error 2
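
To see how the kernel exports the symbol that modpost complains about, one option is to look it up in the Module.symvers file that ships with the kernel headers. This is a sketch under assumptions: that the file sits at /usr/src/kernels/<version>/Module.symvers, and that its whitespace-separated columns are CRC, symbol, module, export type in that order. The function name symbol_export_type is made up for illustration.

```shell
#!/bin/sh
# symbol_export_type SYMVERS SYMBOL   (hypothetical helper)
# Prints the fourth column (assumed to be the export type) of the
# Module.symvers entry whose second column equals SYMBOL.
# Exit status is 0 if the symbol was found, 1 otherwise.
symbol_export_type() {
    awk -v sym="$2" '$2 == sym { print $4; found = 1 } END { exit !found }' "$1"
}

# Example (path is an assumption based on the log above):
#   symbol_export_type /usr/src/kernels/4.18.0-477.el8.x86_64/Module.symvers cc_mkdec
```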

Can anyone help us get around this so we can continue using our K80s?

Hello Joseph

Thank you for your tip. I had the same issue for 8.7.

As 8.8 is not yet released, maybe we’ll need to wait for the final version?

Werner

I’d expect to get the same error with the full release of RHEL 8.8, but it’s worth trying again when the full release comes out.

I also found a workaround for us, though we’re not planning on upgrading to RHEL 8.8 right away:

  1. Install the newer 470.X series driver by going through the menu here:
    Official Drivers | NVIDIA

  2. Install rpmfusion’s xorg-x11-drv-nvidia-470xx-cuda-libs-3:470.182.03-1.el8.x86_64 package

This gave me a working nvidia and CUDA (v11.4) driver to use with the K80s on the 8.8 beta.
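
A quick sanity check after installing is to confirm the loaded driver is still on the 470 branch, since newer branches ignore the K80s. A minimal sketch, where is_470_branch is a made-up helper name and the nvidia-smi call in the comment assumes the driver is already loaded:

```shell
#!/bin/sh
# is_470_branch VERSION   (hypothetical helper)
# Succeeds if VERSION (for example 470.182.03) belongs to the 470.xx series.
is_470_branch() {
    case "$1" in
        470.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example:
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   is_470_branch "$ver" && echo "470 series driver loaded: $ver"
```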