Hello
We’re preparing for an upgrade to the RHEL 8.8 release and are unable to build the NVIDIA driver from the local RPM on this page: CUDA Toolkit 11.4 Update 3 Downloads | NVIDIA Developer
We need the 470.xx series of NVIDIA drivers for our older systems with K80 GPUs. I’m not seeing any errors installing the newer CUDA 12.0 local RPM, but that RPM is bundled with newer drivers that no longer support the K80 GPUs. My goal is to get a CUDA RPM that’s at least 11.x, together with any driver that’s compatible with the K80s.
I know for sure that the failing step is dkms build -m nvidia -v 470.82.01 -q || : in the scriptlet of the kmod-nvidia-latest-dkms-470.82.01-1.el8.x86_64 RPM. I’m attaching /var/lib/dkms/nvidia/470.82.01/build/make.log as make.log.1.
Here are the last 10 lines of make.log.1:
/var/lib/dkms/nvidia/470.82.01/build/nvidia-drm/nvidia-drm-drv.c: In function ‘nv_drm_init_mode_config’:
/var/lib/dkms/nvidia/470.82.01/build/nvidia-drm/nvidia-drm-drv.c:257:21: error: ‘struct drm_mode_config’ has no member named ‘allow_fb_modifiers’
dev->mode_config.allow_fb_modifiers = true;
^
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:317: /var/lib/dkms/nvidia/470.82.01/build/nvidia-drm/nvidia-drm-drv.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:1616: _module_/var/lib/dkms/nvidia/470.82.01/build] Error 2
make[1]: Leaving directory '/usr/src/kernels/4.18.0-477.el8.x86_64'
make: *** [Makefile:80: modules] Error 2
Our kernel for the RHEL 8.8 Beta is 4.18.0-477.el8.x86_64.
Previously, when we updated to RHEL 8.7 (kernel 4.18.0-425.3.1.el8.x86_64), I got this same error. I was able to get around it by explicitly not building or installing the nvidia-drm kernel module, since that was apparently the problem and, AFAIK, it’s only important for X11-enabled programs.
I did that by manually removing nvidia-drm from dkms.conf, editing the Makefile, and then continuing with the dkms build and dkms install steps that the scriptlet runs. This is the output I got from running diff:
[root@server nvidia-470.82.01]# diff dkms.conf.bak dkms.conf
13c13
< BUILT_MODULE_NAME[2]="nvidia-drm"
---
> BUILT_MODULE_NAME[2]="nvidia-uvm"
16c16
< BUILT_MODULE_NAME[3]="nvidia-uvm"
---
> BUILT_MODULE_NAME[3]="nvidia-peermem"
18,20d17
<
< BUILT_MODULE_NAME[4]="nvidia-peermem"
< DEST_MODULE_LOCATION[4]="/extra"
[root@server nvidia-470.82.01]# diff Makefile.bak Makefile
61c61
< NV_KERNEL_MODULES ?= $(wildcard nvidia nvidia-uvm nvidia-vgpu-vfio nvidia-modeset nvidia-drm nvidia-peermem)
---
> NV_KERNEL_MODULES ?= $(wildcard nvidia nvidia-uvm nvidia-vgpu-vfio nvidia-modeset nvidia-peermem)
[root@server nvidia-470.82.01]#
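The manual edit above can be sketched as a small script. This is a minimal sketch, not an NVIDIA or DKMS tool, and it rests on assumptions: dkms.conf uses one KEY[i]="value" assignment per line; only BUILT_MODULE_NAME, BUILT_MODULE_LOCATION, DEST_MODULE_NAME, DEST_MODULE_LOCATION and STRIP are treated as per-module keys; and the DKMS source tree (the nvidia-470.82.01 directory in the shell prompt above) is passed on the command line, since its exact path is assumed. Unlike the hand edit in the diff, it keeps each module's DEST_MODULE_LOCATION paired with that module when renumbering. The names drop_dkms_module, drop_make_module and patch_tree are hypothetical.

```python
import re
import sys
from pathlib import Path

# Matches indexed dkms.conf assignments such as BUILT_MODULE_NAME[2]="nvidia-drm".
INDEXED = re.compile(r'^(\w+)\[(\d+)\]=(.*)$')
# Assumption: these are the per-module keys used in this dkms.conf.
PER_MODULE_KEYS = {"BUILT_MODULE_NAME", "BUILT_MODULE_LOCATION",
                   "DEST_MODULE_NAME", "DEST_MODULE_LOCATION", "STRIP"}


def drop_dkms_module(conf_text: str, module: str) -> str:
    """Delete the per-module entries for `module` and renumber the rest."""
    lines = conf_text.splitlines()
    names = {}
    for line in lines:
        m = INDEXED.match(line)
        if m and m.group(1) == "BUILT_MODULE_NAME":
            names[int(m.group(2))] = m.group(3).strip().strip('"')
    dropped = {i for i, name in names.items() if name == module}
    kept = sorted(i for i in names if i not in dropped)
    base = min(names) if names else 0
    # Old index -> new contiguous index, preserving the original module order.
    renumber = {old: new for new, old in enumerate(kept, start=base)}

    out = []
    for line in lines:
        m = INDEXED.match(line)
        if m and m.group(1) in PER_MODULE_KEYS:
            idx = int(m.group(2))
            if idx in dropped:
                continue  # skip every setting that belongs to the dropped module
            if idx in renumber:
                line = f"{m.group(1)}[{renumber[idx]}]={m.group(3)}"
        out.append(line)
    return "\n".join(out) + "\n"


def drop_make_module(makefile_text: str, module: str) -> str:
    """Remove `module` from the NV_KERNEL_MODULES wildcard list."""
    token = re.compile(r"\s+" + re.escape(module) + r"(?=[\s)])")
    out = []
    for line in makefile_text.splitlines():
        if line.startswith("NV_KERNEL_MODULES"):
            line = token.sub("", line)
        out.append(line)
    return "\n".join(out) + "\n"


def patch_tree(src: Path, module: str = "nvidia-drm") -> None:
    """Back up dkms.conf and Makefile as *.bak once, then rewrite them in place."""
    for name, edit in (("dkms.conf", drop_dkms_module), ("Makefile", drop_make_module)):
        path = src / name
        backup = path.with_name(name + ".bak")
        original = path.read_text()
        if not backup.exists():
            backup.write_text(original)
        path.write_text(edit(original, module))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 drop_drm.py /usr/src/nvidia-470.82.01
    patch_tree(Path(sys.argv[1]))
```

After patching, the dkms build -m nvidia -v 470.82.01 and dkms install -m nvidia -v 470.82.01 steps would be run as before. Writing the .bak files only once keeps the pristine originals even if the script is run twice.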
For RHEL 8.8, I tried the same dkms edit, with the result shown in make.log.2.
Here are the last 10 lines of make.log.2:
LD [M] /var/lib/dkms/nvidia/470.82.01/build/nvidia-uvm.o
LD [M] /var/lib/dkms/nvidia/470.82.01/build/nvidia-modeset.o
LD [M] /var/lib/dkms/nvidia/470.82.01/build/nvidia-peermem.o
Building modules, stage 2.
MODPOST 4 modules
FATAL: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'cc_mkdec'
make[2]: *** [scripts/Makefile.modpost:91: __modpost] Error 1
make[1]: *** [Makefile:1620: modules] Error 2
make[1]: Leaving directory '/usr/src/kernels/4.18.0-477.el8.x86_64'
make: *** [Makefile:80: modules] Error 2
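To check how a given kernel exports a symbol such as cc_mkdec (for instance, to compare the 4.18.0-477 kernel with the RHEL 8.7 one), a minimal sketch can read the kernel's Module.symvers. It assumes kernel-devel provides that file at /usr/src/kernels/<version>/Module.symvers and that each line is tab-separated with the symbol in the second field and an EXPORT_SYMBOL* marker in a later field; the helper names export_types and gpl_only are hypothetical. This only diagnoses the modpost error; it does not work around it.

```python
import sys
from pathlib import Path


def export_types(symvers_text: str) -> dict:
    """Map each exported symbol to its export type, e.g. EXPORT_SYMBOL_GPL."""
    result = {}
    for line in symvers_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Search for the EXPORT_SYMBOL* field rather than assuming a fixed column.
        kind = next((f for f in fields[2:] if f.startswith("EXPORT_SYMBOL")), None)
        if kind:
            result[fields[1]] = kind
    return result


def gpl_only(symvers_text: str, symbols: list) -> list:
    """Return the subset of `symbols` exported with EXPORT_SYMBOL_GPL."""
    types = export_types(symvers_text)
    return [s for s in symbols if types.get(s) == "EXPORT_SYMBOL_GPL"]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage:
    # python3 symcheck.py /usr/src/kernels/4.18.0-477.el8.x86_64/Module.symvers cc_mkdec
    types = export_types(Path(sys.argv[1]).read_text())
    for sym in sys.argv[2:]:
        print(sym, types.get(sym, "not exported"))
```

Running it against both kernels' Module.symvers would show whether cc_mkdec is newly exported (or newly GPL-only) in the 8.8 kernel, which is the condition modpost rejects for a non-GPL module like nvidia.ko.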
Can anyone help us get around this so we can continue using our K80s?