Hi there!
I’m responsible for developing a Linux distro based on Debian 12 stable (bookworm). The distro will ship with an updated, custom kernel (currently Linux 6.4.3) compiled from source. It is targeted at high-end machines with NVIDIA GPUs in order to support GPU-intensive, AI-based applications, so compatibility with the NVIDIA driver is a must.
Everything seems to work with the custom kernel except the NVIDIA driver 525 supplied by the Debian repo. I also tried version 535 from the NVIDIA website, but I ran into issues with it on my laptop in a dual-GPU configuration (for example, I had to move the mouse for the second screen to update), so I stayed on version 525. Neither version will install with the custom kernel at all. I even managed to install the drivers from the competing manufacturer, but one of the AI-based models I use relies on TensorFlow 1.8, which runs better on NVIDIA GPUs.
I have two Debian 12 installs on my laptop: one for production, and the other for testing the development of the Linux distro that I’m in charge of.
The production install uses the NVIDIA driver 525 supplied by the Debian repo. When I try to install the custom kernel on the production install, I get the following error after running make install:
INSTALL /boot
run-parts: executing /etc/kernel/postinst.d/dkms 6.4.3.66267-amd64-lowlatency /boot/vmlinuz-6.4.3.66267-amd64-lowlatency
dkms: running auto installation service for kernel 6.4.3.66267-amd64-lowlatency.
Sign command: /lib/modules/6.4.3.66267-amd64-lowlatency/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Building module:
Cleaning build area...(bad exit status: 2)
env NV_VERBOSE=1 make -j12 modules KERNEL_UNAME=6.4.3.66267-amd64-lowlatency...(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.4.3.66267-amd64-lowlatency (x86_64)
Consult /var/lib/dkms/nvidia-current/525.105.17/build/make.log for more information.
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
dkms: autoinstall for kernel: 6.4.3.66267-amd64-lowlatency failed!
run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
make: *** [arch/x86/Makefile:292: install] Error 1
The /var/lib/dkms/nvidia-current/525.105.17/build/make.log contains the following content:
DKMS make.log for nvidia-current-525.105.17 for kernel 6.4.3.66267-amd64-lowlatency (x86_64)
Tue Jul 18 22:16:04 -03 2023
Makefile:18: /Kbuild: No such file or directory
make[1]: *** No rule to make target '/Kbuild'. Stop.
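The `/Kbuild` path in that log may indicate that a directory variable expanded to an empty string during the module build, though that is only a guess. One thing that can be checked is whether the kernel build tree referenced in the DKMS output (`/lib/modules/<version>/build`) actually resolves to a directory with a top-level Makefile. The following is a minimal sketch of such a check; the function name `check_build_tree` is hypothetical, and it only tests for the presence of the tree, not whether it is correctly prepared for building external modules.

```shell
#!/bin/sh
# Hypothetical helper: verify that <modules_dir>/<kernel_version>/build
# resolves to a directory that contains a top-level Makefile.
check_build_tree() {
    modules_dir=$1
    kver=$2
    build_dir="$modules_dir/$kver/build"

    # -d follows symlinks, so a dangling "build" symlink fails here.
    if [ ! -d "$build_dir" ]; then
        echo "missing: $build_dir"
        return 1
    fi

    if [ ! -f "$build_dir/Makefile" ]; then
        echo "incomplete: $build_dir has no Makefile"
        return 1
    fi

    echo "ok: $build_dir"
}

# Example usage for the kernel from the log above:
# check_build_tree /lib/modules 6.4.3.66267-amd64-lowlatency
```

If this reports `missing` or `incomplete`, the build symlink or the kernel source/headers it points to would be the first place to look.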
On the other hand, when I try to install the NVIDIA driver on the development install, which comes with the custom Linux kernel, systemd reports several errors related to the NVIDIA driver. The driver fails to load and the nvidiafb driver loads instead (the Nouveau driver has been removed from the custom kernel because it conflicts with the NVIDIA driver).
I generated the nvidia-bug-report.log.gz file while booted into the custom kernel with driver remnants still present (apt keeps driver remnants when the driver installation fails).
When I try to install the NVIDIA driver via the nvidia-driver package, I get this:
Setting up nvidia-kernel-dkms (525.105.17-1) ...
Loading new nvidia-current-525.105.17 DKMS files...
Building for 6.4.3.66267-amd64-lowlatency
Building initial module for 6.4.3.66267-amd64-lowlatency
Error! Bad return status for module build on kernel: 6.4.3.66267-amd64-lowlatency (x86_64)
Consult /var/lib/dkms/nvidia-current/525.105.17/build/make.log for more information.
dpkg: error processing package nvidia-kernel-dkms (--configure):
installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver:
nvidia-driver depends on nvidia-kernel-dkms (= 525.105.17-1) | nvidia-kernel-525.105.17 | nvidia-open-kernel-525.105.17 | nvidia-open-kernel-525.105.17; however:
Package nvidia-kernel-dkms is not configured yet.
Package nvidia-kernel-525.105.17 is not installed.
Package nvidia-kernel-dkms which provides nvidia-kernel-525.105.17 is not configured yet.
Package nvidia-open-kernel-525.105.17 is not installed.
Package nvidia-open-kernel-525.105.17 is not installed.
dpkg: error processing package nvidia-driver (--configure):
dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.36-9) ...
Processing triggers for initramfs-tools (0.142) ...
update-initramfs: Generating /boot/initrd.img-6.4.3.66267-amd64-lowlatency
Processing triggers for update-glx (1.2.2) ...
Processing triggers for glx-alternative-nvidia (1.2.2) ...
update-alternatives: using /usr/lib/nvidia to provide /usr/lib/glx (glx) in auto mode
Processing triggers for glx-alternative-mesa (1.2.2) ...
Processing triggers for libc-bin (2.36-9) ...
Processing triggers for initramfs-tools (0.142) ...
update-initramfs: Generating /boot/initrd.img-6.4.3.66267-amd64-lowlatency
Errors were encountered while processing:
nvidia-kernel-dkms
nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)
I also get a 2000-line make.log file. (Sorry, I didn’t post it beforehand because the forum limits new users to one attachment at a time, but if it is relevant, I can provide it in a reply.)
How can I get the NVIDIA driver working with the custom kernel?
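Since the full log is too long to paste, a filter along these lines could pull out the first failing lines for a reply. This is only a sketch: the function name `first_build_errors` and the search patterns are assumptions, and the patterns may miss some failures or match harmless lines.

```shell
#!/bin/sh
# Hypothetical helper: print the first N lines of a build log that look like
# errors, prefixed with their line numbers in the log.
first_build_errors() {
    log_file=$1
    max_lines=${2:-20}   # default to the first 20 matches

    grep -n -i -E 'error[: ]|no such file or directory|no rule to make target' \
        "$log_file" | head -n "$max_lines"
}

# Example usage with the path reported by DKMS above:
# first_build_errors /var/lib/dkms/nvidia-current/525.105.17/build/make.log
```

The earliest matches are usually the most useful ones, since later errors in a build log are often just consequences of the first failure.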
nvidia-installation-log.txt (29.4 KB)