Clear Linux - ERROR: Unable to load the 'nvidia-drm' kernel module - GTX 970

I’m trying to install the NVidia driver on a fresh installation of Clear Linux. The machine is a desktop tower with a GTX 970 card.

During installation, it fails with this error:

ERROR: Unable to load the 'nvidia-drm' kernel module.

The only information I’ve found about this error mentions disabling UEFI, so I booted into Linux in legacy mode. I used the awesome-clear-linux scripts to run the installer. I made several attempts with different kernel versions (4.19.124, 5.4.42, and 5.6.14), and all had the same outcome. Also, I was able to go through this process on this same machine just about a month ago.

Attached is the nvidia bug report log and the install log.

nvidia-bug-report.log (204.5 KB)
nvidia-installer.log (2.8 KB)

Looks like you’re using dkms to compile the modules, is dkms installed at all?
Please post the output of
dkms status
Furthermore, please remove initcall_debug from kernel command-line, it floods the dmesg so nothing of value can be seen in the log, then create a new nvidia-bug-report.log.

Thanks for replying! Attached is the new bug report log. Also, DKMS is installed; that’s part of what the awesome-clear-linux scripts do. Here’s what I got:

$ dkms status
nvidia, 440.82: added

nvidia-bug-report.log (203.4 KB)

Do you know if there’s a particular compiler version needed to compile the modules? I think Clear Linux recently updated, and that has caused some issues for me in other areas too.

The active cc compiler version has to match the version the kernel was compiled with. Seems to be correctly set in your case.
I suspect you’re missing the kernel headers so the driver doesn’t compile. Please (re)install them for the currently running kernel, then run
sudo dkms install nvidia/440.82 --all
note any errors, then check
dkms status
if the driver state changed to ‘installed’.
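
To check the state from a script rather than by eye, a small helper along these lines could work. It is only a sketch: `dkms_state` is a hypothetical function name, and it assumes `dkms status` prints lines shaped like the "nvidia, 440.82: added" output shown earlier in this thread (other dkms versions may format the line differently).

```shell
# Print the DKMS state ("added", "built", "installed", ...) for one module/version.
# Reads `dkms status` style lines on stdin.
# Assumed line shapes (taken from this thread, not from a spec):
#   nvidia, 440.82: added
#   nvidia, 440.82, 5.6.15-957.native, x86_64: installed
dkms_state() {
    module=$1
    version=$2
    awk -v m="$module" -v v="$version" '
    {
        # Split "head: state", then split the head on ", ".
        n = split($0, halves, ": ")
        if (n < 2) next
        split(halves[1], fields, ", ")
        if (fields[1] == m && fields[2] == v) { print halves[2]; exit }
    }'
}

# Hypothetical usage on a real system:
#   dkms status | dkms_state nvidia 440.82
```

If nothing is printed, the module/version pair is not registered with dkms at all.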

That seems like a good place to start. I’ll need to figure out how to install kernel headers in Clear Linux. When I’ve done that, I’ll try the steps you described and post results.

According to a comment on the Clear Linux repo, I should already have the headers installed.

If that’s the case, what other issues might I look for?

If it’s what I’m experiencing, then you need a kernel configured with SECTION_MISMATCH_WARN_ONLY (Kernel hacking -> Compile-time checks and compiler options -> Make section mismatch errors non-fatal) for the driver to compile, because the NVIDIA driver triggers a section mismatch.
That’s probably due to GCC 10 compatibility issues.
The linux-clear AUR kernel has that issue at least, so those are my two bits:
https://aur.archlinux.org/packages/linux-clear/
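
One way to see whether a given kernel was built with that option is to grep its config file. The sketch below assumes you can locate a `.config` style file for the running kernel; the path in the usage comment is a guess and varies by distribution, and `config_enabled` is a hypothetical helper name.

```shell
# Succeed (exit 0) if a kernel config file sets the given option to "y".
# $1: path to a kernel .config style file (location is an assumption)
# $2: option name, e.g. CONFIG_SECTION_MISMATCH_WARN_ONLY
config_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Hypothetical usage:
#   if config_enabled "/lib/modules/$(uname -r)/build/.config" \
#          CONFIG_SECTION_MISMATCH_WARN_ONLY; then
#       echo "section mismatches are non-fatal"
#   fi
```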

You can also check if the kernel headers are correctly registered by using

sudo dkms install nvidia/440.82 -k $(uname -r)

and look for error messages. If none appear, check the build log in
/var/lib/dkms/nvidia/440.82/build/make.log

I don’t know how to do that on Clear Linux from a command line, unfortunately. Can you link to any instructions around that? However, the script I’m using does check for that mis-matching version, and it hasn’t complained.

I ran that command @generix. It returned almost immediately with no output. The /var/lib/dkms/nvidia/440.82/build/ directory exists, but is empty.

I still suspect the kernel headers are somehow missing/incorrectly registered.

/lib/modules/$(uname -r)/build

should be a symlink to them, does it exist?

Yup, it exists.

$ ls /lib/modules/$(uname -r)/build
arch
block
certs
crypto
Documentation
drivers
fs
include
init
ipc
Kbuild
Kconfig
kernel
lib
lts2019
Makefile
mm
Module.symvers
net
samples
scripts
security
sound
System.map
tools
usr
virt
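
For anyone repeating this check, a slightly stricter version than `ls` is to confirm that the entries an out-of-tree module build usually needs are present. This is a rough sketch: the list of entries is an assumption picked from the listing above, and `check_build_dir` is a hypothetical name.

```shell
# Check that a kernel build directory contains a few entries that
# out-of-tree module builds typically rely on.
# Prints "missing: <name>" for each absent entry; returns 1 if any is absent.
check_build_dir() {
    dir=$1
    missing=0
    for entry in Makefile Module.symvers include scripts; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage:
#   check_build_dir "/lib/modules/$(uname -r)/build"
```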

No idea. Does running the .run installer with -K option to just build and install the modules yield any info?

After two days of trying to solve this, I’ve finally managed to install 440.82 with the latest kernel-native (custom built with SECTION_MISMATCH_WARN_ONLY). Perhaps nvidia or intel staff will clarify what’s at fault here, but there is some problematic interaction between the 5.6.15 kernel, GCC 10.1, and the nvidia 440.82 installer in 1) creating the proper dkms build and source tree in “/var/lib/dkms/nvidia/440.82/” and 2) issuing the proper install command to the /usr/bin/dkms tool.

Using the following installation command (per https://docs.01.org/clearlinux/latest/tutorials/nvidia.html, plus “--no-cc-version-check” just as a guarantee and “--expert” instead of “--silent” for a more verbose installation):

sudo ./NVIDIA-Linux-x86_64-440.82.run \
--utility-prefix=/opt/nvidia \
--opengl-prefix=/opt/nvidia \
--compat32-prefix=/opt/nvidia \
--compat32-libdir=lib32 \
--x-prefix=/opt/nvidia \
--x-module-path=/opt/nvidia/lib64/xorg/modules \
--x-library-path=/opt/nvidia/lib64 \
--x-sysconfig-path=/etc/X11/xorg.conf.d \
--documentation-prefix=/opt/nvidia \
--application-profile-path=/etc/nvidia/nvidia-application-profiles-rc.d \
--no-precompiled-interface \
--no-distro-scripts \
--force-libglx-indirect \
--glvnd-egl-config-path=/etc/glvnd/egl_vendor.d \
--egl-external-platform-config-path=/etc/egl/egl_external_platform.d \
--dkms \
--no-cc-version-check \
--expert

Using the “--expert” option reveals why the installer issues “ERROR: Unable to load the ‘nvidia-drm’ kernel module” without any explanation at all:

-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the ‘nvidia-drm’ kernel module: ‘modprobe: ERROR: ctx=0x5646828152a0 path=/lib/modules/5.6.15-957.native/kernel/drivers/video/nvidia-modeset.ko error=No such file or directory
modprobe: ERROR: ctx=0x5646828152a0 path=/lib/modules/5.6.15-957.native/kernel/drivers/video/nvidia-modeset.ko error=No such file or directory
modprobe: ERROR: could not insert ‘nvidia_drm’: Unknown symbol in module, or unknown parameter (see dmesg)’
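
The modprobe errors above name a module file that does not exist on disk. A quick way to confirm which module files are absent is a loop like the one below. It is a sketch only: the module list is an assumption about what this driver series normally installs, the directory is taken from the error message, and `missing_nvidia_modules` is a hypothetical name.

```shell
# Print the name of each expected NVIDIA module file that is absent.
# $1: directory to inspect, e.g. /lib/modules/$(uname -r)/kernel/drivers/video
# The module list below is an assumption, not taken from the installer.
missing_nvidia_modules() {
    for mod in nvidia nvidia-modeset nvidia-drm nvidia-uvm; do
        [ -e "$1/$mod.ko" ] || echo "$mod.ko"
    done
}

# Hypothetical usage:
#   missing_nvidia_modules "/lib/modules/$(uname -r)/kernel/drivers/video"
```

Empty output would mean all four files are in place, in which case the load failure has some other cause (see dmesg).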

The DKMS kernel modules are being built, but they are not being finally installed from the proper build directory to the corresponding kernel modules path. That is why they are not being loaded by modprobe. ls /var/lib/dkms/nvidia/440.82/ reveals the following:

source/
build/

source/ is the expected symlink to the nvidia kernel module sources. build/ is the symlink directory that, after a successful build, should hold the resulting binaries, make.log, etc. That is not happening: build/ is an empty directory. The solution is to manually build and install the dkms nvidia source tree. Here is the fix, after rebuilding the kernel with “SECTION_MISMATCH_WARN_ONLY”. Start by installing the driver again, this time with the --silent flag instead of the --expert flag:

sudo ./NVIDIA-Linux-x86_64-440.82.run \
--utility-prefix=/opt/nvidia \
--opengl-prefix=/opt/nvidia \
--compat32-prefix=/opt/nvidia \
--compat32-libdir=lib32 \
--x-prefix=/opt/nvidia \
--x-module-path=/opt/nvidia/lib64/xorg/modules \
--x-library-path=/opt/nvidia/lib64 \
--x-sysconfig-path=/etc/X11/xorg.conf.d \
--documentation-prefix=/opt/nvidia \
--application-profile-path=/etc/nvidia/nvidia-application-profiles-rc.d \
--no-precompiled-interface \
--no-distro-scripts \
--force-libglx-indirect \
--glvnd-egl-config-path=/etc/glvnd/egl_vendor.d \
--egl-external-platform-config-path=/etc/egl/egl_external_platform.d \
--dkms \
--no-cc-version-check \
--silent

Go to the same /var/lib/dkms/nvidia/440.82/ directory. Enter source/, where dkms.conf and the Makefile are. Try this:

sudo dkms autoinstall

All nvidia modules will be successfully built, added and installed. ls /var/lib/dkms/nvidia/440.82/ will now correctly show a proper dkms compilation tree:
source/
5.6.15-957.native/

The kernel should be custom rebuilt with SECTION_MISMATCH_WARN_ONLY (Kernel hacking -> Compile-time checks and compiler options -> Make section mismatch errors non-fatal), following the simple guide: https://docs.01.org/clearlinux/latest/guides/kernel/kernel-development.html. Remember to install the *-dev package of your newly custom-built kernel as well: rpm2cpio linux-dev-5.6.15-957.x86_64.rpm | (cd /; sudo cpio -i -d -u -v)

Follow https://github.com/clearlinux/distribution/issues/1994 for more.

I’ve found another workaround that hopefully could lead to a solution within the NVidia driver.

If I run the installer .run file with the env var CONFIG_SECTION_MISMATCH_WARN_ONLY=y set and without DKMS, then it will compile and install correctly. Of course this means that updating the kernel won’t rebuild the kernel module, so I will have to manually reinstall the driver any time I update the kernel.

@generix is it possible to configure DKMS to set that environmental variable when it is doing the module build? I suspect that could fix this whole issue.

Credit for solution here: https://github.com/clearlinux/distribution/issues/1994#issuecomment-636455574

You could add setting the env var as PRE_BUILD= directive in dkms.conf of the driver.

Can you clarify how I might do that? Before running, all I have is the .run file, and once it’s running I’d have no opportunity to edit it before the install fails.

Try this:
extract the .run installer using the -x option, then modify ‘createddir’/kernel/dkms.conf, afterwards use ‘createddir’/nvidia-installer with your options to install.
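
As a rough illustration of the "modify dkms.conf" step, the sketch below patches an extracted dkms.conf. Note that it does not use PRE_BUILD: it swaps in a different approach and puts the variable on the make command line instead, which is untested here and may not behave the same as setting it in the environment. It also assumes the file contains a line beginning with MAKE[0]="'make', which should be checked against the extracted file first. `add_make_var` and the extracted directory name in the usage comment are hypothetical.

```shell
# Insert a VAR=value assignment right after 'make' in the MAKE[0] line
# of a dkms.conf. Assumes the line starts with: MAKE[0]="'make'
# $1: path to dkms.conf
# $2: assignment to add, e.g. CONFIG_SECTION_MISMATCH_WARN_ONLY=y
add_make_var() {
    conf=$1
    var=$2
    tmp=$(mktemp)
    # "&" in the replacement re-emits the matched prefix unchanged.
    sed "s/^MAKE\[0\]=\"'make'/& $var/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # overwrite in place to keep ownership and mode
    rm -f "$tmp"
}

# Hypothetical usage (directory name produced by -x is an assumption):
#   sh NVIDIA-Linux-x86_64-440.82.run -x
#   add_make_var NVIDIA-Linux-x86_64-440.82/kernel/dkms.conf \
#       CONFIG_SECTION_MISMATCH_WARN_ONLY=y
```

After patching, the installer would be started from the extracted directory with the same options as before.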

Thanks, I’ll try that later.

In the effort toward a long-term solution: according to one of the CL members, this is happening because the NVidia Linux module is out of sync with the upstream kernel.

https://github.com/clearlinux/distribution/issues/1994#issuecomment-637001227

What are the odds of this being fixed any time soon? Is the dev team aware of it?