Hi,
I tried to compile the latest beta driver, v510.39.01, but it fails every time with:
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most
frequently when this kernel module was built against the wrong or
improperly configured kernel sources, with a version of gcc that differs
from the one used to build the target kernel, or if another driver, such
as nouveau, is present and prevents the NVIDIA kernel module from
obtaining ownership of the NVIDIA device(s), or no NVIDIA device
installed in this system is supported by this NVIDIA Linux graphics
driver release.
Please see the log entries 'Kernel module load error' and 'Kernel
messages' at the end of the file '/var/log/nvidia-installer.log' for
more information.
As a side note, I compile the driver in a Docker container to create a driver package, and there is no Nvidia GPU installed.
Is there a way or workaround to skip loading 'nvidia.ko' so that the installation of the files to the destination succeeds?
Could this be caused by the following change from the changelog?
Updated nvidia.ko to load even if no supported NVIDIA GPUs are present when an NVIDIA NVSwitch device is detected in the system. Previously, nvidia.ko would fail to load into the kernel if no supported GPUs were present.
Thank you for the response!
But what else could be causing it to fail now?
My best guess was that it has to do with nvidia.ko, because it fails to load and after that the installer simply exits.
Not a single file is installed to the specified directories: no modules, firmware, or binaries. The installer exits after the above-mentioned error message, which is also visible in nvidia-installer.log. For comparison, I uploaded a log from a successful installation of an earlier driver, version 495.46: nvidia-installer.log (32.9 KB).
I have now also tried it on bare metal without the container, and the result is the same as in the container.
Looking at the log, the installer previously installed all files before trying to modprobe the modules, so failing to load them had no adverse effect on installation. Now the modprobe happens right after compiling them, before any files are installed.
Very inconvenient, in other cases too. In your case, replacing modprobe/insmod with a stub that always returns success may help.
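A minimal sketch of that stub idea, in case it helps. It assumes the installer finds `modprobe` and `insmod` through `PATH`, which nothing in this thread confirms; if it uses absolute paths, this will have no effect. The helper names (`make_stub_dir`, `env_with_stubs`, `run_with_stubs`) are made up for illustration.

```python
import os
import subprocess
import tempfile

# Hypothetical stub body: ignores all arguments and always reports success.
STUB_SCRIPT = "#!/bin/sh\nexit 0\n"


def make_stub_dir(names=("modprobe", "insmod")):
    """Create a temporary directory containing no-op executables."""
    stub_dir = tempfile.mkdtemp(prefix="stub-bin-")
    for name in names:
        path = os.path.join(stub_dir, name)
        with open(path, "w") as f:
            f.write(STUB_SCRIPT)
        os.chmod(path, 0o755)  # make the stub executable
    return stub_dir


def env_with_stubs(stub_dir, base_env=None):
    """Return a copy of the environment with stub_dir first in PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = stub_dir + os.pathsep + env.get("PATH", "")
    return env


def run_with_stubs(cmd, stub_dir):
    """Run cmd (a list of arguments) with the stubs shadowing the real tools."""
    return subprocess.run(cmd, env=env_with_stubs(stub_dir))


if __name__ == "__main__":
    stub_dir = make_stub_dir()
    # Assumption: replace this harmless call with your own installer command line.
    result = run_with_stubs(["modprobe", "nvidia"], stub_dir)
    print("exit code:", result.returncode)
```

The stubs only shadow the real tools for the one command started through `run_with_stubs`, so the rest of the system keeps its normal `modprobe`. This is best kept to a throwaway build container.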
Exactly, but wouldn't this affect other people too, since I can't even successfully compile the new driver on bare metal without an Nvidia GPU installed?
What about package maintainers for various distributions?
Couldn't a command-line argument be added to the installer so that the modules are not loaded? Someone already submitted a PR for this about two years ago, in the exact same GitHub repository that you linked.
Wouldn't this be an option, so that the installer officially supports it?
Pretty sure most of us don't use nvidia-installer as anything but a reference (too many hardcoded paths, auto-detection, etc.), or at least I don't use it on Gentoo.
I know what you mean, but I think it would be neat to have this feature anyway, and the PR that would make it possible is already on GitHub.
Slackware, for example, uses the installer (I know Slackware is really old, but it's only meant as an example :) )
I'm also not happy about this, as some users have to use the installer because there is no repo I can recommend to them, and if modprobing fails I now can't even ask them to run nvidia-bug-report.sh to find the cause. It's not installed in that case. Catch-22.
More than that, I see this simply as a bug, since you're using the --kernel-name option, which is for building modules for non-running kernels, so modprobing is expected to fail.
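To make that point concrete, here is a rough sketch of the check one would expect. It is not the installer's actual code (nvidia-installer is written in C), and the function name `should_test_module_load` is invented for illustration: a load test only makes sense when the target kernel is the running one.

```python
import platform
from typing import Optional


def should_test_module_load(kernel_name: Optional[str],
                            running_release: Optional[str] = None) -> bool:
    """Decide whether loading the freshly built module is meaningful.

    kernel_name is the value passed via --kernel-name (None if not given).
    running_release defaults to the running kernel release (like `uname -r`).
    """
    if running_release is None:
        running_release = platform.release()
    # No explicit target: the build is for the running kernel, so test the load.
    if kernel_name is None:
        return True
    # Explicit target: only test the load if it matches the running kernel.
    return kernel_name == running_release


if __name__ == "__main__":
    # Building for a kernel other than the running one: skip the load test.
    print(should_test_module_load("0.0.0-not-running"))
```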
@generix late, but better than never…
I've created a PR with a fix for the compilation/installation failure that occurs when the "--kernel-name" argument is passed to the installer. Hopefully it will be merged.