510.39.01 BETA driver fails to compile if no GPU installed

Hi,
tried to compile the latest beta driver v510.39.01 but it fails every time with:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most
       frequently when this kernel module was built against the wrong or
       improperly configured kernel sources, with a version of gcc that differs
       from the one used to build the target kernel, or if another driver, such
       as nouveau, is present and prevents the NVIDIA kernel module from
       obtaining ownership of the NVIDIA device(s), or no NVIDIA device
       installed in this system is supported by this NVIDIA Linux graphics
       driver release.
       
       Please see the log entries 'Kernel module load error' and 'Kernel
       messages' at the end of the file '/var/log/nvidia-installer.log' for
       more information.

As a side note, I compile the driver in a Docker container to create a driver package, and there is no Nvidia GPU installed.
Is there maybe a way/workaround to not load ‘nvidia.ko’ so that the installation of the files to the destination succeeds?

Could this be caused by the following change from the changelog?

Updated nvidia.ko to load even if no supported NVIDIA GPUs are present when an NVIDIA NVSwitch device is detected in the system. Previously, nvidia.ko would fail to load into the kernel if no supported GPUs were present.

I’ve also attached the log:
nvidia-installer.log (35.8 KB)

All other drivers are just compiling and installing fine.

AFAIK, that shouldn’t matter; the modprobe happens after all of the driver has been installed. It also happens if another driver is already loaded.

Thank you for the response!
But what else could be causing it to fail now?
My best guess was that it has to do with nvidia.ko because it fails to load and after that it simply exits.

Which files are not installed?

Not a single file is installed to the specified directories: no modules, firmware, or binaries. The installer exits after the error message above, which is also visible in nvidia-installer.log. For comparison, I’ve uploaded a log from a successful installation of driver version 495.46: nvidia-installer.log (32.9 KB)

I now also tried it on bare metal without the container and it’s the same as in the container.

Looking at the log, the installer previously installed all files before trying to modprobe the modules so failing to load them had no adverse effect on installation. Now the modprobe happens right after compiling them, before any files are installed.
Very inconvenient, also in other cases. In your case, maybe replacing modprobe/insmod with a stub that always returns success may help.

Just in case:
https://github.com/NVIDIA/nvidia-installer

Exactly, but wouldn’t this affect other people too, since I can’t even compile the new driver successfully on bare metal without an Nvidia GPU installed?
What about package maintainers for various distributions?

Couldn’t a command line argument be added to the installer so that the modules are not loaded? Someone already submitted a PR for this about 2 years ago in the exact same GitHub repository that you linked.
Wouldn’t this be an option, so that the installer officially supports it?

Pretty sure most of us don’t use nvidia-installer as anything but a reference (too many hardcoded paths, auto-detection, etc…), or at least I don’t use it on Gentoo.

I know what you mean, but I think it would be neat to have this feature anyway, and the PR that would make it possible is already on GitHub.
Slackware, for example, uses the installer (I know Slackware is really old, but it’s only meant as an example :) )

I’m also not happy about this, as some users have to use the installer because no repo can be recommended to them, and if modprobing fails I now can’t even ask them to run nvidia-bug-report.sh to find the cause, since it’s not installed in that case. Catch-22.
More than this, I see this simply as a bug, since you’re using the --kernel-name option, which is for building modules for non-running kernels, so modprobing is expected to fail.

I think it is a bug, shouldn’t this line be:

    if (op->kernel_module_only || op->kernel_name) {

That’s only the check for an already loaded driver but should be OR instead of AND as well.
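
To make the AND vs. OR difference concrete, here is a minimal sketch. It is not the installer’s actual code: the `Options` struct is a hypothetical, trimmed-down stand-in, and `cond_and`/`cond_or` are made-up helper names. Only the two field names come from the line quoted above, and what the condition guards in the real installer is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the installer's option struct; only the two
 * field names quoted in this thread are modelled. */
typedef struct {
    bool kernel_module_only;  /* assumed: only the kernel module is handled */
    const char *kernel_name;  /* assumed: set by --kernel-name, NULL otherwise */
} Options;

/* AND form: true only when both options are given. */
bool cond_and(const Options *op)
{
    assert(op != NULL);
    return op->kernel_module_only && op->kernel_name != NULL;
}

/* OR form: true as soon as either option is given. */
bool cond_or(const Options *op)
{
    assert(op != NULL);
    return op->kernel_module_only || op->kernel_name != NULL;
}
```

With only --kernel-name given (`kernel_module_only` false, `kernel_name` set), `cond_and` is false while `cond_or` is true, which is exactly the case described in this thread.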

@generix better late than never…
I’ve created a PR with a fix for the compilation/installation failing when the “--kernel-name” argument is passed to the installer; hopefully it will be merged.