Nvidia driver not loaded after reboot, but loaded after shutdown and boot

Hi.

I am on Fedora Server 39 with GNOME Shell 45.3 on X.
My driver version is 545.29.06.

I installed the NVIDIA driver from RPM Fusion following Howto/NVIDIA - RPM Fusion.
But after a successful install, the NVIDIA driver is gone every time I reboot the PC with the reboot command. When the screen shows no output, I ran lsmod | grep nvidia over SSH and the result is empty.

But if I force a shutdown and power the PC back on after a few minutes, the driver is magically back!
lsmod | grep nvidia

nvidia_drm            118784  12
nvidia_modeset       1585152  10 nvidia_drm
nvidia_uvm           3522560  8
nvidia              62394368  227 nvidia_uvm,nvidia_modeset
video                  77824  2 amdgpu,nvidia_modeset
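Note that lsmod | grep nvidia matches any line that mentions "nvidia", including dependents such as the video module above. A minimal sketch that instead checks the module-name column of lsmod output and reports which of the four modules seen in the working case are missing (the function name missing_nvidia_modules is hypothetical):

```shell
#!/bin/sh
# missing_nvidia_modules: reads `lsmod` (or /proc/modules) output on stdin and
# prints each expected NVIDIA module that is NOT in the first column, one per line.
# An empty result means nvidia, nvidia_modeset, nvidia_drm and nvidia_uvm are all loaded.
missing_nvidia_modules() {
  awk '
    { loaded[$1] = 1 }   # first field is the module name
    END {
      n = split("nvidia nvidia_modeset nvidia_drm nvidia_uvm", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in loaded)) print want[i]
    }'
}

# Usage on the affected machine (e.g. over SSH when the screen is dark):
#   lsmod | missing_nvidia_modules
```

If only nvidia_drm or nvidia_modeset is missing, the core driver loaded but the display stack did not; if nvidia itself is missing, the module never loaded at all.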

I have also tried akmods --rebuild while the driver was not loaded, but even after that and running systemctl restart display-manager, the screen stays dark and lsmod | grep nvidia is still empty.

Any idea why?

Here are my kernel command line parameters:

BOOT_IMAGE=(hd2,gpt2)/vmlinuz-6.6.9-200.fc39.x86_64 root=/dev/mapper/fedora-root ro rd.driver.blacklist=nouveau modprobe.blacklist=nouveau rd.lvm.lv=fedora/root selinux=0 initcall_blacklist=simpledrm_platform_driver_init

Here is my nvidia-bug-report. It seemed to hang, but I uploaded it anyway:
nvidia-bug-report.log.gz (321.0 KB)

Hi there @miyuki4737 and welcome to the NVIDIA developer forums.

Looking at the report log raises the question: do you even have an NVIDIA GPU in your system?

/sbin/lspci -nn should list the NVIDIA GPU as a VGA compatible controller, but the only one in your log is:

10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c1)

Your hardware does not seem to recognize any NVIDIA GPU.
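A quick way to run the check described above is to look for PCI vendor ID 10de (NVIDIA) on a display-class device in lspci -nn output. This is a minimal sketch; the function name has_nvidia_gpu is hypothetical, and it also accepts class 0302 ("3D controller"), which some NVIDIA GPUs report instead of 0300. If it fails right after a warm reboot, the device is not enumerated on the PCI bus at all, so no driver reinstall can fix that by itself.

```shell
#!/bin/sh
# has_nvidia_gpu: reads `lspci -nn` output on stdin and succeeds (exit 0) if a
# VGA [0300] or 3D controller [0302] with vendor ID 10de (NVIDIA) is listed.
has_nvidia_gpu() {
  grep -Eq '\[030[02]\]: .*\[10de:[0-9a-fA-F]{4}\]'
}

# Usage:
#   if lspci -nn | has_nvidia_gpu; then
#     echo "NVIDIA GPU present on the PCI bus"
#   else
#     echo "no NVIDIA GPU enumerated (driver cannot bind to it)"
#   fi
```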

Hi, I can confirm the same thing happens on my server. A reboot makes the whole NVIDIA device disappear, so it does not even show up in the lspci output. A full shutdown and power-on brings it back, which is pretty strange to me.

I am a new owner of an RTX 4070 and I am using it on a Fedora 40 Linux desktop. I downloaded NVIDIA CUDA version 12.6 and driver 560, then downloaded the kernel modules from GitHub, built, signed and installed them, and everything started working.

This was on kernel 6.10.9-200.fc40.x86_64. Then I made the mistake of updating the kernel, the version changed to 6.10.10-200.fc40.x86_64, and the NVIDIA modules disappeared. I assumed the kernel update caused the problem, so I purged all the NVIDIA drivers and modules, repeated the hour-long effort of reinstalling and rebuilding everything, and lo and behold, everything started working again.

Now I shut down and rebooted, but the NVIDIA modules are no longer loaded. I would hate to repeat all these steps after every single reboot.

This is what lspci -nn shows:
lspci -nn |grep -i nvidia
0000:01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD104 [GeForce RTX 4070] [10de:2786] (rev a1)

The kernel headers that are installed are at version:
kernel-headers-6.10.3-200.fc40.x86_64

and they do not match the current uname -r
6.10.10-200.fc40.x86_64

However, no new kernel headers were installed for 6.10.10-200.fc40.x86_64. Reinstalling the kernel headers does nothing; it just reinstalls the same older version.
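On Fedora the kernel-headers package typically carries userspace headers and is not rebuilt for every kernel update, so its version lagging behind uname -r is not by itself the problem. Out-of-tree module builds (akmods, DKMS, or a manual build) instead need the kernel-devel package for the running kernel, which provides /lib/modules/<release>/build. A minimal sketch of that check, assuming that standard build-tree location; the function name check_build_tree is hypothetical and both arguments are optional so it can be exercised on test data:

```shell
#!/bin/sh
# check_build_tree: report whether a kernel build tree exists for a release.
# Assumption: module builds look for <modroot>/<release>/build, which on Fedora
# is supplied by the kernel-devel package for that release.
check_build_tree() {
  release=${1:-$(uname -r)}      # default: the running kernel
  modroot=${2:-/lib/modules}     # default: the system module directory
  if [ -d "$modroot/$release/build" ]; then
    echo "ok: build tree present for $release"
  else
    echo "missing: no build tree for $release (is kernel-devel-$release installed?)"
  fi
}

# Usage:
#   check_build_tree
#   # if it reports "missing":  sudo dnf install "kernel-devel-$(uname -r)"
```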

What could be the problem here? Updating the kernel did cause the modules to disappear, and after reinstalling everything worked, but now a reboot is causing the modules to disappear.

Please help, I am kind of clueless at this point.

Another shutdown and reboot appears to have brought the kernel modules back. I have no clue what happened, but if someone does, please provide an explanation so that it can help others who run into the same issue. I have provided a fair bit of detail above so that others can match it against their own case.

While the drivers are loaded, something still appears to be wrong: they are not working, and nvidia-smi just hangs, as does nvtop. Shutting down and rebooting hangs during the reboot of the PC itself. I am guessing something is wrong with the kernel modules. What should I do? Should I go into the BIOS setup (F2 during boot) and somehow prevent the kernel modules from loading?

It finally completed the reboot, and now the kernel modules are not loaded again:
modprobe -v nvidia
insmod /lib/modules/6.10.10-200.fc40.x86_64/extra/nvidia.ko.xz NVreg_DynamicPowerManagement=0x02 NVreg_EnableS0ixPowerManagement=1 NVreg_PreserveVideoMemoryAllocations=1
modprobe: ERROR: could not insert ‘nvidia’: Key was rejected by service
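"Key was rejected by service" is the message for EKEYREJECTED: the kernel refused the module's signature, which with Secure Boot enabled usually means the module was signed with a key the kernel does not trust (for example, one never enrolled via mokutil). A minimal sketch that classifies modprobe's error text and points at the two standard checks, mokutil --sb-state and modinfo -F signer; the function name explain_modprobe_error is hypothetical:

```shell
#!/bin/sh
# explain_modprobe_error: reads modprobe's error output on stdin and prints
# a one-line hint about whether it looks like a module-signing problem.
explain_modprobe_error() {
  if grep -q 'Key was rejected by service'; then
    echo "signature: module signature not accepted; check 'mokutil --sb-state' and 'modinfo -F signer nvidia'"
  else
    echo "other: not a signing problem"
  fi
}

# Usage:
#   sudo modprobe -v nvidia 2>&1 | explain_modprobe_error
```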

Hi @cun23, welcome to the NVIDIA developer forums.

From your description it is really hard to say what might be going wrong, but I think you are on the right track regarding mismatched kernel modules.

I am not enough of a Fedora Linux expert to help with debugging what might cause the kernel modules to be unloaded.

One suggestion would be to go to the Linux category here on the server, follow the posting instructions and create a new post over there.

Thanks!

Thanks for your response. For the benefit of the community, this is what I had to do to make things work:

Use dkms to uninstall the existing kernel modules.
Use dkms to install the kernel modules.
Change the location of the public and private keys used for signing the kernel modules.

The issue arose because the kernel modules were not being signed. Also, since I originally downloaded the kernel module source code from GitHub, built it and installed it by hand, dkms was probably not able to keep it updated with kernel updates.

So: do not build and install the kernel modules from source by hand.
Use dkms.
Use the right signing keys, the ones that have been enrolled with the MOK utilities (mokutil). If this step is not done, the modules will still not be loaded.
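The steps above can be sketched as a small helper. This is a hedged sketch, not the poster's exact commands: the module name and version are whatever dkms status lists (the values in the usage line are placeholders), the signing-key location is assumed to be configured beforehand in /etc/dkms/framework.conf (in DKMS 3.x, via mok_signing_key and mok_certificate pointing at the key pair enrolled with mokutil --import), and the helper names run and dkms_reinstall are hypothetical. DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# run: execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# dkms_reinstall MODULE VERSION: remove and reinstall a DKMS module, then load
# it and show which key signed it. Needs root unless DRY_RUN=1.
dkms_reinstall() {
  mod=$1; ver=$2
  run dkms remove "$mod/$ver" --all     # 1. drop every installed build
  run dkms install "$mod/$ver"          # 2. build, sign and install for the running kernel
  run modprobe "$mod"                   # 3. load it (fails if the signing key is not trusted)
  run modinfo -F signer "$mod"          # 4. confirm which key signed the module
}

# Usage (placeholders, take the real values from `dkms status`):
#   DRY_RUN=1 dkms_reinstall nvidia <version>
#   sudo sh -c '. ./this-script.sh; dkms_reinstall nvidia <version>'
```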

Hope this is of help to others.

Does dkms keep the modules updated across kernel updates? If so, this approach should work much better.
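That is what DKMS is designed for: registered modules are rebuilt for newly installed kernels by dkms autoinstall, which distributions normally trigger from their kernel-install hooks (the exact hook differs between distributions). A minimal sketch to verify this after an update, assuming the DKMS 3.x dkms status line format "module/version, kernel, arch: installed"; the function name dkms_installed_for is hypothetical:

```shell
#!/bin/sh
# dkms_installed_for KERNEL: reads `dkms status` output on stdin and succeeds
# if some module is reported as installed for exactly that kernel release.
dkms_installed_for() {
  kernel=$1
  grep -F ", $kernel, " | grep -q ': installed'
}

# Usage: rebuild for the running kernel only if nothing is installed for it yet.
#   dkms status | dkms_installed_for "$(uname -r)" || sudo dkms autoinstall -k "$(uname -r)"
```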