555 driver hangs on booting 6.5.0-45-generic kernel

Hi everyone,

I followed some related topics, but none of the solutions worked for me.

The problem: hang on boot

It started after a system update, and reverting the update doesn’t fix it.

All hardware tests show no errors.

I’ve discovered an ACPI error reported as a BIOS bug, but it does not seem to be the problem. I’ve tried all older driver versions back to 454 without success - all hang on boot.

Additionally, I can switch to a text console with Ctrl+Alt+F2 and run nvidia-bug-report.sh and nvidia-smi from there. Both work fine.

Only GDM fails: it hangs just before the login screen appears. I can’t use any NVIDIA driver, only nouveau.

The only meaningful error found so far is in the nvidia-bug-report.log:

[    3.490857] hid-multitouch 0018:2808:0102.0001: input,hidraw0: I2C HID v1.00 Mouse [FTCS1000:01 2808:0102] on i2c-FTCS1000:01
[    3.560664] nvidia: loading out-of-tree module taints kernel.
[    3.560672] nvidia: module license 'NVIDIA' taints kernel.
[    3.560673] Disabling lock debugging due to kernel taint

[    3.560675] nvidia: module verification failed: signature and/or required key missing - tainting kernel

[    3.560675] nvidia: module license taints kernel.
[    3.688897] nvidia-nvlink: Nvlink Core is being initialized, major device number 510

[    3.690179] nvidia 0000:01:00.0: enabling device (0100 -> 0103)
[    3.698777] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none

I’ve tried the drivers listed by apt-cache and by ubuntu-drivers autoinstall, back to 454, with no difference.

What confuses me is this error in nvidia-bug-report.log:
module verification failed: signature and/or required key missing - tainting kernel

Here is some technical info:

$ inxi -SG 
System:
  Host: laptopWithLinux Kernel: 6.5.0-45-generic x86_64 bits: 64
    Desktop: GNOME 42.9 Distro: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics driver: i915 v: kernel
  Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nouveau
    v: kernel
  Device-3: Chicony USB2.0 Camera type: USB driver: uvcvideo
  Display: x11 server: X.Org v: 1.21.1.4 driver: X: loaded: modesetting
    unloaded: fbdev,vesa gpu: i915 resolution: 2560x1600
  OpenGL: renderer: NV176 v: 4.3 Mesa 23.2.1-1ubuntu3.1~22.04.2
nvidia_uvm           4943872  2
nvidia_drm            122880  3
nvidia_modeset       1368064  6 nvidia_drm
nvidia              54575104  62 nvidia_uvm,nvidia_modeset
drm_kms_helper        274432  3 drm_display_helper,nvidia_drm,i915
drm                   765952  12 drm_kms_helper,drm_display_helper,nvidia,drm_buddy,nvidia_drm,i915,ttm
video                  73728  2 i915,nvidia_modeset
Mon Aug  5 21:54:47 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8             14W /   80W |      24MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2385      G   /usr/lib/xorg/Xorg                              7MiB |
|    0   N/A  N/A      2593      G   /usr/bin/gnome-shell                            3MiB |
+-----------------------------------------------------------------------------------------+

Thanks

Here are some log messages that look like a hint:

/var/log/dmesg.0:[    2.425820] kernel: nvidia: loading out-of-tree module taints kernel.
/var/log/dmesg.0:[    2.425827] kernel: nvidia: module license 'NVIDIA' taints kernel.
/var/log/dmesg.0:[    2.425828] kernel: Disabling lock debugging due to kernel taint
/var/log/dmesg.0:[    2.425830] kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
/var/log/dmesg.0:[    2.425831] kernel: nvidia: module license taints kernel.
/var/log/dmesg.0:[    4.522197] kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.

This is from the latest 555 driver.

These lines have appeared on every NVIDIA driver installation attempt since 23 Jul 2024.

After hours of investigation, here are the current findings:

[    24.617] (WW) Warning, couldn't open module nvidia
[    24.617] (EE) Failed to load module "nvidia" (module does not exist, 0)
[    24.671] (WW) Falling back to old probe method for fbdev
[    24.671] (EE) open /dev/fb0: Permission denied
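
A side note on the (EE) line above: it refers to Xorg's own driver module (normally a file named nvidia_drv.so), which is separate from the nvidia kernel module that lsmod lists, so one can be present while the other is missing. A rough sketch for pulling only the warnings and errors out of an Xorg log follows. The helper name xorg_problems is made up for illustration, and the log path in the example is an assumption (GDM sessions may log under ~/.local/share/xorg/ instead).

```shell
# Sketch: print only the timestamped (WW)/(EE) lines from an Xorg log.
# The anchored pattern skips the legend line near the top of the log,
# which also contains the strings "(WW)" and "(EE)".
xorg_problems() {
    grep -E '^\[ *[0-9.]+\] \((EE|WW)\)' "$1"
}

# Example, to be run on the affected machine (paths are assumptions):
# xorg_problems /var/log/Xorg.0.log
# find /usr/lib -name 'nvidia_drv.so' 2>/dev/null   # is the X driver installed at all?
```

If the find command prints nothing, Xorg has no NVIDIA driver file to load, whatever the kernel side looks like.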

also:

$ ls /sys/firmware/
acpi  dmi  efi  memmap

$ sudo mokutil --sb-state
SecureBoot disabled

The question is why GNOME/Xorg can’t find the nvidia module.

Further investigation turned up the following:

  • Secure Boot is disabled;
  • (U)EFI is used;
  • the kernel has CONFIG_MODULE_SIG=y, but CONFIG_MODULE_SIG_FORCE is not set;
  • the NVIDIA module is indeed signed with sha512, which is what the kernel requires;
  • the system log contains the record: nvidia: module verification failed: signature and/or required key missing
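
Taken together, these points suggest the verification message is a taint warning rather than a load failure. Here is a minimal sketch of how that could be checked, assuming the kernel config lives at /boot/config-$(uname -r) as on stock Ubuntu. The function name sig_policy is hypothetical.

```shell
# Sketch: classify the module-signature policy from a kernel config file.
# Argument: path to the config (assumed: /boot/config-$(uname -r)).
sig_policy() {
    local cfg="$1"
    if grep -q '^CONFIG_MODULE_SIG_FORCE=y' "$cfg"; then
        echo "enforced"      # modules failing verification are refused
    elif grep -q '^CONFIG_MODULE_SIG=y' "$cfg"; then
        echo "permissive"    # verification failure only taints the kernel
    else
        echo "disabled"      # signature checking is not built in
    fi
}

# Example, to be run on the affected machine:
# sig_policy "/boot/config-$(uname -r)"
# modinfo -F signer nvidia                          # who signed the module, if anyone
# cat /sys/module/module/parameters/sig_enforce     # runtime enforcement switch
```

With the configuration listed above this would print "permissive", meaning the module still loads and the message can be treated as informational. Enforcement can also be turned on at boot with the module.sig_enforce kernel parameter, so the runtime switch is worth a look too.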

Ok here is the important information I was able to identify from your posts…

Linux Kernel: 6.5.0-45-generic x86_64

Distro: Ubuntu 22.04.4 LTS (Jammy Jellyfish)

Desktop: GNOME 42.9

Display: x11 server

Graphics:
Device-1: Intel Alder Lake-P Integrated Graphics driver: i915 v: kernel
Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nouveau
v: kernel

The problem: hang on boot

It happens after system update and reverting doesn’t fix it.

You also say that Secure Boot is disabled. Secure Boot is what would require your kernel modules to be signed; with it off, it is normal for them not to be.

So it appears to me that the updates broke your system, which is a common problem.

The error messages in your first posts are normal, which means they can be ignored. The one that matters is in your later post, where you found…

(EE) Failed to load module "nvidia" (module does not exist, 0)

Jammy Jellyfish is the super stable Ubuntu right now, with the most resources available for it, so in that sense it’s the ideal system. But it’s also old enough that some issues with it never get fixed, since development is focused on newer versions of Ubuntu. That means we, the users, have to do a little manual labor sometimes, which isn’t so bad…

Please add how you installed your system and how you last attempted to install the NVIDIA driver.

I would recommend using the Nvidia.run tool to attempt to fix your problem. It has a lot of neat features built in and presents you with the latest driver and the latest libraries, which will integrate with your system well…

Ideally, all graphics- and gaming-related software should be updated to the latest versions directly from upstream to begin with, and distributions don’t automate that for us.

The only thing that sucks about Ubuntu is that, if my memory is correct, the GRUB menu isn’t shown by default; having it would make low-level work like installing the NVIDIA driver easier.

In this post I describe the way I install it, but there are other ways.

Please pay attention to the installer’s instructions, and attempt to install the driver as I outline in the post. The Nvidia.run file will actually uninstall your old driver and then reinstall it correctly.

Also, you have two graphics cards in your system. Check in your computer’s BIOS which one is set as the primary graphics card, to make sure you are actually using the NVIDIA one. Later, maybe you can set up PRIME if you’re feeling adventurous.

@LinuxGaming81734 thanks a lot for the detailed explanations.

As for my way of installation, I used ubuntu-drivers autoinstall and apt install nvidia-driver-XXX. Regardless of all efforts, the system still hangs. I use the graphics-drivers/ppa PPA.

I’ll give the nvidia.run a go, but I’m sceptical overall.

From the information gathered so far, it looks like a wrong path and/or a misconfiguration, but it is quite hard to find which one exactly and where.

As the driver is properly signed with sha512, as the kernel expects, this suggests that the needed key is either missing or misplaced, so verification fails and the system hangs.

I haven’t tried to disable the other card because 535 used to work well for some time. An update after 22 Jul seems to have broken things, but I can’t confirm this or spot which update did it.

Here is a chat log from several hours of effort to fix it: Discussion on question by i100: Ubuntu 22.04 LTS with 6.5.0-45-generic kernel and nvidia 555 driver hangs on boot | chat.stackexchange.com

Update: The attempt to install NVIDIA-Linux-x86_64-550.107.02.run failed.

Here is the install log:

nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Sat Aug 10 20:38:36 2024
installer version: 550.107.02

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

nvidia-installer command line:
    ./nvidia-installer

Using: nvidia-installer ncurses v6 user interface
-> Detected 20 CPUs online; setting concurrency level to 20.
-> Scanning the initramfs with lsinitramfs...
-> Executing: /usr/bin/lsinitramfs   -l /boot/initrd.img-6.5.0-45-generic
-> Initramfs scan complete.
-> Installing NVIDIA driver version 550.107.02.
-> An alternate method of installing the NVIDIA driver was detected. (This is usually a package provided by your distributor.) A driver installed via that method may integrate better with your system than a driver installed by nvidia-installer.

Please review the message provided by the maintainer of this alternate installation method and decide how to proceed:

The NVIDIA driver provided by Ubuntu can be installed by launching the "Software & Updates" application, and by selecting the NVIDIA driver from the "Additional Drivers" tab.


(Answer: Continue installation)
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Performing CC check.
WARNING: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.
-> Nouveau can usually be disabled by adding files to the modprobe configuration directories and rebuilding the initramfs.

Would you like nvidia-installer to attempt to create these modprobe configuration files for you? (Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written.  You will need to reboot your system and possibly rebuild the initramfs before these changes can take effect.  Note if you later wish to reenable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
-> nvidia-installer is not able to perform some of the sanity checks which detect potential installation problems while Nouveau is loaded. Would you like to continue installation without these sanity checks, or abort installation, confirm that Nouveau has been properly disabled, and attempt installation again later? (Answer: Abort installation)
-> The initramfs will likely need to be rebuilt due to the following condition(s):
  * nvidia-installer attempted to disable Nouveau.

Would you like to rebuild the initramfs? (Answer: Rebuild initramfs)
-> /usr/sbin/update-initramfs requires a file path argument, but none was given.
-> Processing the initramfs:
-> Executing: /usr/sbin/update-initramfs -u  
-> done
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Yeah, that means you did everything correctly. The first time you ran it, it would have uninstalled the old driver if it found one, and blacklisted nouveau.

Now you need to reboot into run level 3 (a terminal)

and run the installer again like this:

sudo sh ./NVIDIA-Linux-x86_64-550.107.02.run
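
Before re-running the installer it may be worth confirming that nouveau is really gone, since the installer log above says the blacklist only takes effect after the initramfs is rebuilt and the machine rebooted. A small sketch, assuming the standard /proc/modules listing; module_loaded is a hypothetical helper name.

```shell
# Sketch: succeed if the named module appears in a /proc/modules-style listing.
# $1 = module name, $2 = listing file (defaults to /proc/modules).
module_loaded() {
    local name="$1" listing="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Example, to be run on the affected machine:
# if module_loaded nouveau; then
#     echo "nouveau is still loaded; rebuild the initramfs and reboot first"
# fi
# sudo systemctl isolate multi-user.target   # text mode, roughly the old run level 3
```

On systemd-based Ubuntu, multi-user.target is the closest equivalent of run level 3. Isolating it stops the display manager, so run it from a text console rather than from inside the GUI session.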

Thanks for the feedback, @LinuxGaming81734. I’ll repeat the process, because I had to revert everything to get my laptop back for other important tasks. I’ll keep posting the results.

Here is the result of the next attempt - NVIDIA-Linux-x86_64-555.58.02.run:

-> Your X configuration file has been successfully updated.  Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 555.58.02) is now complete.

but

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ lsmod | grep nvidia
(nothing there)
$ cat /usr/local/cuda/version.txt
cat: /usr/local/cuda/version.txt: No such file or directory

and start spinning around.

To recap:

If nouveau is blacklisted, the GUI hangs; otherwise nvidia is ignored and nouveau is used.
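
To see which of the two cases applies on a given boot, the kernel driver actually bound to each GPU can be read from lspci -k. A rough sketch that filters that output; gpu_drivers is a made-up helper name, and the matching relies on the usual "VGA compatible controller" / "3D controller" class labels.

```shell
# Sketch: read `lspci -k` output on stdin and print "<device> -> <driver>"
# for display devices only.
gpu_drivers() {
    awk '
        # Device lines start at column 0 with the bus address.
        /^[0-9a-f]/ {
            gpu = ($0 ~ /VGA compatible controller|3D controller/)
            dev = $0
        }
        # Detail lines are indented; pick the bound driver for GPUs.
        gpu && /Kernel driver in use:/ { print dev " -> " $NF }
    '
}

# Example, to be run on the affected machine:
# lspci -k | gpu_drivers
```

A GPU with no "Kernel driver in use" line is simply not printed, which would itself be a hint that neither nouveau nor nvidia has claimed the device.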

After days of struggle, the battle is finally over with a happy ending. I’ve documented the journey here, including all the steps I took to reach this solution. If anyone needs the information, you’ll find all the commands along with their results. I hope this will be helpful.
