NVIDIA Linux driver load error: can't change power state from D3cold to D0

I have an HP ZBook Fury G8 17.3" FHD IPS i7-11800H 128GB/2TB SSD T1200 4A698EA.
Windows 10 Pro is installed on a second SSD.
Windows can use my T1200 NVIDIA card without problems.

On a freshly installed Xubuntu 21.10 (kernel 5.13.0-27-generic)
with the latest updates, I have the problem that the NVIDIA driver doesn't load.

sudo ubuntu-drivers list
nvidia-driver-470, (kernel modules provided by linux-modules-nvidia-470-generic)
nvidia-driver-495, (kernel modules provided by linux-modules-nvidia-495-generic)
nvidia-driver-460-server, (kernel modules provided by linux-modules-nvidia-460-server-generic)
nvidia-driver-470-server, (kernel modules provided by linux-modules-nvidia-470-server-generic)

sudo ubuntu-drivers autoinstall

or
sudo apt install nvidia-driver-470
installs the software without errors.
A reboot
leads to 100% core load from a process that tries to modeset without success.
In the end the NVIDIA driver wasn't loaded.

sudo dmesg | grep -i D3cold

gives me

nvidia 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
nvidia probe of 0000:01:00.0 failed with error -1
nvidia-nvlink: Unregistered the Nvlink Core, major device number 507
Nvlink Core is being initialized, major device number 507

Info:

--- lspci -v | grep VGA ---
00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1fbc (rev ff) (prog-if ff)

--- inxi -G ---
Graphics: Device-1: Intel TigerLake-H GT1 [UHD Graphics] driver: i915 v: kernel
Device-2: NVIDIA driver: N/A
Display: x11 server: X.Org 1.20.13 driver: loaded: modesetting unloaded: fbdev,vesa resolution: 1920x1080~60Hz
OpenGL: renderer: Mesa Intel UHD Graphics (TGL GT1) v: 4.6 Mesa 21.2.2

--- dkms status ---

--- prime-select query ---
on-demand

--- glxinfo | egrep "OpenGL vendor|OpenGL renderer*" ---
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) UHD Graphics (TGL GT1)

Which kernel module drives VGA?
--- lspci -nnk | grep -i vga -A3 | grep 'in use' ---
Kernel driver in use: i915
Kernel driver in use: snd_hda_intel

In my opinion there is a new way that power management works.
In the BIOS I can only choose Hybrid Graphics or UHD, no power mode.
I chose Hybrid Graphics, because otherwise the NVIDIA controller is not visible.
This new power management doesn't allow some devices (not only
graphics) to be powered on from within a kernel module.
There must be another way to do it (in Windows it works).
I think the NVIDIA driver doesn't recognize the situation.
nvidia-bug-report.log.gz (39.9 KB)
test_wich_grafic_adapter_are_present.log.gz (17.8 KB)

The Ubuntu kernel 5.13.0-27 has a bug that prevents the GPU from being turned on. You could try using the Liquorix kernel PPA.

Thanks for your assistance.

The 5.16.0-4.2-liquorix-amd64 kernel loads as expected.

Unfortunately the nvidia-495 driver didn't.
The behavior is the same as before.
Here is a new nvidia-bug-report:
nvidia-bug-report.log.gz (31.9 KB)

I hope you have some more ideas?
Perhaps there is a newer driver version?
Mine is from the Ubuntu 21.10 repository.

A different driver doesn’t help as it’s a kernel bug. Please check if you can boot to a 5.11 kernel in grub menu.

There is no 5.11 kernel on my machine.
Is it possible to install one in my xubuntu 21.10 release ?

You could try one from the mainline repo
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.11.22/
You will need 4 packages
linux-headers-XX
linux-headers-XX-generic
linux-image-unsigned-XX-generic
linux-modules-XX-generic

Otherwise, use 20.04 and choose the GA (5.4) kernel on install.

Kernel 5.11 comes up, but it never works with my touchpad.
And that's essential for me.
I think it's because of the relatively new hardware in my laptop.
So I haven't tried the NVIDIA driver setup.

Here is my last message:

With kernel 5.15.0-18.1-liquorix-amd64 I am able to install the nvidia-495 driver (but with 5.16.0-4.2-liquorix-amd64 I'm not!)

But it was tricky.
First I tested the 5.15.0-18 installation on a live stick on my laptop. This works.
Next I did the installation on the laptop itself. The kernel comes up and the nvidia-495 driver was installable, but after a reboot the initrd hangs without a message. It's not the D3cold problem; I think something goes wrong in the kernel installation. I was prompted to install an additional library, and in /lib/modules/$(uname -r) I'm missing the subdirectory "updates" with the dkms modules, which exists on the live stick.
So I think the initrd wasn't built right.

Now I have no idea what to do, but I'm confident that it will work in 22.04.

P.S.:

Without nvidia-495, that is, with nouveau, I can start the system normally!

Most likely you were missing the kernel headers, which are needed to build the NVIDIA driver.
The packages are called linux-headers-$(uname -r).
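
A quick way to check that suggestion is sketched below. It assumes the usual Debian/Ubuntu layout, where the headers package provides a "build" entry under /lib/modules/<release>. The function name headers_present and its second argument (an alternative modules root, only there so the check can be tried on a scratch directory) are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if build headers appear to exist for a kernel release.
# $1 = kernel release (default: the running kernel, from uname -r)
# $2 = modules root   (default: /lib/modules; assumption of the usual Debian/Ubuntu layout)
headers_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    # "build" is normally a symlink to the header tree; -e fails on a dangling link
    [ -e "$root/$release/build" ]
}

if headers_present; then
    echo "headers found for $(uname -r)"
else
    echo "headers missing; try: sudo apt install linux-headers-$(uname -r)"
fi
```

If the check fails for the kernel you want to boot, DKMS cannot build the nvidia module for it.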

dpkg -l | grep 5.15.0

ii linux-headers-5.15.0-18.1-liquorix-amd64 5.15-22ubuntu1~impish amd64 Header files for Linux 5.15.0-18.1-liquorix-amd64

ii linux-image-5.15.0-18.1-liquorix-amd64 5.15-22ubuntu1~impish amd64 Linux 5.15 for 64-bit PCs

Please show the output of:
dkms status
dpkg -l |grep nvidia

and create a new bug report.

As I wrote before, I can't start the system when nvidia-495 is installed, because the initrd hangs without a message after reboot!
Without nvidia-495 the system boots as expected.

dpkg_log.log.gz (569 Bytes)
nvidia-bug-report.log.gz (35.2 KB)

This is spamming dmesg, making it useless:

[ 2656.550206] x86/split lock detection: #AC: vmx-svga/4245 took a split_lock trap at address: 0x5621c78feb57
[ 2656.620785] x86/split lock detection: #AC: main-svga/4248 took a split_lock trap at address: 0x55a2ba8a2e3d

Are you running KVM, VmWare, or VirtualBox?

I have VMware Workstation Player installed.

Any way you could disable it for testing?
Also remove for testing these kernel parameters: “quiet splash” - edit /etc/default/grub - then run sudo update-grub - so you could maybe see where the system hangs if trying to boot with the nvidia driver installed.
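
The edit to /etc/default/grub can also be scripted. This is only a sketch: strip_quiet_splash is a made-up helper name, it assumes GNU sed (for -i and \b), and it takes the file as an argument so you can try it on a copy first. It leaves a FILE.bak backup next to the file.

```shell
#!/bin/sh
# Hypothetical helper: remove "quiet" and "splash" from GRUB_CMDLINE_LINUX_DEFAULT.
# $1 = path to a grub defaults file (e.g. a copy of /etc/default/grub)
# Assumes GNU sed; the original is kept as "$1.bak".
strip_quiet_splash() {
    sed -E -i.bak \
        '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\b(quiet|splash)\b ?//g' "$1"
}

# Usage (not executed here, needs root):
#   sudo sh -c '. ./this-script.sh; strip_quiet_splash /etc/default/grub'
#   sudo update-grub
```

Other parameters on the same line, such as a usbcore.quirks entry, are left untouched.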

If I have nvidia-xxx installed, the system hangs while booting after loading the kernel. I ran the two steps manually in GRUB 2:

– linux /boot/vmlinuz-5.15.0-18.1-liquorix-amd64 root=UUID=45556e4f-b4f7-4a95-8015-533f158b5aa7 ro usbcore.quirks=0bda:8156:k $vt_handoff

(the usbcore parameter disables power management for the r8156 2.5 Gbit USB Ethernet adapter)

with success

– initrd /boot/initrd.img-5.15.0-18.1-liquorix-amd64

which hangs without messages.

The problem is that the initrd hangs. On my live stick, where 5.15 comes up properly, I have VMware installed too. So I can't see a reason to disable, i.e. uninstall, VMware. I use VMware to access the internet, and without it I cannot write emails and so on.

Ok, let us please try this:

Add this kernel boot parameter split_lock_detect=off, to get rid of this message flood.
https://lwn.net/Articles/810317/

May give some explanation.

Are you able to enter a virtual console at that point? - Press Ctrl+Alt+F3 (or F1-F8).
If yes, do journalctl -b0 >journal.txt and post that file here.

If not, reboot into a state where you can log in, run journalctl -b -1 >journal.txt (the journal of the previous boot) and post that file here.

I will do so tomorrow in the morning.

Thanks for your previous assistance !

In case I have to reboot, I must boot an older kernel and remove nvidia-xx, and then I can boot into 5.15.
There are some more messages from journalctl.

Good morning.

Bad news.

I booted into 5.15.

When I tried to follow your recommendation, I found that "Additional Drivers"
(the app from the settings menu), which I used to install or remove
nvidia-xxx, shows only: "use manually installed driver"!
No other option. The behavior changed overnight.

I have installed absolutely nothing since yesterday.
The only modification I made was to switch the BIOS graphics mode from "Hybrid"
(the only one where the NVIDIA card is visible) to "UMA", where only the
Intel device is visible.
I did it to reduce heat by switching off the NVIDIA card.

I had done so several times before with no problem.

Today I switched it back to hybrid mode and the system boots 5.15 as
expected with the NVIDIA card on.

I wanted to install nvidia-xxx to prepare our test case,
but then came the surprise: "use manually installed driver".

So I booted the old 5.13.0-28 kernel and checked "Additional Drivers".
It shows all options (495, 470, ...-server, nouveau).
I am confused!

Next I ran "sudo update-initramfs -u" on the old system with the
intention of doing the same in 5.15 and seeing what happened.
Before doing the second step I checked the /boot directory from the old
system. I don't know why.

But I found the next surprise:
Running update-initramfs on the old 5.13 system
generates a new initrd.img-5.15.0-18.1-liquorix-amd64!
The old initrd.img-5.13.0-28-generic was unchanged.

Maybe this is the reason why the initrd hangs:
Every time the initrd hangs, I must boot the old kernel,
remove nvidia-xxx using "Additional Drivers", and switch
to nouveau there.
(I think this runs update-initramfs.)
So the next boot into 5.15 uses an initrd file generated from the old
5.13 system. Despite that, 5.15 boots with it.

Maybe this behavior has to do with a crash I got the first time
I tried to install 5.15, caused by a missing library:

linux-headers-5.15.0-18.1-liquorix-amd64_5.15-22ubuntu1~impish_amd64.deb
linux-headers-liquorix-amd64_5.15-22ubuntu1~impish_amd64.deb
linux-image-5.15.0-18.1-liquorix-amd64_5.15-22ubuntu1~impish_amd64.deb
linux-image-liquorix-amd64_5.15-22ubuntu1~impish_amd64.deb
sudo dpkg -i *.deb
sudo update-grub

I could install the missing library with sudo apt install libelf-dev
(from the running old system), and the next attempt to install
kernel 5.15 worked fine.

Here is what we see after a boot into 5.15 in /boot:

Jan 13 18:13 config-5.13.0-28-generic
Jan 29 20:09 config-5.15.0-18.1-liquorix-amd64
Jan 1 1970 efi/
Feb 3 06:47 grub/
Feb 1 20:41 initrd.img -> initrd.img-5.13.0-28-generic
Feb 1 21:02 initrd.img-5.13.0-28-generic
Feb 3 07:19 initrd.img-5.15.0-18.1-liquorix-amd64
Feb 1 20:41 initrd.img.old -> initrd.img-5.15.0-18.1-liquorix-amd64
Okt 7 12:20 memtest86+.bin
Okt 7 12:20 memtest86+.elf
Okt 7 12:20 memtest86+_multiboot.bin
Jan 13 18:13 System.map-5.13.0-28-generic
Jan 29 20:09 System.map-5.15.0-18.1-liquorix-amd64
Feb 1 20:41 vmlinuz -> vmlinuz-5.13.0-28-generic
Jan 13 18:10 vmlinuz-5.13.0-28-generic
Jan 29 20:09 vmlinuz-5.15.0-18.1-liquorix-amd64
Feb 1 20:41 vmlinuz.old -> vmlinuz-5.15.0-18.1-liquorix-amd64

initrd.img and vmlinuz point to the old kernel files, but I think this
doesn't matter because grub.cfg doesn't reference them. grub.cfg always uses
the versioned filenames.
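
The claim about grub.cfg can be checked rather than assumed. A minimal sketch: list_initrds is a made-up helper name, and it assumes that menu entries use lines beginning with the word "initrd" followed by the image path, as in the manual boot steps quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct initrd images a grub.cfg refers to.
# $1 = path to a grub.cfg (on Ubuntu usually /boot/grub/grub.cfg)
list_initrds() {
    # Field 1 is the command word, field 2 the image path; leading tabs are ignored by awk
    awk '$1 == "initrd" { print $2 }' "$1" | sort -u
}

# Usage (may need sudo, depending on the file's permissions):
#   list_initrds /boot/grub/grub.cfg
```

If only versioned names such as initrd.img-5.15.0-18.1-liquorix-amd64 show up, the unversioned symlinks in /boot play no role at boot.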

Conclusion

  1. update-initramfs on the old 5.13 system should be corrected to use the right file.
    But how?

  2. "Additional Drivers" in 5.15 should give me all the options back!
    But how do I get this?

I think we shouldn't do any further digging before this is sorted out.
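
A note on point 1, as far as I understand the update-initramfs man page: without -k, "update-initramfs -u" picks the newest installed kernel version rather than the running one, which would match the behavior described above. The sketch below names the version explicitly. rebuild_initrd and the DRY_RUN variable are made up for this example; by default it only prints the command.

```shell
#!/bin/sh
# Hypothetical wrapper: rebuild the initramfs for one explicit kernel version.
# $1 = kernel version (default: the running kernel, from uname -r)
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it via sudo.
rebuild_initrd() {
    version="${1:-$(uname -r)}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "update-initramfs -u -k $version"
    else
        sudo update-initramfs -u -k "$version"
    fi
}

# Usage:
#   rebuild_initrd 5.13.0-28-generic             # show what would be run
#   DRY_RUN=0 rebuild_initrd 5.13.0-28-generic   # actually rebuild that initrd
```

Passing "-k all" instead of a single version should rebuild the initrd of every installed kernel.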