NVIDIA drivers failing to load GPU on Ubuntu 22.04 & 23.04

Hello,
Recently I ran into a problem that reproduces on 4 different setups:

  1. Laptop with Intel and NVIDIA RTX 2070 GPU (limited BIOS options)
  2. Desktop with Intel and NVIDIA RTX 2080Ti (BIOS can disable Intel)
  3. 2x desktops with NVIDIA RTX 3080 (BIOS can disable Intel)

I am not sure what is going on, but I am sure that the installation steps and procedures are the same ones I have used on several setups before.

I should also mention that the environment on the laptop was working until the update was triggered.

The errors seemed to be the same on all the machines, and the hardware behaved similarly: the GPU fan starts at boot time and stops when the log says “failed to register device”.

nvidia-smi reports that no device is detected, while dmesg reports an i2c failure as well as “failed to allocate nvkmskapidevice”.
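To pull the messages mentioned above out of the kernel log in one place on each machine, a minimal Python sketch could filter the dmesg output for those strings. The pattern list is an assumption built only from the messages quoted in this post, not an official list of NVIDIA driver errors, and on systems with kernel.dmesg_restrict enabled the log may be empty unless the script runs as root.

```python
import re
import subprocess

# Hypothetical pattern list taken from the errors quoted in this thread
# (not an official list of NVIDIA driver messages); extend as needed.
PATTERNS = [
    r"failed to register device",
    r"i2c",
    r"nvkmskapidevice",
]


def find_matches(log_text, patterns=PATTERNS):
    """Return the lines of log_text that match any pattern, case-insensitively."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [line for line in log_text.splitlines() if regex.search(line)]


def read_kernel_log():
    """Return the kernel ring buffer as text, or "" if dmesg cannot be run."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        # dmesg missing or not executable.
        return ""
    # With kernel.dmesg_restrict=1, stdout may be empty unless run as root.
    return result.stdout


if __name__ == "__main__":
    for line in find_matches(read_kernel_log()):
        print(line)
```

Running it once per boot (with and without the AC adapter) and comparing the matched lines would show whether the same errors appear on every setup.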

I tested on:
  • Ubuntu 22.04, 23.04, and Linux Mint 21.1
  • Kernels 5.15 and above
  • Drivers 515 and above, up to 530

After I gave up, I found that drivers 525 and 515 work on the laptop, but only if the driver is installed and the system is booted without the AC adapter plugged in (in other words, on battery only).

Can anyone help me to understand what is going on?

This is the bug report from my laptop while running on battery only, with Ubuntu 23.04 and driver 525.105.17:
Processing: nvidia-bug-report.log.gz…

On a side note:
If the AC adapter is plugged in after boot while the driver is working, the GPU hangs: the screen freezes but the audio keeps playing.
With Ctrl+Alt+T I can open a new terminal and reboot; the PC reboots, but the GPU fails to load until I reboot with the AC adapter unplugged.

The upload is stuck; please unzip the file and upload it again.

I am afraid I cannot do that.
After a lot of desperate debugging, I decided to install Windows alongside Linux to confirm that the hardware works and is not faulty.
After installing Windows 10 Pro (the recovery image of my laptop's original OS), I installed the NVIDIA driver, and after the installation succeeded I selected the NVIDIA card as primary. Then I rebooted, and since then the laptop doesn’t boot and does not even show the BIOS: just a black screen, without even the first screen with the manufacturer logo.

Now it is just a dead PC!!

Any idea!?

Usually, I’d start by detaching the AC adapter and removing the battery to discharge and reset the notebook’s mainboard. That might be difficult, so check whether your model has a reset button for this on the bottom side. Is it still under warranty?

Sorry, it was not an easy job, but it's done!

After removing the mainboard battery and putting it back, I can boot into the system again.

This is the log with Intel set as the primary card using prime-select:
intel-selected.log (1.4 MB)

And this is the log with NVIDIA set as the primary card using prime-select:
nvidia-selected.log (1.7 MB)

Thank you in advance!

I can confirm that the GPU works under Linux when the laptop is booted on battery only, and that it freezes if I connect the AC adapter afterwards. If the AC adapter is connected before boot, there is no freeze, but the driver fails to register the device.
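Since the outcome depends on whether the AC adapter was connected at boot, it may help to record the power state next to each log. A minimal Python sketch, assuming the adapter is exposed through the standard Linux sysfs power_supply class as a supply of type "Mains" with an "online" attribute; ac_online is a hypothetical helper name, and some USB-C chargers report type "USB" instead and would not be detected here.

```python
from pathlib import Path


def ac_online(base="/sys/class/power_supply"):
    """Hypothetical helper: True if any "Mains" supply is online,
    False if Mains supplies exist but are all offline, None if none are found."""
    root = Path(base)
    if not root.is_dir():
        return None
    found_mains = False
    for supply in sorted(root.iterdir()):
        type_file = supply / "type"
        online_file = supply / "online"
        # Batteries usually have no "online" file; skip anything incomplete.
        if not (type_file.is_file() and online_file.is_file()):
            continue
        if type_file.read_text().strip() != "Mains":
            continue
        found_mains = True
        if online_file.read_text().strip() == "1":
            return True
    return False if found_mains else None


if __name__ == "__main__":
    labels = {True: "AC adapter connected", False: "on battery", None: "no mains supply reported"}
    print(labels[ac_online()])
```

Running it right after boot, and again after plugging in the adapter, then saving the output alongside nvidia-bug-report.log.gz would make it clear which power state each report was captured in.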

On Windows, on the other hand, the driver and the NVIDIA Control Panel do not work at all, on either battery or AC power. However, the driver installs successfully, and I can see the registers and memory reported correctly.

Can I assume the device is OK, since it already works on Linux when booted on battery?
This behavior started right after I updated the Linux kernel and the GPU driver on Ubuntu 22.04.

I am not familiar with how driver updates work or what they actually install, but is there any chance they update the BIOS or the firmware?

After some trials on Windows, I got the same behavior as on Linux after installing another driver version:

  • Driver Date: 12/5/2022
  • Driver Version: 31.0.15.2756

So now the hardware works only if the OS (Windows or Linux) is booted on battery, and crashes if the AC adapter is connected after boot.

On Linux the system just freezes; on Windows it shows a blue screen and restarts automatically.

I replaced the AC adapter with a new one, and it is still the same! What am I missing here?