System seems locked while rebooting with Linux 5.2.1 and nvidia drivers 430.34 or 430.26

[s]Problem still persists with Linux 5.2.11 and Nvidia drivers 435.21…

It’s becoming very frustrating, since I’m literally afraid of logging off or rebooting the system. The only way to shut down the system is by using SysRq…[/s]

I’m taking this back. It seems that 435.21 does work fine. I’ve rebooted 4 times and the reboot times are normal. I’ll have to test some more.

Hi GoofyX,

Thanks for the results, please update with your further tests once done.

Hi dodo.godlike,

Can you please also test with driver 435.21 and share the results with us?

Unfortunately, while I thought that version 435.21 of the drivers had solved the issue for me, it hadn’t. Last night I clicked to shut down the system; the graphical screen disappeared, the initial console with the kernel messages appeared, and it stayed there. Ever since this issue first appeared, I would normally use SysRq at that point to shut the system down, but this time I gave it a chance and left it there in case it shut down by itself. It didn’t, and on top of that, when I tried to issue SysRq after 4 minutes, the system was locked. The NumLock key on the keyboard did not respond, SysRq did nothing. I had to force it off with the power button.

That’s a very sad experience with the nvidia drivers. I wish I could provide Nvidia with more information that would help in identifying the root cause of the issue.

@amrits, is there a way to provide you with more information?

Tried with drivers 435.21 and kernel 5.2.11, same issues as before.

I’m switching to GPU pass-through these days in order to fully exploit my NVIDIA RTX GPU in a Windows 10 VM, and to run the host on the integrated Intel GPU, which allows me to use Wayland. I probably won’t be able to provide much more testing for this issue, but I can try.

This issue has been reported on Gigabyte motherboards, so I tried a couple of configurations, but I have not been able to reproduce the issue so far.

Gigabyte X470 AORUS ULTRA GAMING + AMD Ryzen 7 2700X Eight-Core Processor + Ubuntu 18.04.1 LTS + Kernel 5.2.9 + GeForce GTX 1050 Ti + Driver 430.34.

H370 AORUS GAMING + Intel(R) Core™ i7-6700K CPU @ 4.00GHz + Ubuntu 18.04.1 LTS + Kernel 5.2.9 + GeForce GTX 1050 Ti + Driver 430.34.

Installed the KDE Plasma desktop environment, started the sddm service, and tried rebooting and shutting down from the GUI multiple times; it powered down without any issues each time.

Please let me know if you have been following different steps or using a different configuration to reproduce the issue.

The only difference in our setups is the Xorg server package version. Ubuntu LTS ships with an older version of xorg, as I can see here: Ubuntu – Details of package xserver-xorg-core in bionic.

On the other hand, @GoofyX and I have the latest version, which is 1.20.5 (Arch Linux - xorg-server 21.1.4-1 (x86_64)).

@amrits can you please try with a recent xorg-server version, since IMO it could be involved in the issue?

Yes, I’m using xorg-server 1.20.5 too.

@amrits, did you compile the kernel (5.2.9) using my configuration?

I will try with xorg-server 1.20.5 and update accordingly.

I compiled the kernel downloaded from the website below:
https://kernel.org

I asked if you used the kernel configuration I sent you a while back with the kernel you downloaded from kernel.org. If not, then you might try to use that configuration to compile the kernel and test again.

I may have further evidence that this problem is not related to the NVIDIA driver, but exclusively to the Linux kernel.

As I said in my previous post, I set up GPU pass-through on the machine. So now the host operating system runs on the Intel iGPU, and the guest Windows 10 VM uses the NVIDIA GPU via PCI pass-through.

If I run a session on the host without starting the guest, everything works correctly and the machine shuts down without problems.

If I start the Windows VM during the session, the host hangs on shutdown, just as it did before when I was using the NVIDIA GPU with the host OS.

I’ll update this post with logs from the previous shutdown, but I think we should open an issue on the Linux kernel bug tracker.

I found a forum thread in the Manjaro Linux community that describes a similar situation: Does installing kde plasma and removing gnome affect performance? - Kde Plasma - Manjaro Linux Forum

Unfortunately, the journalctl output of a faulty shutdown doesn’t show anything useful. It seems that systemd does everything right, and the fault happens after the journal stops logging. I’ll try to enable some extra debug logs for the kernel and report back if I find anything.

EDIT: I just saw on the Gigabyte website (Z390 AORUS ELITE (rev. 1.0) Support | Motherboard - GIGABYTE Global) that a new BIOS update for my MB is out. I’ll try to see if it solves the problem.

Have you managed to make some progress with this issue?

I’m running out of options here. :(

Every time, I shut down my system with SysRq (sync, unmount, reboot) and then power off by hand. This is unacceptable for me, but I don’t know what my alternatives are. Sell the Nvidia card and fall back to the CPU’s integrated Intel GPU? Change my kernel config? How?

I recently tried Linux 5.3.0 and the problem is still there. I had to revert to 5.2.15, since I need the VirtualBox and VMware modules to build, which they don’t with the latest stable kernel.
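For anyone who wants to script the same sequence instead of pressing the keys, here is a minimal sketch. It assumes a root shell still responds, which may not be the case once the machine is fully locked. The key letters are the standard magic SysRq ones; the function names and the `dry_run` default are only illustrative.

```python
import time
from pathlib import Path

# Standard magic SysRq command keys:
# s = sync, u = remount filesystems read-only, b = reboot, o = power off.
SYSRQ_STEPS = {
    "reboot": ["s", "u", "b"],
    "poweroff": ["s", "u", "o"],
}


def sysrq_sequence(action):
    """Return the SysRq keys for a sync + remount + reboot/poweroff sequence."""
    return list(SYSRQ_STEPS[action])


def trigger(action, dry_run=True, trigger_path="/proc/sysrq-trigger", delay=1.0):
    """Write each key to the SysRq trigger file; with dry_run, only report the keys."""
    keys = sysrq_sequence(action)
    if not dry_run:
        for key in keys:
            Path(trigger_path).write_text(key)  # requires root
            time.sleep(delay)  # give sync/remount a moment before the next step
    return keys
```

With the default `dry_run=True` nothing is written, so `trigger("poweroff")` only returns the keys it would send. Calling it with `dry_run=False` as root really does reboot or power off the machine.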

Right now I’m on kernel 5.3.1, and the only difference from before is that reboot works 100% of the time while shutdown always hangs. I still need to update the BIOS, but looking at the changelog I don’t think it will change anything (there are some security fixes).

Hello everyone.
I have the same problem.
Hardware:

  1. Gigabyte motherboard H370 HD3
  2. Gigabyte video card with RTX 2060 6 GB GPU

Software: Ubuntu 18.04.3 LTS
Nvidia driver: 430

Problem:

  1. The computer does not shut down.
  2. It hangs on reboot.

I tried Ubuntu 19.04 and Fedora 28, 29 and 30 with several driver versions, and the problem occurs every time.
There are no newer BIOS updates for my hardware; I am already on the latest version.

During shutdown my system hangs on the Xorg process, which does not exit. In htop I can see that the processes using the nvidia driver hang in state “D” and cannot be stopped.

I tried disabling the i2c_nvidia_gpu module (open source): no change. (My English is bad, sorry.)
I tried adding the “acpi=off” parameter to the kernel command line. Now the system can reboot and shut down, but the Xorg and nvidia-smi processes still hang in state D (after 1-3 minutes they exit and the system shuts down).
I can enable persistence mode (nvidia-smi -pm 1). After that command nvidia-smi runs normally and does not hang in state D, but Xorg still hangs for 1-3 minutes at shutdown and reboot.
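To capture the same information without htop, for example to paste into a bug report, a minimal sketch like the following lists processes in state D (uninterruptible sleep). It assumes a Linux /proc filesystem; the function names are only illustrative.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Running it just before shutdown, and again from a second console while the shutdown is stuck, should show whether Xorg or nvidia-smi is among the blocked processes.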

I am writing this from my mobile. I can add more information if you want.

With the open-source nouveau driver my system hangs randomly during normal use, but shutdown and reboot are fast.

I don’t think this is a GPU problem. I think it is an ACPI problem with the motherboard, but I am not sure…

Correction: the parameter I used is acpi=force, not acpi=off.

I tried disabling graphics mode and booting into multi-user.target (systemd).

Without graphics loaded, the system does not hang at shutdown. The nvidia-persistenced daemon runs without persistence mode (the default).

If I enable persistence mode and boot to the console (multi-user.target), the system hangs on the nvidia-persistenced daemon, which does not exit; the computer shuts down after several minutes.

My workaround at the moment:

  1. Add acpi=force in /etc/default/grub and run update-grub.
  2. Edit /etc/systemd/system.conf
    to reduce the systemd process kill timer:
    DefaultTimeoutStopSec=10s (uncomment it and change it from 90s)

Not a proper solution, but the computer shuts down in an acceptable time…
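The system.conf part of this workaround can be sketched as a small text edit. This is only an illustration, assuming the file contains the usual commented-out `#DefaultTimeoutStopSec=90s` line; the function name is made up, and it works on the file contents as a string so nothing is changed on disk until you write the result back yourself.

```python
import re


def set_stop_timeout(conf_text, timeout="10s"):
    """Return system.conf text with DefaultTimeoutStopSec set to `timeout`."""
    # Match the setting whether or not it is commented out.
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*DefaultTimeoutStopSec[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"DefaultTimeoutStopSec={timeout}"
    if pattern.search(conf_text):
        return pattern.sub(new_line, conf_text, count=1)
    # Setting absent: append it (assumes the last section of the file is [Manager]).
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    sample = "[Manager]\n#DefaultTimeoutStartSec=90s\n#DefaultTimeoutStopSec=90s\n"
    print(set_stop_timeout(sample))
```

Keep a backup of the original file, and reboot afterwards so the new timeout is picked up.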

If I enable persistence mode on every boot
(I edited /lib/systemd/system/nvidia-persistenced.service and deleted the “--no-persistence-mode” option from the ExecStart line),
my software runs normally and does not hang the system. Xorg, gdm, gnome-shell, nvidia-smi, etc. all load and unload. But during shutdown the nvidia-persistenced daemon still hangs the system for several seconds.

I’d like to submit a kernel bug, but I don’t know how to document it. If this is a kernel regression from 5.2, then it must be a kernel bug. Maybe amrits is the one that should open the bug (as Nvidia staff)?

Hi GoofyX,

I have not been able to replicate the issue locally so far, even after matching the motherboard and kernel config.
Please file a bug for the kernel regression; that will help the kernel developers collect logs from your setup in the failing state and debug the issue.

Hi All,

Please confirm whether the issue still occurs after uninstalling the nvidia driver.