[s]Problem still persists with Linux 5.2.11 and Nvidia drivers 435.21…
It’s becoming very frustrating, since I’m literally afraid of logging off or rebooting the system. The only way to shut down the system is by using SysRq…[/s]
I’m taking this back. It seems that 435.21 does work fine. I’ve rebooted 4 times and the reboot times are normal. I’ll have to test some more.
Unfortunately, while I thought that version 435.21 of the drivers solved the issue for me, it didn’t. Last night I clicked to shut down the system; the graphical screen disappeared, the initial console with the kernel messages appeared, and it stayed there. Normally at that point I’d use SysRq to shut the system down (as I’ve had to ever since this whole issue started appearing), but this time I gave it a chance and left it alone in case it shut down by itself. Unfortunately it didn’t, and worse: when I tried to issue SysRq about 4 minutes later, the system was completely locked. The NumLock key on the keyboard did nothing, SysRq did nothing. I had to shut it down forcibly with the power button.
That’s a very sad experience with the nvidia drivers. I wish I could provide Nvidia with more information that would help identify the root cause of the issue.
@amrits, is there a way to provide you with more information?
Tried with drivers 435.21 and kernel 5.2.11, same issues as before.
I’m switching to GPU pass-through these days in order to fully exploit my NVIDIA RTX GPU on a Windows 10 VM, and also to use host with the integrated Intel GPU which allows me to use wayland. I probably won’t be able to provide much more testing for this issue, but I can try.
Installed the KDE Plasma desktop environment, started the sddm service, and tried rebooting/shutting down from the GUI multiple times; the machine powered down without any issues.
Please let me know if you have been following different steps or a different configuration to reproduce the issue.
I asked if you used the kernel configuration I sent you a while back with the kernel you downloaded from kernel.org. If not, then you might try to use that configuration to compile the kernel and test again.
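In case it helps anyone reproducing this, the usual steps for building a vanilla kernel.org kernel with a supplied config look roughly like the sketch below. The config path is a placeholder for the file that was sent earlier, not a real path; adjust the install step for your distro's conventions.

```shell
# Sketch: build a kernel.org kernel using a provided .config
# (/path/to/provided.config is a placeholder, not an actual path).
cd linux-5.2.11
cp /path/to/provided.config .config
make olddefconfig        # fill in any options new to this kernel version with defaults
make -j"$(nproc)"        # build the kernel and modules
sudo make modules_install
sudo make install        # installs the image; on most distros also updates the bootloader
```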
I may have further evidence that this problem is not related to the NVIDIA driver, but exclusively to the Linux kernel.
As I said in my previous post, I set up GPU pass-through on the machine. So now the host operating system runs on the Intel iGPU, and the guest Windows 10 VM uses the NVIDIA GPU via PCI pass-through.
If I run a session on the host machine without starting the guest, everything works correctly and the machine shuts down without problems.
If during the session I run the Windows VM, then the host hangs on shutdown, just like it did before when I was using the NVIDIA GPU with the host OS.
I’ll update this post with logs for the previous shutdown, but I think we should open an issue on the linux bug tracker.
Unfortunately the journalctl output of a faulty shutdown doesn’t show anything useful. It seems that systemd does everything right, and the failure happens after journalctl stops logging. I’ll try to enable some extra debug logs for the kernel and report back if I find anything.
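For what it’s worth, one approach I’m planning to try (a sketch, not something I’ve verified on this exact setup) is to make the journal persistent and push systemd/kernel messages to the console, so something from the hung shutdown survives the power cycle:

```shell
# Make journald keep logs across reboots so the previous (hung) boot can be inspected:
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald

# Boot once with extra verbosity on the kernel command line, e.g.:
#   loglevel=7 systemd.log_level=debug systemd.log_target=kmsg printk.devkmsg=on

# After the next hang and forced power cycle, inspect the previous boot's kernel messages:
journalctl -k -b -1 --no-pager | tail -n 100
```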
Have you managed to make some progress with this issue?
I’m running out of options here. :(
Every time I shut down my system I have to do it with SysRq: sync + unmount, then reboot or power off by hand. This is unacceptable for me, but I don’t know what my alternatives are. Sell the Nvidia card and fall back to the Intel GPU integrated in the CPU? Change my kernel config? How?
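For reference, my sequence is the standard SysRq one; the same requests can also be sent from a root shell via /proc/sysrq-trigger. This is a sketch and assumes kernel.sysrq is set to allow these functions (the last line powers the machine off immediately):

```shell
# Enable all SysRq functions for this boot (persist via /etc/sysctl.conf if desired):
sudo sysctl kernel.sysrq=1

# Equivalent of the Alt+SysRq key presses, one request per write:
echo s | sudo tee /proc/sysrq-trigger   # s: sync all mounted filesystems
echo u | sudo tee /proc/sysrq-trigger   # u: remount filesystems read-only
echo o | sudo tee /proc/sysrq-trigger   # o: power the machine off
```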
I recently tried Linux 5.3.0 and the problem is still there. I had to revert to 5.2.15, since I need the VirtualBox and VMware modules to build, which they don’t against the latest stable kernel.
Right now I’m on kernel 5.3.1, and the only difference from before is that reboot now works 100% of the time, while shutdown always hangs. I still need to update the BIOS, but looking at the changelog I don’t think it will change anything (it contains only some security fixes).
I tried Ubuntu 19.04 and Fedora 28, 29, and 30, with several driver versions, and they all have this problem.
There are no BIOS updates for my hardware; I’m already using the latest available version.
On shutdown my system hangs because of the Xorg process: it doesn’t unload. In htop I can see that the processes using the nvidia drivers hang in state “D” (uninterruptible sleep), so they cannot be killed.
I tried disabling the i2c_nvidia_gpu module (open source): nothing changed. (Sorry for my English.)
I tried adding the “acpi=off” parameter to the kernel command line: now the system can reboot and shut down, but the Xorg and nvidia-smi processes still hang in state D (although after 1-3 minutes they unload and the system shuts down, etc.).
If I enable persistence mode (nvidia-smi -pm 1), nvidia-smi runs normally and no longer hangs in state D, but Xorg still hangs for 1-3 minutes at shutdown and reboot.
Writing this from my phone; I can add more information if you want.
With the open-source nouveau driver my system hangs randomly during normal use, but shutdown and reboot are fast.
I don’t think this is a GPU problem; I suspect it’s an ACPI problem with the motherboard, but I’m not sure.
I tried disabling graphics and booting into multi-user.target (systemd).
Without the graphical session loaded, the system does not hang at shutdown, and the nvidia-persistenced daemon runs without persistence mode (the default).
If I enable persistence mode and boot into the console (multi-user.target), the system hangs because of the nvidia-persistenced daemon: it does not unload, and the computer shuts down only after several minutes.
My workaround at the moment:
add acpi=force in /etc/default/grub and run update-grub
edit /etc/systemd/system.conf to reduce systemd’s process-kill timeout:
DefaultTimeoutStopSec=10s (uncomment it and change it from the 90s default)
Not a proper solution, but the computer now shuts down in an acceptable time.
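Concretely, the two changes look like this (note the canonical path is /etc/default/grub, without an “s”; the “quiet splash” part is just an example of an existing command line):

```shell
# 1) /etc/default/grub -- add acpi=force to the kernel command line, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi=force"
#    then regenerate the bootloader configuration:
sudo update-grub

# 2) /etc/systemd/system.conf -- uncomment and shorten the stop timeout:
#      DefaultTimeoutStopSec=10s
#    (the default is 90s), then make systemd re-read its configuration:
sudo systemctl daemon-reexec
```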
I also enable persistence mode on every boot
(I edited /lib/systemd/system/nvidia-persistenced.service and deleted the “–no-persistence-mode” option from the ExecStart line).
My software runs normally and doesn’t hang the system. Xorg loads and unloads fine, as do gdm, gnome-shell, nvidia-smi, etc. But at shutdown the nvidia-persistenced daemon still hangs the system for several seconds.
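A slightly safer variant of the same change (an alternative, not what I actually did) is a drop-in override, so the packaged unit under /lib isn’t modified and a driver update won’t revert it. The ExecStart path and options below are an assumption based on common packaging and may differ on your distro:

```shell
# Create a drop-in override instead of editing /lib/systemd/system/nvidia-persistenced.service:
sudo systemctl edit nvidia-persistenced
# In the editor, add (the empty ExecStart= clears the packaged one first):
#   [Service]
#   ExecStart=
#   ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --verbose
sudo systemctl daemon-reload
sudo systemctl restart nvidia-persistenced
```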
I’d like to submit a kernel bug, but I don’t know how to document it. If this is a kernel regression since 5.2, then it must be a kernel bug. Maybe amrits is the one who should open the bug (as Nvidia staff)?
I have not been able to replicate the issue locally so far after matching your motherboard and kernel config.
I request that you file a bug regarding the kernel regression; that will help the kernel developers fetch logs from your setup in the repro state to debug the issue.
Hi All,
Please confirm whether the issue occurs after uninstalling the NVIDIA driver.