Hi @amrits,
I tried uninstalling the nvidia driver. The nouveau kernel module loads correctly, but Xorg won’t start since it cannot find the card (the RTX 2070 is probably too new to work with nouveau).
Anyway, the system reboots correctly.
The kernel bug I opened is at 205117 – After Linux 5.2.x, lightdm (or sddm) cannot be terminated using the Nvidia binary
Feel free to contribute any useful comments.
It might help if someone posted a comment in the kernel bug I opened. So far the bug is untouched.
Still no response so far from the kernel bug report…
Well… Until someone from the kernel developers decides to look into the bug, I’m reverting to the internal VGA of the processor. Shutting down the system with the SysRq key for many weeks now has been a joke.
I’m a bit surprised that the Nvidia guys haven’t successfully reproduced the bug, since there are other people experiencing the issue too.
I might sell the VGA too, I haven’t made up my mind yet.
The situation is very depressing…
It’s highly unlikely that kernel developers will help you without you first bisecting the issue and finding the bad commit.
Yeah… Good luck with that. I can’t do that (I wonder if any non-kernel developer is actually able to do it), so the simple solution of removing the Nvidia card, recompiling the kernel with Intel support and building any necessary packages works just fine. Maybe the fault is with Gigabyte (some ACPI stuff…? who knows), and my next upgrade will not be based on a Gigabyte motherboard either.
I’m just tired of this issue and I don’t have the patience to deal more with it. I really hope someone is lucky to find the cause.
I’m not a kernel dev, not even a C programmer and I’ve successfully bisected the kernel three times in my life.
There’s nothing technical or difficult about that. The only annoyance of doing that for the kernel is that you’ll have to reboot quite a lot.
In order to do that, you must have a hint where to look. There are literally thousands of commits between Linux 5.1 and 5.2, where should I begin to look?
Bisecting is really easy:
https://wiki.gentoo.org/wiki/Kernel_git-bisect
You’ll just have to know how to build the kernel and create a minimal kernel config that works on your system to save time.
Sometimes I really hate people.
First of all, there are tens of thousands of commits between 5.1 and 5.2.
Second of all, git bisect does not require you to test all of them one by one and reboot tens of thousands of times.
Thirdly, there are plenty of manuals which state that git bisect reduces the number of required reboots by several thousands, i.e. you might need to reboot maybe 20 to 30 times.
Fourthly, kernel devs don’t have your hardware, so there’s no chance they can do that instead of you.
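To put a rough number on that: bisection halves the candidate range at every round, so the number of build-and-reboot cycles grows with the base-2 logarithm of the commit count. Below is a minimal Python sketch of that arithmetic. The commit counts are illustrative assumptions, not measured from the kernel tree, and `bisect_steps` is a hypothetical helper name. Real `git bisect` runs on a branching history, and skipped or unbuildable commits add rounds, so treat the result as an estimate.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case build-and-test rounds for a binary search over num_commits candidates."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (num_commits - 1).bit_length()


# Illustrative commit counts (assumptions, not taken from the kernel tree).
for n in (1_000, 10_000, 50_000):
    print(n, bisect_steps(n))
```

For a range of 10,000 to 50,000 commits this comes out at 14 to 16 rounds, which is in the same ballpark as the 20 to 30 reboots mentioned above once a few skipped commits are allowed for.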
Yeah…
Forgive me for not knowing in advance how to bisect. It might have helped if you provided this information before changing my setup. Right now, I’m pretty satisfied with my new setup and I don’t think I’ll spend more time with it. On the other hand, there are a few other people with similar issue, it’s up to them to follow the bisect procedure themselves.
Just for the record, I removed the Nvidia card from my system, compiled the kernel with support for the internal GPU of the processor along with any necessary accompanying packages, and I’m really happy with how the system works right now. One more proprietary driver off. Furthermore, power consumption and the noise level have dropped. And from the time I had the Nvidia card on board, I have yet to experience any practical advantage in my current working environment over the Intel GPU.
For people trying to bisect the issue: I noticed that the problem exists on kernel 5.0.13 with my setup, but it’s a little less frequent. Since current LTS kernel (4.19) doesn’t give problems, I think we should consider 4.19 as the 100% working tag and start bisecting between 4.19 and 5.1.
Also I noticed that the current nvidia driver 435.21 won’t compile while bisecting kernels older than 5.0, so this complicates things a bit.
@GoofyX I don’t think you should blame nvidia, it’s well known that they have little interest in supporting linux and the first thing you read online about nvidia+linux experience is to avoid nvidia and go for AMD GPU. So basically it’s our fault to have chosen a nvidia GPU for a linux workstation. The only thing we can do is try to find and fix the issue and move on, as the linux community is used to do in these cases.
EDIT: fixed wrong kernel version
EDIT2: added report about nvidia kernel module not compiling
I have reproduced this issue on an Arch Linux system.
Hardware:
Gigabyte Aorus Elite z390
Intel i5 9400
Gigabyte GT 1030
Software:
linux 5.3.7-arch1-1
nvidia 435.21-13
I did a reboot from virtual terminal and took a picture of the error message that shows up after systemctl hangs on stop jobs for my display manager: http://xaptronic.com/error.html (please let me know if there is a better method to share this, I didn’t see anywhere to attach a file here)
I then “downgraded” to linux-lts ( 4.19.80-1 ) from the Arch repos and so far have not experienced the issue.
@dodo.godlike: I agree, the best thing to do is share specs and info and try to solve the problem. Are you part of, or do you know of, an ongoing discussion about bisecting the kernel? This seems like a kernel regression that also has something to do with the nvidia driver. It’s funny you say that nvidia isn’t so supportive of linux - I was under the impression that they had the best support since they were publishing open source drivers…
Another Gigabyte motherboard. Somehow, the issue must be related with the motherboard vendor.
No, actually Nvidia has nothing to do with the open source Nouveau driver. I was also under the impression that the binary Nvidia drivers were of higher quality with respect to the corresponding proprietary AMD drivers for the ATI cards, thus my selection of Nvidia hardware.
Generally, I never had any big issues with the binary Nvidia drivers in the past. The only issue I had was that sometimes a newer Nvidia driver would be required to support newer major kernels, but after a couple of weeks or so, either a patch would surface that would compile the interface with the kernel or a newer version of the drivers would come out that would support the newer kernel. The only serious issue is the one discussed here.
@xaptronic AFAIK discussion is going on here and on the kernel bug tracker [1]. I’m trying to bisect the issue but it’s a bit annoying since latest nvidia driver release is compatible only with 5.0+ kernels, and when the bisect process jumps to an older commit the nvidia driver won’t compile anymore (i’m using the nvidia-dkms package).
This could have been way easier if the nvidia driver were an in-tree kernel module (meaning a module compiled together with the linux kernel source).
Regarding the last part, I’m pretty sure nvidia driver is closed source. Maybe you are referring to the in-tree module nouveau, which anyway is not developed by nvidia and it doesn’t work very well.
Also linux support by nvidia has never been that great, read for example this post [2] which is the latest of many rants you can find online about the situation.
[1] 205117 – After Linux 5.2.x, lightdm (or sddm) cannot be terminated using the Nvidia binary
[2] Nvidia sucks and I'm sick of it
I see the same.
And as you can read in the messages in the photo: “acpi_device_remove …”
The problem may be in the motherboard.
Try writing to Gigabyte Support and attach this thread.
Maybe they will reply to you. Support in my country told me “we only support Windows”, and that was all.
Ah - sorry I misspoke. Not that nvidia published open source drivers, but that they published drivers at all. IIRC, years ago people said nvidia cards were the best to use because you didn’t have to use Nouveau because nvidia made linux drivers. Perhaps now that has changed, I haven’t had a custom build in a while!
Is the information from my screenshot [1] useful at all? I’ll add this to the bug report. I will try to make some time to do bisection as well. I wonder if you could get a compile to work using nvidia-lts… the package manager says this is version 435.21-6 and the lts kernel is 4.19…
I put the linux-git bisect on hold and ran some more tests by just installing old packages from the archlinux archive [1]. These are the results:
NVIDIA DRIVER VERSION | LINUX VERSION | RESULT
dkms-435.21-9 | 5.0+ | bad
dkms-418.43-4 | 4.20.3 | bad
dkms-418.43-4 | 4.20.1 | bad
dkms-418.43-4 | 4.20 | bad
dkms-418.43-4 | 4.19.12 | good
Note: the archlinux archive only goes back to 4.20.1, so you need to manually compile 4.20 and 4.19.*
If you can confirm this I think we can reduce the bisect range to 4.19 <-> 4.20.
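The narrowing step can be sketched in a few lines of Python: take the newest kernel that tested good and the oldest that tested bad, and bisect between them. This is only an illustration under stated assumptions. The version/result pairs are copied from the table above (with "5.0+" entered as "5.0"), and `version_key` and `bisect_bounds` are hypothetical helper names, not part of any tool.

```python
# (kernel version, result) pairs from the table above; "5.0+" is entered as "5.0".
RESULTS = [
    ("5.0", "bad"),
    ("4.20.3", "bad"),
    ("4.20.1", "bad"),
    ("4.20", "bad"),
    ("4.19.12", "good"),
]


def version_key(version: str) -> tuple:
    """Turn '4.20.1' into (4, 20, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def bisect_bounds(results):
    """Return (newest good version, oldest bad version) from the test results."""
    good = [v for v, outcome in results if outcome == "good"]
    bad = [v for v, outcome in results if outcome == "bad"]
    newest_good = max(good, key=version_key)
    oldest_bad = min(bad, key=version_key)
    if version_key(newest_good) >= version_key(oldest_bad):
        raise ValueError("results are not monotonic; bisecting would be unreliable")
    return newest_good, oldest_bad


print(bisect_bounds(RESULTS))  # ('4.19.12', '4.20')
```

Since 4.19.12 is a stable point release, an actual bisect would presumably start from the mainline tags for 4.19 and 4.20 instead, assuming plain 4.19 also tests good.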
Some people in this thread claim only Linux 5.3+ is affected, but your results indicate that Linux 4.20 is buggy as well.
Are you sure y’all discuss the same bug? ;-)