During logout (roughly one time in three)
the system freezes hard.
nvidia-bug-report.sh gets stuck
XID 44 seems to be the culprit
ssh reboot gets stuck while waiting on a frozen nvidia process
top reports: 100% CPU for irq/152-nvidia
Since I had to kill nvidia-bug-report.sh because it was stuck, the report seems to be incomplete.
I have attached kern.log files in which long nvidia-related call traces can be seen (please ignore the USB complaints, they come from a misbehaving SD card reader). Attachments: nvidia-bug-report.log.gz (76 KB), kern.tar.gz (393 KB)
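In case it helps anyone going through the attached logs, here is a minimal Python sketch for tallying which Xid codes show up in a kern.log. It assumes the usual `NVRM: Xid (PCI:...): <number>, ...` line layout; the function name `count_xids` is made up for this post, so adjust the pattern if your log lines look different.

```python
import re
from collections import Counter

# Assumed layout of an NVRM Xid line in the kernel log, e.g.
#   "... NVRM: Xid (PCI:0000:01:00): 44, ..."
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>[^)]*)\): (?P<xid>\d+)")


def count_xids(lines):
    """Return a Counter mapping Xid number -> how many log lines report it."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group("xid"))] += 1
    return counts
```

To use it, pass any iterable of log lines, for example `count_xids(open("kern.log", errors="replace"))`, and print the resulting counter to see how often 44, 79, or any other code appears.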
Indeed, I have always had those, on various occasions. While investigating, I replaced the motherboard and CPU, and nothing changed. I should add that I got this card back last week after it was refurbished (I still don’t know what was fixed), because the XID 79 errors had become unbearable (hence I first tried changing the motherboard, with no luck, and then sent the card in under warranty). At least the 79s are gone now.
OK, I have learned from the service center that I did in fact receive a new board (packed in the old box; I see new serial numbers), so this is the drivers, 100% for sure.
Just for the record, while waiting for the new board, I used the integrated Intel graphics for 2 months. I had ZERO problems, none.
I hope someone can work on this issue; otherwise dumping NVIDIA is the only solution. So far I have had 1 year of driver problems. Luckily, the new Vega/Navi should be out a year from now.
The change was the following: MSI (Z170) + 6700 → ASUS (Z370) + 8700K
Here you go: two crashes fresh out of the oven, with and without xorg.conf.
Interestingly, the first logout I tried went OK (I have tried just once for now); the second one crashed. However, there was one peculiarity about the second time:
0. Log in
Can you also reproduce it by switching to a VT and back instead of logging out?
When replacing the board, did you also replace the memory? If not, test it by pulling all modules but one; if it still crashes, try the next module alone. Don’t use memtest86 or the like: those tools are useless with modern memory and will only report errors if the memory is really, really broken.
OK, I will do these tests and come back. I am not buying the RAM as the cause, though: with only the integrated Intel graphics, the system was truly stable for the first time ever. It must be either the GPU firmware or the driver.
I did notice, however, that the crashes are not 100% reproducible. I tried some more plain logins and logouts, and they were fine; I also repeated the game sequence, and the logout survived. I will report back with more findings later.