My machine often (about 15% of the time) locks up with a blank black screen a few seconds (1-3 s) after exiting Team Fortress 2 on Linux. The lockup is hard (only a hard reset recovers from it), and the machine does not respond to pings while it happens.
Rarely, it also locks up while the game is running, but only in menu screens, never during gameplay itself. It does not lock up during general PC usage. I haven't tried other games much, but so far the lockup only seems to involve TF2 (I have never had one when running a different OpenGL application).
The machine runs current Arch Linux (kernel 4.0.7-2-ARCH, nvidia driver 352.21-2), but I have had this problem for about half a year already. The GPU is a GTX 770 and the CPU is a Core i5 2550K.
I’ve tried monitoring the GPU temperature, and it does not go above 70 degrees Celsius.
I set up remote dmesg logging via netconsole to try to catch the problem, and this is what it captured:
[19369.835521] NVRM: GPU at PCI:0000:01:00: GPU-a74182f0-9cfb-d057-7254-b7a624fe7d97
[19369.835545] NVRM: Xid (PCI:0000:01:00): 62, b2dc(5204) 00000000 00000000
[19378.048030] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 4107.394888] NVRM: Xid (PCI:0000:01:00): 62, b2dc(5204) 00000000 00000000
I caught another case. This time it was in the TF2 main menu, after selecting a game (right when the loading screen should have appeared, I got the blank screen instead). I left it running for as long as needed, and eventually the machine rebooted itself.
[20000.832886] NVRM: GPU at PCI:0000:01:00: GPU-a74182f0-9cfb-d057-7254-b7a624fe7d97
[20000.832899] NVRM: Xid (PCI:0000:01:00): 62, b2dc(5204) 00000000 00000000
[20013.503448] hrtimer: interrupt took 94838960 ns
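In case it helps anyone compare captures, here is a minimal Python sketch for pulling the Xid events out of a saved netconsole/dmesg capture. The function name find_xid_events and the regular expression are my own illustration, not part of any NVIDIA tool, and the pattern only assumes the line layout seen in the logs above (timestamp in brackets, then "NVRM: Xid (device): number,").

```python
import re

# Assumed layout, taken from the log lines above:
# "[19369.835545] NVRM: Xid (PCI:0000:01:00): 62, b2dc(5204) ..."
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<dev>[^)]+)\): (?P<xid>\d+),"
)

def find_xid_events(text):
    """Return (timestamp, device, xid) tuples for every Xid line in text.

    finditer scans the whole string, so this also works when several
    kernel messages were captured on a single line.
    """
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("xid")))
        for m in XID_RE.finditer(text)
    ]

if __name__ == "__main__":
    sample = (
        "[19369.835521] NVRM: GPU at PCI:0000:01:00: "
        "GPU-a74182f0-9cfb-d057-7254-b7a624fe7d97\n"
        "[19369.835545] NVRM: Xid (PCI:0000:01:00): 62, "
        "b2dc(5204) 00000000 00000000\n"
        "[19378.048030] NVRM: GPU at 0000:01:00.0 has fallen off the bus.\n"
    )
    for ts, dev, xid in find_xid_events(sample):
        print(f"{ts:.6f} {dev} Xid {xid}")
```

Running it over all three captures above should list the same Xid number and device each time, which is the pattern I see by eye.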
I’ve generated the nvidia-bug-report.log.gz as instructed:
nvidia-bug-report.log.gz (203 KB)