System lockups during games. [Manjaro, Kernel 4.4; GTX 970, 370 drivers]

So far I have experienced lockups in the very first Linux beta build of Rocket League and in Team Fortress 2. I haven’t played other games enough to run into anything similar, except for Europa Universalis IV, where I haven’t had any such problems in 50+ hours over the last two weeks.

When it happens, the whole screen suddenly turns a solid color (I think I have seen different shades of light brown, and black, on different occasions), but I can still hear sound. Judging by the sound, the game freezes shortly after I lose the picture on the monitor, but not instantly: I can still hear game noises for a few seconds. Also, if Mumble or a music player was running in the background, I could still hear voices/music (if I recall correctly).
The whole system is completely unresponsive the entire time: I can’t Alt+Tab, switch desktops, kill the X server with a key binding, access a TTY, or even do a safe reboot with Alt+SysRq+RSEIUB. The only way to get the PC back to a working state is a hard reboot.
Nevertheless, I’m able to connect to the PC over SSH from my phone and run console commands through it.
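
Because SSH keeps working, the SysRq actions can in principle also be requested from a remote root shell by writing to /proc/sysrq-trigger, which might allow a cleaner restart than the reset button. A minimal sketch, not verified during an actual hang; the function name, the path argument and the delay argument are hypothetical and only there for illustration:

```shell
# Sketch: request sync (s), read-only remount (u) and reboot (b) through the
# kernel's SysRq interface from a root shell, e.g. over SSH.
# Assumptions: run as root; arguments are illustrative
# (defaults: /proc/sysrq-trigger and a 2 second pause between keys).
sysrq_reboot() {
    trigger="${1:-/proc/sysrq-trigger}"
    delay="${2:-2}"
    for key in s u b; do
        # Each write asks the kernel to perform one SysRq action.
        printf '%s' "$key" > "$trigger" || return 1
        sleep "$delay"
    done
}
```

Calling `sysrq_reboot` with no arguments as root would reboot the machine, so it should only be run once any logs have been saved.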

In the first public Linux beta build of Rocket League the problem happened very often, I’d say every 30-45 minutes. In Team Fortress 2 it happens rarely and at irregular intervals: usually once every 15-20 hours of gameplay, maybe even less often.

The last time (today) it happened during a competitive match (luckily just a scrim, not an official game). So I had to reboot the system quickly and rejoin the server as fast as possible while my team was at a numbers disadvantage. I rebooted, joined the server again, and got a hang with the same blank screen in less than a minute of gameplay. This time I took some time to run

sudo /usr/bin/nvidia-bug-report.sh --safe-mode

(it hung without “--safe-mode”). Then I rebooted again, rejoined the server, and got a blank screen a third time. I ran the same command again. By the time I had rebooted for the third time, the game had already finished and the server reservation had ended. So I went to another server and played for a few hours without any trouble.
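
Since the full report sometimes hangs, one option is to give it a time limit and only fall back to --safe-mode when that limit is exceeded. A rough sketch, assuming GNU coreutils `timeout` is available; the function name and the 120 second default are made up for illustration, and if the script is stuck in uninterruptible sleep inside the driver, `timeout` may not be able to stop it:

```shell
# Sketch: try the full bug report first, retry with --safe-mode if it fails
# or does not finish in time.
# Assumptions: run as root; $1 is the time limit in seconds (default 120),
# $2 is the report script (default /usr/bin/nvidia-bug-report.sh).
bug_report_with_fallback() {
    limit="${1:-120}"
    script="${2:-/usr/bin/nvidia-bug-report.sh}"
    # -k 5: send SIGKILL if the script ignores SIGTERM for 5 more seconds.
    if timeout -k 5 "$limit" "$script"; then
        echo "full report finished"
    else
        echo "full report failed or timed out, retrying with --safe-mode"
        "$script" --safe-mode
    fi
}
```

Run over SSH during a hang, `bug_report_with_fallback 180` would wait up to three minutes for the full report before retrying.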

That rate of hangups was quite unusual. I can’t say for sure whether I had ever before rejoined the same server (with the same settings, map and users on it) after a crash.

I haven’t managed to set up “-logverbose 6”: I still can’t figure out how to use it. I’d like it to simply be enabled at system startup.

Here’s what I got today from Team Fortress 2:
First report: https://mega.nz/#!N1YRlLaZ!yunCPiACNdlsj_SoQqAglumzKrafbtMeLLO_KBpXenE
Second report: https://mega.nz/#!A9ozHY7a!SiXGT29PmjAPiZiptHZmb3J7PhTONymvr07MBnUUQsA

Also, here are the reports I got from the Rocket League hangups on September 10 (with the first Linux beta build):
This one I ran without “--safe-mode” and it hung: https://mega.nz/#!Egg3SBgT!f4HnG7wpCIVPudVgyxVAUFOyjN-LTGhxk536MhijMfQ
And this one I ran with “--safe-mode”: https://mega.nz/#!klZwQQDL!0tDWbKSPsTmiyKgIxk82RwkDqzOWbVbJpLwOqBrg9h8

And here is my Steam system information:

Computer Information:
    Manufacturer:  Unknown
    Model:  Unknown
    Form Factor: Desktop
    No Touch Input Detected
    
Processor Information:
    CPU Vendor:  GenuineIntel
    CPU Family:  0x6
    CPU Model:  0x3f
    CPU Stepping:  0x2
    CPU Type:  0x0
    Speed:  3600 Mhz
    12 logical processors
    6 physical processors
    HyperThreading:  Supported
    FCMOV:  Supported
    SSE2:  Supported
    SSE3:  Supported
    SSSE3:  Supported
    SSE4a:  Unsupported
    SSE41:  Supported
    SSE42:  Supported
    AES:  Supported
    AVX:  Supported
    CMPXCHG16B:  Supported
    LAHF/SAHF:  Supported
    PrefetchW:  Unsupported
    
Network Information:
    Network Speed:  
    
Operating System Version:
    "Manjaro Linux" (64 bit)
    Kernel Name:  Linux
    Kernel Version:  4.4.20-1-MANJARO
    X Server Vendor:  The X.Org Foundation
    X Server Release:  11804000
    X Window Manager:  Openbox
    Steam Runtime Version:  steam-runtime-beta-release_2016-06-15
    
Video Card:
    Driver:  NVIDIA Corporation GeForce GTX 970/PCIe/SSE2

    Driver Version:  4.5.0 NVIDIA 370.28
    OpenGL Version: 4.5
    Desktop Color Depth: 24 bits per pixel
    Monitor Refresh Rate: 74 Hz
    VendorID:  0x10de
    DeviceID:  0x13c2
    Revision Not Detected
    Number of Monitors:  1
    Number of Logical Video Cards:  1
    Primary Display Resolution:  1440 x 900
    Desktop Resolution: 1440 x 900
    Primary Display Size: 16.14" x 10.24"  (19.09" diag)
                                            41.0cm x 26.0cm  (48.5cm diag)
    Primary Bus: PCI Express 16x
    Primary VRAM: 4096 MB
    Supported MSAA Modes:  2x 4x 8x 16x 
    
Sound card:
    Audio device: Realtek ALC1150
    
Memory:
    RAM:  15947 Mb
    
Miscellaneous:
    UI Language:  English
    LANG:  en_GB.UTF-8
    Microphone:  Not set
    Total Hard Disk Space Available:  432983 Mb
    Largest Free Hard Disk Block:  166458 Mb
    VR Headset: None detected
    
Recent Failure Reports:

Hi RattleWrench,

I’m probably not helping with my post (directly, at least), but my issue in this thread seems very similar! From your logs I can see that we have the exact same graphics card and nearly the same distro (Arch vs. Manjaro)!

The only differences in behaviour I can see are that in my case nvidia-bug-report.sh finishes gracefully without --safe-mode, and that SysRq works fine here after the hang. Otherwise, the symptoms are identical, down to the Xid 16 error (sometimes I get the “fallen off the bus” error instead, seemingly at random).
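
For comparing logs between machines, the Xid events can be tallied from the kernel log. A rough sketch, assuming the driver logs them in the form "NVRM: Xid (...): <code>, ..." (the pattern is my reading of the log format, and both function names are made up):

```shell
# Sketch: count Xid events per code in kernel log text read from stdin.
# Assumption: events look like "NVRM: Xid (<device>): <code>, <details>".
xid_summary() {
    grep -o 'NVRM: Xid ([^)]*): [0-9]*' |
        awk '{ count[$NF]++ } END { for (c in count) print "Xid " c ": " count[c] }' |
        sort -k2n
}

# Sketch: count lines that mention the GPU falling off the bus.
bus_drops() {
    grep -c 'fallen off the bus'
}
```

Typical use would be `dmesg | xid_summary` right after a hang (over SSH), or `journalctl -k -b -1 | xid_summary` after rebooting if the journal is persistent.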

I was starting to lean toward a hardware issue, but since two users have a very similar problem, could it be a driver issue after all?

(P.S. You can attach logs here only after posting, which is a bit counter-intuitive. Click the paperclip symbol at the top of your post.)

Hello Wild Penguin!

Good to see (oh wait, or is it?) that I’m not alone here.

I’ll install XCOM: Enemy Unknown to see if I’ll get any problems there.

I really hope it’s a fixable driver bug, since I really don’t want to return my card.

Yep, it just hung with XCOM: Enemy Unknown after about half an hour of gameplay.

Differences from previous hangups:

  • The blank (light purple, IIRC) screen turned black much faster, in 5 seconds or so. In TF2 it took several minutes before the screen turned black.
  • I was able to use the SysRq safe reboot. Now I need to test it again with a TF2 hangup to confirm (or refute) that it doesn't work there.

Also, I was able to run nvidia-bug-report.sh successfully without --safe-mode. Apparently it takes more time than I expected, and (again) I need to test it with TF2.
nvidia-bug-report.log.gz (304 KB)