GTX970 346.35 & 346.47 Linux Mint 17.1 Steam CSGO Segfaults during play crash the game

As per the title, segfaults that crash CSGO are being reported in the syslog. On the CSGO GitHub issue for this, it was recommended that the report be forwarded to NVIDIA. I am not linking to that GitHub issue, because I believe the link is the reason my other account can’t post on these forums (flagged as a spam account? Really?).

I have crash dumps, nvidia reports, and system info all available, but I don’t want to waste my time if this post isn’t even going to show up like all the other ones. Such a frustrating forum experience…

Please find the files attached to this reply
nvidia-bug-report.log.gz (81.2 KB)

CSGO segfault dumps.7z (157 KB)

The following includes a Sysinfo report, a Steam system report, and two excerpts from the Syslog file.

This issue is tentatively solved by 346.47

Haven’t had a crash yet today, after playing a few matches.

Scratch that, not fixed with 346.47.

New crash dump is attached here
crash_20150227172342_1.dmp.7z (62.8 KB)

Also happens with Kubuntu 14.04, nvidia 770, driver 331.113 (automatic ubuntu drivers).

I can’t seem to attach files so I can’t provide the dmp files.

I ran nvidia-bug-report.sh with “xserver-command=X -logverbose 6” in my lightdm.conf, but the resulting log contained nothing except:

nvidia-bug-report.sh Version: 18542053

Date: Sun Mar  1 01:06:13 CET 2015
uname: Linux eric-linux 3.13.0-46-generic #76-Ubuntu SMP Thu Feb 26 18:52:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
command line flags:

Below are sysinfo and info from syslog.

sysinfo:

System information report, generated by Sysinfo: 28/02/2015 18:00:13
http://sourceforge.net/projects/gsysinfo

SYSTEM INFORMATION
	Running Ubuntu Linux, the Ubuntu 14.04 (trusty) release.
	GNOME: unknown (unknown)
	Kernel version: 3.13.0-46-generic (#76-Ubuntu SMP Thu Feb 26 18:52:13 UTC 2015)
	GCC: 4.8 (x86_64-linux-gnu)
	Xorg: 1.15.1 (12 February 2015  02:49:29PM) (12 February 2015  02:49:29PM)
	Hostname: eric-linux
	Uptime: 0 days 7 h 48 min

CPU INFORMATION
	GenuineIntel, Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
	Number of CPUs: 4
	CPU clock currently at 1600.000 MHz with 6144 KB cache
	Numbering: family(6) model(42) stepping(7)
	Bogomips: 6606.20
	Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

MEMORY INFORMATION
	Total memory: 7945 MB
	Total swap: 0 MB

STORAGE INFORMATION
	SCSI device -  scsi2
		Vendor:  ATA      
		Model:  ST4000DM000-1F21 
	SCSI device -  scsi3
		Vendor:  ATA      
		Model:  Samsung SSD 840  
	SCSI device -  scsi8
		Vendor:  WD       
		Model:  10EARS External  

HARDWARE INFORMATION
MOTHERBOARD
	Host bridge
		Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
		Subsystem: Gigabyte Technology Co., Ltd Device 5000
	PCI bridge(s)
		Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
		Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
		Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4) (prog-if 00 [Normal decode])
		Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4) (prog-if 00 [Normal decode])
		Intel Corporation 82801 PCI Bridge (rev c4) (prog-if 01 [Subtractive decode])
		Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 7 (rev c4) (prog-if 00 [Normal decode])
		Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 (rev c4) (prog-if 00 [Normal decode])
		Intel Corporation 82801 PCI Bridge (rev 41) (prog-if 01 [Subtractive decode])
	ISA bridge
		Intel Corporation Z77 Express Chipset LPC Controller (rev 04)
		Subsystem: Gigabyte Technology Co., Ltd Device 5001

GRAPHIC CARD
	VGA controller
		NVIDIA Corporation GK104 [GeForce GTX 770] (rev a1) (prog-if 00 [VGA controller])
		Subsystem: Micro-Star International Co., Ltd. [MSI] Device 2825

SOUND CARD
	Multimedia controller
		NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
		Subsystem: Micro-Star International Co., Ltd. [MSI] Device 2825

NETWORK
	Ethernet controller
		Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)
		Subsystem: Gigabyte Technology Co., Ltd Device e000

NVIDIA GRAPHIC CARD INFORMATION
	Model name: unknown
	Card Type: unknown 16x
	Video RAM: 2048 MB
	GPU Frequency: 135 MHz
	Driver version: NVIDIA UNIX x86_64 Kernel Module  331.113  Mon Dec  1 21:08:13 PST 2014

syslog:

Feb 27 22:40:37 eric-linux assert_20150227203618_1.dmp[5012]: Uploading dump (out-of-process)#012/tmp/dumps/assert_20150227203618_1.dmp
Feb 27 22:40:37 eric-linux kernel: [33588.448348] csgo_linux[4711]: segfault at 0 ip 00000000f3410c1b sp 0000000082108e60 error 6 in libtier0_client.so[f33fd000+28000]
Feb 27 22:40:38 eric-linux assert_20150227203618_1.dmp[5012]: Finished uploading minidump (out-of-process): success = no
Feb 27 22:40:38 eric-linux assert_20150227203618_1.dmp[5012]: error: Failure when receiving data from the peer
Feb 27 22:40:38 eric-linux assert_20150227203618_1.dmp[5012]: file ''/tmp/dumps/assert_20150227203618_1.dmp'', upload no: ''Failure when receiving data from the peer''
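For anyone comparing their own logs against the excerpt above, here is a minimal sketch of how the kernel's segfault line can be decoded. It assumes the x86 message format shown in that excerpt and the standard x86 page-fault error bits (bit 0: page was present, bit 1: write access, bit 2: user mode). The function name `decode_segfault` and the dictionary keys are my own and purely illustrative.

```python
import re

# Matches the x86 kernel segfault message in the form seen in the syslog excerpt.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    info = {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # x86 page-fault error code bits
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }
    if m.group("base") is not None:
        # Offset of the faulting instruction inside the mapped library
        info["offset_in_library"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info
```

Applied to the csgo_linux line above, this reads as a user-mode write to address 0 (a null pointer write) at an offset inside libtier0_client.so, which is the kind of detail worth including in a report to either Valve or NVIDIA.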

To attach files, submit your post first, then click the paperclip at the top right of the posted message. It’s a weird forum system.

I’m having a similar issue. I’d never had any game besides CS GO crash, so I figured it was a game problem until recently. Maybe CS is the only game that pushes the GPU hard enough or uses certain OpenGL instructions…? Not sure. My issue seems to alternate between CS segfaulting and a reported xid error 8 from the Nvidia driver (which usually resulted in at least a lockup in CS). This shows up in the systemd journal for me – I’ll dig up the exact messages and post them. Out of curiosity, do you guys ever get the xid errors instead/in addition?

I decided to do some GPU stress testing in Linux to see if it was ONLY a CS GO problem. I tested using 6 simultaneous gputest (http://www.geeks3d.com/gputest/) instances – I ran 2 x tess_x64, 1 x tess_x32, 1 x triangle, 1 x pixmark, and 1 other. After about 25 minutes, I got the graphics lock-up and xid error issue; I kept it running and it occurred again every couple of minutes after that.

Any chance you guys would be willing to do a similar (or same) GPU stress test to see if we can narrow this down? I’d love to be able to play CS GO under Linux without crashing…

Here’s my original post to the Valve/CS GO github issues: https://github.com/ValveSoftware/Counter-Strike-Global-Offensive/issues/151 . I originally thought it was only an issue watching demos, but I’ve since had it occur in online play and local play with bots.

My system:
Intel i7-930
12GB RAM (tested with Memtest86+ to ensure it wasn’t a RAM issue)
EVGA GTX 670 4GB (overclocked from the factory)
Arch Linux 64-bit Kernel (I’ve had this issue with kernels from 3.16-3.18)
Nvidia binary (I’ve tested from 343.22-346.47)

I’ll get an nvidia-bug-report output posted as well.
What else can I provide to help get this resolved?

(EDIT: Fixed github link)
nvidia-bug-report.log.gz (205 KB)

I attached the nvidia-bug-report.log.gz to my previous message.

Here are my crash entries in the systemd journal (xid errors):

Feb 28 23:24:41 i7-arch.home kernel: NVRM: GPU at PCI:0000:03:00: GPU-d0a78e4c-1f8a-2afd-e2eb-8d86c4d30e51
Feb 28 23:24:41 i7-arch.home kernel: NVRM: Xid (PCI:0000:03:00): 8, Channel 0000000f
Feb 28 23:24:43 i7-arch.home kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Mar 07 10:51:37 i7-arch.home kernel: NVRM: GPU at PCI:0000:03:00: GPU-d0a78e4c-1f8a-2afd-e2eb-8d86c4d30e51
Mar 07 10:51:37 i7-arch.home kernel: NVRM: Xid (PCI:0000:03:00): 8, Channel 0000000e
Mar 07 10:51:39 i7-arch.home kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

They’re almost identical, aside from a slightly different channel – not sure what that means.
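To make it easier to compare Xid reports between machines, here is a rough sketch that tallies Xid codes and channels from journal or syslog lines. It assumes the `NVRM: Xid (...)` line format shown above. The name `summarize_xids` is hypothetical, and the sketch does not try to interpret what a given code or channel means.

```python
import re
from collections import Counter

# Matches lines such as: "NVRM: Xid (PCI:0000:03:00): 8, Channel 0000000f"
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+), Channel (?P<channel>[0-9a-fA-F]+)"
)


def summarize_xids(lines):
    """Count occurrences of each (xid code, channel number) pair in the given log lines."""
    counts = Counter()
    for line in lines:
        m = XID_RE.search(line)
        if m:
            key = (int(m.group("xid")), int(m.group("channel"), 16))
            counts[key] += 1
    return counts
```

Feeding it the output of something like `journalctl -k` (or the relevant part of /var/log/syslog) would show at a glance whether everyone here is hitting the same Xid code.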

And for the record, I can play CS GO in Windows for hours without issue on the same machine.

Quaidd, why did the CSGO GitHub suggest forwarding this to NVIDIA? I don’t see the game crashing in any NVIDIA library. Also, did you test with any other NVIDIA driver version? If possible, attach gdb to the game process and check which library the crash actually occurs in.

Hello @sandipt,

There is a perceived link to users with multiple monitors. Certainly the stress test crash described by rage_311 is a strong hint that NVIDIA drivers could be involved.

RE gdb, I’d love to try this. GDB is a horribly user-hostile debugger, from what I’ve learned so far. If anyone can walk me through this, I’d be willing to give it a try.

Without assistance, I can only imagine that GDB results will be a long way off.

Hello rage_311, so far I have been unable to replicate the crashes using stress tests. I don’t think Ubuntu 14.10 is running on systemd yet, so the error reporting might be different RE Xid.

Was there a rationale for the number and type of GPU tests you ran? Our systems aren’t identical, so the minor differences might mean I have to run different numbers and combinations of gputest instances. Knowing your rationale might speed up the process.

I ran the following set of GPU stress tests for ~two hours, and had no crashes. You can see that the ‘core usage’ rarely dropped below 90%. I’m hoping for some more insight on triggering the crashes using stress tests, if anyone can offer help with that.

1x Triangle (OpenGL 2.1/3.0) - 800x600
1x PixMark Piano (OpenGL 2.1/3.0) - 1024x640
1x TessMark X64 (OpenGL 4.0) - 1920x1080
1x TessMark X64 (OpenGL 4.0) - 800x600
1x TessMark X32 (OpenGL 4.0) - 1920x1080
1x GiMark (OpenGL 3.3) - 1920x1080
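So that others can repeat the same combination, here is a rough launcher sketch for the six instances listed above. The `/test=`, `/width=` and `/height=` switches, the test names and the `./GpuTest` binary path are assumptions based on my recollection of GpuTest's bundled start scripts, so check them against the scripts shipped with your copy before relying on this.

```python
import subprocess

# (test name, width, height) mirroring the list above.
# The test names are assumptions about GpuTest's naming, not verified here.
RUNS = [
    ("triangle", 800, 600),
    ("pixmark_piano", 1024, 640),
    ("tess_x64", 1920, 1080),
    ("tess_x64", 800, 600),
    ("tess_x32", 1920, 1080),
    ("gi", 1920, 1080),
]


def gputest_command(test, width, height, binary="./GpuTest"):
    """Build the argument list for one GpuTest instance (flag syntax is assumed)."""
    return [
        binary,
        "/test={}".format(test),
        "/width={}".format(width),
        "/height={}".format(height),
    ]


def launch_all(runs=RUNS):
    """Start every instance at once and return the process handles."""
    return [subprocess.Popen(gputest_command(*run)) for run in runs]
```

Calling `launch_all()` from the GpuTest directory starts all six windows together. Running it alongside `journalctl -kf` or `tail -f /var/log/syslog` makes it easier to see when an Xid or segfault line appears.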

When I ran a different set of tests, sufficient to maintain ~99-100% core usage, the system seemingly became unresponsive. CTRL-ALT-F2 still worked, though. I wasn’t sure what to look for in terms of crashes, but if it was anything like the CSGO crashes, the program would simply exit unexpectedly; that didn’t happen. Searching my syslog for “segfault” and also for “dump” gave no hits.

I just ran enough tests to maximize the load on all CPU cores for my i7. It sounds like your system is definitely more stable graphics-wise than mine is.

I was playing CS again on Friday night and I got several segfaults, and only AFTER the segfaults did I get driver crashes (xid errors). I spent a little time this morning looking at a couple of them in gdb. I’m unfamiliar with Ubuntu, so I won’t be able to guide you step-by-step, especially since Ubuntu isn’t on systemd.

You’ll have to find where Ubuntu puts the coredump files, make sure gdb is installed, then run

gdb nameofcoredumpfile

Now enter these commands at the gdb prompt:

(gdb) set logging on
(gdb) bt full
# you might have to press enter a few times until you get the prompt back
(gdb) quit

and you’ll get a gdb.txt file in your current working directory.
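If typing at the gdb prompt is the sticking point, the same backtrace can be captured non-interactively. This is only a sketch: it assumes gdb is on the PATH and uses its `--batch` and `-ex` options. gdb is normally given the executable first and the core file second, so the sketch takes both. The function names and the `gdb.txt` default are my own choices.

```python
import subprocess


def gdb_backtrace_command(executable, core_file):
    """Build a gdb invocation that prints a full backtrace and then exits."""
    return ["gdb", "--batch", "-ex", "bt full", executable, core_file]


def write_backtrace(executable, core_file, out_path="gdb.txt"):
    """Run gdb in batch mode and save everything it prints to out_path."""
    result = subprocess.run(
        gdb_backtrace_command(executable, core_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return result.returncode
```

Something like `write_backtrace("/path/to/csgo_linux", "nameofcoredumpfile")` would then leave the backtrace in gdb.txt; both paths are placeholders for wherever the game binary and the core file live on your system.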

My two most recent crashes appear to have originated in libtier0_client.so, which is a CS GO library. Here are my gdb coredump backtraces:
http://sprunge.us/BPDP
http://sprunge.us/BNbV

These definitely seem like pure CS:GO bugs to me, though I don’t know exactly what that library is… I must be having multiple issues here, considering the fact that my system isn’t stable while running some OpenGL stress tests even.

Anyway, maybe check your coredumps and see if there’s anything similar there for you, and we could probably bring some of that info back to Valve’s issues system.

If anybody has any information on how I can get to the bottom of my graphics instability issue, please let me know.

I also have 3 monitors. Have you done any testing with unplugging a couple of your monitors yet? I disabled 2 monitors in nvidia-settings, without unplugging them, but got the same results. Maybe I should try physically unplugging a couple of monitors, regenerating a new xorg.conf file, and trying again. I’ll let you know how this goes.

Possible progress to report.

Though I don’t fully understand it, I was able to eliminate these errors from the system journal on boot:

[   15.458823] NVRM: Your system is not currently configured to drive a VGA console
[   15.458830] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[   15.458834] NVRM: requires the use of a text-mode VGA console. Use of other console
[   15.458838] NVRM: drivers including, but not limited to, vesafb, may result in
[   15.458841] NVRM: corruption and stability problems, and is not supported.

and my GPU stress test has been able to run successfully for a couple of hours. I didn’t think that would affect X, but I think that’s the only thing that I’ve changed and it SEEMS to be better now. For the record, the fix is in GRUB’s configuration rather than in the kernel boot parameters themselves. Uncommenting these:

# Uncomment to use basic console
GRUB_TERMINAL_INPUT=console

# Uncomment to disable graphical terminal
GRUB_TERMINAL_OUTPUT=console

in /etc/default/grub fixed that issue for me – it enables the use of a VGA console (instead of an FB driver?).
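For anyone who wants to confirm whether their own setup already has these lines active, here is a small sketch that reads the contents of /etc/default/grub and reports whether both terminal settings are uncommented and set to console. The function names are hypothetical. Note that edits to this file only take effect after the GRUB config is regenerated (for example with `grub-mkconfig -o /boot/grub/grub.cfg`, or `update-grub` on Ubuntu) and the machine is rebooted.

```python
def active_grub_settings(text):
    """Return the uncommented KEY=value assignments from /etc/default/grub contents."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def uses_text_console(text):
    """True if both GRUB terminal settings are active and set to 'console'."""
    settings = active_grub_settings(text)
    return (
        settings.get("GRUB_TERMINAL_INPUT") == "console"
        and settings.get("GRUB_TERMINAL_OUTPUT") == "console"
    )
```

Usage would be along the lines of `uses_text_console(open("/etc/default/grub").read())`.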

Regardless, it doesn’t fix the CS crash. I’ve tried CS again since, and it crashed within 15 minutes or so of playing (deathmatch on a LAN server with bots). I’ve physically disconnected the other two monitors, generated a new xorg.conf via nvidia-settings, rebooted, and tried again… CS still crashes. It might be time for me to take my reporting back to Valve after all. I’d like to try Ubuntu 12.04 or SteamOS (since those are the officially supported distros) on this same machine and see if the results are any different. I may do that if I still can’t find a fix. Have either of you guys tried that?