337.19 driver is very unstable with GeForce GTX 295

I’m running Debian GNU/Linux with the 3.13 kernel. I have a GeForce GTX 295 and a Xeon X3450 CPU with plenty of ECC RAM, all backed by a UPS, so in theory my workstation should be very stable. With the 304.121 NVIDIA driver my system is stable enough for me. But when I upgraded to the 337.19 driver, I started getting very frequent crashes. I have reasons to upgrade the NVIDIA driver, so I would very much appreciate a fix.

With 337.19 I experience the following kind of crashes:

  1. All monitors go black (enter standby), either simultaneously or slowly one by one, for no apparent reason (for example, it may happen while watching YouTube). After this I can’t switch to a console with Ctrl+Alt+F1, although I can still log in remotely.

  2. After some action, such as switching to another window or desktop, the X screen blinks (goes black for a very short moment, probably a few milliseconds) and everything freezes or becomes really slow. The X screen may blink just once or several times. After a while, either the blinking stops and I can move the mouse cursor (X stays frozen forever in this case), or X crashes.

These crashes happen at least a few times per week and up to a few times per day. I usually use three monitors with compositing (KDE4), but even with a single monitor and compositing disabled, the new NVIDIA driver may still freeze or crash just because I switched from Konsole to a Chrome window with some Google results, or for some other minor reason.

I managed to generate some bug reports (if the “before” reports are useless, let me know and I will not upload them in the future):

nvidia-bug-report taken a few minutes before and a few minutes after all monitors went black while I was watching a YouTube video in Chrome.

nvidia-bug-report taken a few minutes before and after the X screen blinked, went black after a while, and crashed.

I also tried some driver versions older than 337.19, and they are very unstable too, so the regression happened somewhere between 304.121 and 337.19. If really necessary, I could try to find the exact version where the regression was introduced. Please let me know if you need more information.
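If it does come to finding the exact version, bisecting the releases between 304.121 and 337.19 keeps the number of multi-hour test sessions low: each test halves the remaining range, so roughly log2(N) tests are needed for N candidate releases instead of N. Below is a minimal sketch, assuming the releases are listed oldest to newest, 304.121 is good, 337.19 is bad, and every release after the regression stays bad. `find_first_bad` and `is_stable` are hypothetical names; in practice `is_stable` would mean "install this driver and use it for a few days", and since a crash can take hours to appear, a single crash-free session is only weak evidence of stability.

```python
from typing import Callable, Sequence


def find_first_bad(versions: Sequence[str], is_stable: Callable[[str], bool]) -> str:
    """Return the first unstable version by binary search.

    Assumes (not verified here): versions are ordered oldest to newest,
    versions[0] is known good, versions[-1] is known bad, and once the
    regression appears every later version is also bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_stable(versions[mid]):  # one (long) test session per iteration
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Placeholder list: the real intermediate releases would come from NVIDIA's driver archive.
candidates = ["304.121", "release-a", "release-b", "release-c", "337.19"]
```

With five candidates this needs at most two test sessions, because only the three middle releases are unknown.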

Are you running YouTube videos with HTML5 or the Adobe Flash plugin? Make sure no options such as display sleep or system suspend are enabled. Can you check the KDE power management options, and whether a screensaver or screen lock is enabled? Also, what action triggers this issue? (I am looking for reproduction steps.)

I think I’m using Flash. I do not have a screensaver, and the only power management option I have enabled puts the monitors into standby after 10 minutes of inactivity. But when these problems happen, they almost always happen while I am actively doing something. The only exception was while I was watching a YouTube video, and even then it happened too soon after my last activity to trigger the normal standby.

So far, the quickest way I have found to trigger the crash is to play a fullscreen game (in my case, a 3D game in a PS1 emulator) and frequently (1-4 times per hour) switch to Chrome (to look at text documents, Google results, YouTube, or whatever). This happened to me multiple times in single-monitor mode without compositing, even when all I had open in Chrome was some Google results. Done this way, it usually takes a few hours before X crashes or freezes. Perhaps switching between a fullscreen OpenGL game and Chrome even more often could trigger the bug faster.

This bug is not Chrome-specific (I just use the browser often); the crash sometimes happens when switching to other windows too, such as Blender or other applications. Sometimes it crashes while switching from triple-monitor to single-monitor mode (or vice versa) with xrandr. Sometimes an attempt to maximize MPlayer with VDPAU freezes X.

I do not know if it makes any difference, but I always log in with all three monitors enabled and switch to a single monitor later, when I feel like playing a game or watching a movie. However, crashes and freezes seem to happen with the same probability in both triple- and single-monitor modes, and may happen before my first switch to single-monitor mode since the last reboot, so I guess this does not matter.

If you need a faster way to trigger the bug, I could try the new driver again and look for a quicker way to make X crash or freeze. But should I try 337.19 again, or 331.79? 331.79 seems to be newer despite its lower version number, so I’m not sure.

I have a GTX 285M with a similar problem, except that my card(s) have been unstable and crashing ever since the first 304 driver, and with every driver after that. Additionally, the new 331.7x driver crashes my system when I try to overclock or underclock it. Also, I can’t run Enemy Territory, a decade-old game, at all: it freezes almost immediately after I log into a match. This problem affects every OpenGL game I have. Note that I have tried this with Flash both on and off, and I have upgraded and downgraded almost every software package I have over the last 8 months or so trying to solve this problem! NVIDIA, fix your PowerMizer code on Linux!

Edit: I would also like to add that my symptoms are identical to the OP’s, except that I am on a laptop with a single screen. I am using the 3.13.0 kernel as well, and have reproduced each result with each driver described above. Also, the nvidia-settings program showed the video memory being clocked at 2000 MHz. This is of course an error, as the GTX 285M is only rated to run its video memory at 1000 MHz. I guess someone got the bright idea to show the “real” (DDR × 1000 MHz) data rate in the nvidia-settings panel and accidentally set the wrong reference table entry for this card in the driver. Either that, or it really is running at 1000 MHz and the panel shows 2000; either way, this is a bug. The panel should report the base clock regardless of the “real” rate, because I need to be able to cross-check the current memory speed reported by nvidia-settings against the hardware reference spec published on your website.
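To make the arithmetic behind that guess explicit: DDR-type memory transfers data on both edges of the clock, so its effective data rate is twice the actual clock. Below is a minimal sketch, assuming (not confirmed) that nvidia-settings shows the effective rate rather than the base clock; the function names are illustrative only and are not part of any NVIDIA tool.

```python
# Assumption: the reported figure is an effective DDR data rate, i.e. two
# transfers per clock cycle. This is the hypothesis above, not a confirmed fact.
DDR_TRANSFERS_PER_CLOCK = 2


def effective_rate_mhz(base_clock_mhz: float) -> float:
    """Effective (double data rate) figure for a given base memory clock."""
    return base_clock_mhz * DDR_TRANSFERS_PER_CLOCK


def base_clock_mhz(reported_mhz: float) -> float:
    """Base memory clock implied by a reported effective rate."""
    return reported_mhz / DDR_TRANSFERS_PER_CLOCK


# A reported 2000 MHz corresponds to a 1000 MHz base clock under this assumption.
print(base_clock_mhz(2000))
```

If the assumption holds, the 2000 MHz reading matches the 1000 MHz rated clock and only the label in the panel is misleading. If it does not hold, the memory really would be running at twice its rated clock.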

System:
Bodhi Linux 64 2.4.0
Clevo X8100 with:
2x NVIDIA GTX 285M (SLI)
8 GB Corsair 1333 DDR2 RAM