OpenGL application sometimes crashes when switching workspace

This is an old bug I already reported in the old forum, and one that I have been encountering for years despite the changes in hardware and software on my systems (7600GT, 8800GT and now GTX460 cards; Athlon XP 3200+, Core2 Quad Q6600 and now Core-i5 CPUs; Mandriva 2007.1, 2008.1, 2009.1 and now 2010.2 distros; vanilla Linux kernels from v2.6 to v3.9; the Sawfish window manager, all versions including the latest ones; Second Life viewers v1, v2 and v3, official and third-party; all NVIDIA driver versions).

The symptoms are: while the Second Life viewer is running on one workspace (it is about the only OpenGL application I use, but I’d bet it would happen with any OpenGL app), it sometimes (roughly 1 time in 10) crashes when I switch to another workspace.

The viewer provides a stack trace on crashes; it always points into libnvidia-glcore.so and is always accompanied by an “NVRM: Xid” message in /var/log/messages (see the attached stack trace and bug report files).

Could someone please, at (loooooong) last, look into this extremely annoying issue?..

Many thanks in advance!

stacktrace.log (637 Bytes)
nvidia-bug-report.log.gz (67.5 KB)
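To check whether every crash really comes with an “NVRM: Xid” entry, the kernel log can be scanned mechanically. The Python sketch below counts Xid codes in a log; the line shape it matches (“NVRM: Xid (bus id): code, ...”) is an assumption, since the exact prefix has varied between driver versions, and the helper names are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a kernel log line (hypothetical example):
#   "... kernel: NVRM: Xid (0000:01:00): 13, 0003 00000000 ..."
# Group 1 is the GPU bus id, group 2 the numeric Xid error code.
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xids(lines):
    """Count how often each Xid error code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def count_xids_in_file(path="/var/log/messages"):
    """Open a log file (reading it usually needs root) and count its Xid codes."""
    with open(path, errors="replace") as log:
        return count_xids(log)
```

Comparing the timestamps of the matching lines with the viewer’s crash times would show whether the two always coincide.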

Hi dinosaur_, is this the old issue you are talking about: http://www.nvnews.net/vbulletin/showthread.php?t=176944 ?

If not, please provide step-by-step reproduction steps, any settings you changed in Second Life and, if possible, login details for Second Life.

Not at all. The issue was already reported in the old forum, here: http://www.nvnews.net/vbulletin/showthread.php?t=167040 and here: http://www.nvnews.net/vbulletin/showthread.php?t=182516

I already gave you the repro steps!.. Just install Gnome and Sawfish (I didn’t try other WMs, and the bug would perhaps (probably) happen with them as well, but to stay close to my setup, just try Sawfish for a start), configure several workspaces (I use 6, since I’m a heavy multitasker), add the Gnome Workspace Switcher to the Gnome panel, start a Second Life viewer (any, really), then log in and switch between the desktop on which the SL viewer is running and other desktops until you crash (depending on my luck, it can crash after 5 switches or after 50, but I found no correlation with other running applications or anything in that vein).
What you do in SL is irrelevant (it can be with any avatar and pretty much any graphics settings). I usually use the non-deferred rendering mode (no shadows but much better FPS rates) with all the other settings maxed out (including anisotropic filtering on and VBO on in the “Hardware settings”), except for draw distance (256m) and AA (x4 “only”).
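The manual steps above could be scripted to make testing more repeatable. This Python sketch assumes the wmctrl utility is installed and that the window manager exposes its workspaces through the EWMH hints wmctrl relies on (an assumption for Sawfish); “wmctrl -s N” switches to desktop N, counting from 0. Scripted switching is not identical to clicking the Gnome Workspace Switcher, so failing to crash this way would not rule the bug out. The function names are hypothetical.

```python
import subprocess
import time


def switch_plan(viewer_desktop, other_desktops, rounds):
    """Desktop indices to visit: another desktop, back to the viewer, repeat."""
    plan = []
    for i in range(rounds):
        plan.append(other_desktops[i % len(other_desktops)])
        plan.append(viewer_desktop)
    return plan


def run_plan(plan, delay_s=2.0, dry_run=True):
    """Switch desktops with "wmctrl -s N" (desktops are numbered from 0).

    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = [["wmctrl", "-s", str(desk)] for desk in plan]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
            time.sleep(delay_s)  # give the viewer time to render a few frames
    return commands
```

For example, run_plan(switch_plan(0, [1, 2, 3, 4, 5], 50), dry_run=False) would bounce 50 times between desktop 0 (where the viewer runs) and the five other desktops.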

Tested with Ubuntu 11.10 32-bit + Dell T7500 WS + GeForce GTX 460 + gnome-panel/sawfish window manager + 4 workspaces: not reproduced yet. I will continue testing…

Thanks for looking into this !

Note that I haven’t crashed either since I updated to the v319.23 driver: some driver versions seem more sensitive to this bug than others, but all driver versions released in the last 2+ years were affected nonetheless (I always update to the latest driver and most often also test beta drivers)…

Note that I didn’t crash either for now since I updated to the v319.23 driver
dinosaur_, do you mean this issue is resolved for you with the latest driver?

I almost thought so… but I just got a crash with v319.23 minutes ago. See the attached stack trace and nvidia-bug-report log.

Note, however, that v319.23 proved exceptionally stable compared with previous driver versions with respect to this bug (which, alas, means it will be even harder for you to reproduce, I’m afraid)… Is there any way for me to run a driver with symbols loaded, so that it provides a proper stack trace on crash (with the name of the crashing function in the driver)?
nvidia-bug-report.log.gz (63.8 KB)
stacktrace.txt (637 Bytes)

Still not reproduced with Mandriva 2010.2 i586 + GeForce GTX 460 + gnome-panel/sawfish window manager + SecondLife-i686-3.5.1.274821. Could it be specific to some location/map? I am just launching Second Life, roaming here and there, and then switching workspaces from the 1st to the 2nd and from the 2nd to the 1st. Any more information?

Not at all: what you do in SL once logged in is totally irrelevant (I have had it happen in plenty of locations and situations, whether idling or role-playing).

Well, when I crash, it’s often while switching (via the task bar buttons) to a Firefox window living on a different workspace than the SL viewer (since I often do that to search the web, browse a blog, or the like). I cannot say whether it is linked at all to Firefox also running, but it is a fact that the latter is almost always open on one of my workspaces…
Is there any way for you to provide me with a driver compiled with the symbols built in, so that the stack trace would be more relevant and would help spot the problem?.. My kernel is a vanilla one (always the latest stable version available: currently 3.9.4, and I am going to compile 3.9.5 today or tomorrow) compiled with gcc v4.4.3…
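Since most crashes happen when jumping to a window (Firefox, XTerm) on another desktop via its task bar button, that action can also be approximated from a script. With wmctrl, “wmctrl -a TITLE” activates the first window whose title contains TITLE (case-insensitively), switching to that window’s desktop and raising it, which roughly mimics such a click. A minimal sketch; the window titles and function names are assumptions to adapt.

```python
import subprocess
import time


def activation_commands(viewer_title, other_title, rounds):
    """Alternate between two windows that live on different desktops."""
    cmds = []
    for _ in range(rounds):
        cmds.append(["wmctrl", "-a", other_title])   # e.g. jump to Firefox
        cmds.append(["wmctrl", "-a", viewer_title])  # and back to the viewer
    return cmds


def run(cmds, delay_s=2.0):
    """Execute the commands, pausing between switches."""
    for cmd in cmds:
        subprocess.run(cmd, check=False)  # keep going even if wmctrl reports an error
        time.sleep(delay_s)
```

For instance, run(activation_commands("Second Life", "Firefox", 100)) would perform 200 switches between the two windows.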

Also, one more thing: is it specific to Sawfish?

Like I explained already, I don’t know, because I only use Sawfish and never any other WM… That said, given the range of Sawfish versions I have used over the years this bug has been hitting me, I think it is unlikely that this bug is caused by Sawfish…

I got another crash today, still with v319.23. Here are the logs.
It happened when switching from the SL viewer to Firefox (which was running on another desktop), via the Firefox button in the task bar.
nvidia-bug-report.log.gz (67.4 KB)
stack_trace.log (637 Bytes)

And another today… This time when switching from the SL viewer to an XTerm window running on another desktop, still via the task bar button.

After all, v319.23 is not more stable than older versions… I just had a lot of luck in the couple of weeks that followed its installation on my system.
nvidia-bug-report.log.gz (67.1 KB)
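With a crash rate of roughly one per 5 to 50 switches, a quiet stretch says little unless it is long enough. Assuming each switch crashes independently with a fixed probability p (a simplification), the chance of n crash-free switches is (1 - p)^n, so one can compute how many are needed before “no crash” rules out rate p at a given confidence. A small Python sketch with a hypothetical function name:

```python
import math


def switches_needed(crash_rate, confidence=0.95):
    """Crash-free switches needed before a quiet run rules out crash_rate.

    With independent switches, P(no crash in n switches) = (1 - crash_rate) ** n;
    we want that probability to drop below 1 - confidence.
    """
    if not 0 < crash_rate < 1:
        raise ValueError("crash_rate must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - crash_rate))
```

With one crash per 10 switches, switches_needed(0.1) gives 29; with one per 50, switches_needed(0.02) gives 149, close to the familiar “rule of three” estimate n ≈ 3/p.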

Can you try running “export __GL_SINGLE_THREADED=1” before starting Second Life, just to see if it has an effect?
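The same test can be driven from a script by passing a modified environment to the viewer process instead of exporting the variable in the shell (the one-liner “__GL_SINGLE_THREADED=1 ./viewer” is equivalent). A minimal Python sketch; the viewer path and function names are placeholders.

```python
import os
import subprocess


def single_threaded_env(base=None):
    """Return a copy of the environment with __GL_SINGLE_THREADED=1 added.

    The original mapping (os.environ by default) is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["__GL_SINGLE_THREADED"] = "1"
    return env


def launch_viewer(viewer_path):
    """Start the viewer (placeholder path) with the modified environment."""
    return subprocess.Popen([viewer_path], env=single_threaded_env())
```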

In fact, while the Second Life viewers are definitely multi-threaded applications, the render pipeline itself uses a single thread, so with all but the newest drivers (i.e. all drivers before v310) a single thread was used while executing OpenGL functions (and the crashes were already seen with those old drivers). Since v310 was published, I have tried with and without “export __GL_THREADED_OPTIMIZATIONS=1” (which enables multi-threading at the OpenGL driver level), but it doesn’t change anything as far as the crashes and their frequency are concerned.

Note that I updated today to the beta driver v319.32 (and Linux kernel v3.9.8); I see that there was a thread-related fix in that driver, so let’s see if it changes anything… If I still encounter crashes, I’ll let you know and I will try “export __GL_SINGLE_THREADED=1” (even if I’m pretty sure it won’t change a thing).
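One way to confirm which threading mode is actually in effect is to count the viewer’s threads with and without “export __GL_THREADED_OPTIMIZATIONS=1”. Assuming, as NVIDIA’s driver README describes the feature, that threaded optimizations offload driver work to a separate worker thread, the count should rise by at least one. On Linux each thread of a process appears as an entry under /proc/PID/task; the helper below is a hypothetical sketch.

```python
import os


def thread_count(pid):
    """Number of threads of process `pid`, read from /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/task"))
```

The viewer’s PID can be obtained with pidof, then thread_count(pid) compared between the two settings.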

Well, it was not long before I got a crash with v319.32 too (see the attached files). Now trying to run the viewer with “export __GL_SINGLE_THREADED=1”…
crash_log.txt (637 Bytes)
nvidia-bug-report.log.gz (68.1 KB)

As I anticipated, the “export __GL_SINGLE_THREADED=1” trick did not prevent a new crash, but interestingly, this time the stack trace is different. See the attached files. I will try to run the viewer under gdb, to see exactly which OpenGL function is being called in LLRenderTarget::flush() and triggers the crash.
stacktrace.txt (618 Bytes)
nvidia-bug-report.log.gz (69.2 KB)
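A non-interactive way to do this is to let gdb run the viewer and print backtraces as soon as it stops on the crash. The sketch below only builds the command line; the binary path is a placeholder, and if the viewer is normally started through a wrapper script, gdb would need to be pointed at the real executable with the environment that script sets up (an assumption about the viewer’s packaging). gdb may also stop on signals the viewer handles itself, in which case gdb “handle” commands would be needed.

```python
def gdb_command(viewer_path, viewer_args=()):
    """Build a gdb command line that runs the viewer and dumps stacks on a crash.

    -batch makes gdb exit after the -ex commands; when the program stops on a
    signal such as SIGSEGV, the commands that follow "run" print backtraces.
    """
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "bt",                   # stack of the thread that stopped
        "-ex", "thread apply all bt",  # stacks of all threads
        "--args", viewer_path, *viewer_args,
    ]
```

The resulting list can be passed to subprocess.run() and its output saved alongside the nvidia-bug-report log.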

Got it!

The crash seems to happen whenever the viewer is calling glCopyTexSubImage2D() at the moment the desktop switch occurs…
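Without driver symbols, frames inside libnvidia-glcore.so typically show up as “??” in gdb, so the most useful information is the innermost named OpenGL entry point above them, which is the kind of frame that points at glCopyTexSubImage2D(). A small helper (hypothetical name first_gl_call) that scans the text printed by gdb’s “bt” command for the first frame whose function name looks like a GL or GLX entry point (“gl” followed by an uppercase letter):

```python
import re

# gdb frame lines look like "#2  0x0804a1b2 in name (...) ..." or "#0  name (...)".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+?)\s*\(")
# OpenGL/GLX entry points: "gl" + uppercase letter (glCopyTexSubImage2D, glXSwapBuffers).
GL_NAME_RE = re.compile(r"^gl[A-Z]")


def first_gl_call(backtrace):
    """Return (frame number, function name) of the innermost OpenGL entry point
    in a gdb backtrace, or None if no such frame is found."""
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and GL_NAME_RE.match(match.group(2)):
            return int(match.group(1)), match.group(2)
    return None
```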

dinosaur_, any updates on your testing?

Well, I did give you a pointer to the culprit OpenGL function in your driver… I don’t see what more I could do without having the driver sources at hand. It’s time for you to look into your code and see what could cause such crashes.

For info, I just got another, identical crash with the beta driver v325.08 (with the patch published on this forum to compile it against Linux v3.10, which I’m also now running) and the very latest SL viewer code from Linden Lab. See the attached logs.
crash_log.txt (618 Bytes)
nvidia-bug-report.log.gz (62 KB)