SUMMARY:
I’m experiencing a very unusual segmentation fault on Linux machines in an OpenGL application I am working on. The main application uses Qt, but I have reproduced the issue in a simple GLUT application. The affected machines use newer NVIDIA drivers with GL version 4 (see below for details). On one machine that was previously unaffected, upgrading to the latest drivers introduced the crash.
The general (but not necessarily exclusive) method to reproduce the crash is the following:
1. An OpenGL app establishes a vertex pointer for drawing vertex data (e.g. glVertexPointer()).
2. The app draws OpenGL vertex data (e.g. glDrawArrays()).
3. The same application loads (dlopen) and unloads (dlclose) an arbitrary shared library (not libGL).
4. If a segmentation fault does not occur, repeat steps 1-3 until it does.
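The load/unload half of the steps above can be sketched in C as follows. This is a minimal sketch, not the attached sample app: the function names are hypothetical, and the drawing half (glVertexPointer()/glDrawArrays() between cycles) is left out so the sketch builds without a GL context. Reporting dlerror() on every failure makes it easier to tell an ordinary load failure apart from the crash.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load and immediately unload a shared library (hypothetical helper).
 * Returns 0 on success, -1 if either dlopen() or dlclose() fails. */
int load_unload_once(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return -1;
    }
    if (dlclose(handle) != 0) {
        fprintf(stderr, "dlclose(%s) failed: %s\n", path, dlerror());
        return -1;
    }
    return 0;
}

/* Repeat the load/unload cycle `iterations` times. In the real app a frame
 * would be drawn from a heap-allocated vertex array between cycles. */
int repeat_cycles(const char *path, int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        if (load_unload_once(path) != 0)
            return -1;
    }
    return 0;
}
```

Any non-libGL library should do as the argument; for example, `repeat_cycles("libm.so.6", 100)` uses the C math library as an assumed stand-in. On glibc versions older than 2.34, link with `-ldl`.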
Notes:
- The crash occurs during execution of dlopen().
- The crash does not reproduce when running under strace, gdb, or valgrind.
- A core dump is produced on the segmentation fault. The resulting stack trace from the GLUT sample app is included below.
- The crash is irregular (it may occur quickly or may require many attempts).
- The crash does not occur when running with older versions of GL (<= 2.*).
- The crash does not occur if the vertex pointer passed to OpenGL points to non-dynamically allocated memory (i.e. copying into a local array and passing that pointer prevents the crash).
- I've tried switching to the compatibility profile and changing the GL version (using Qt), and the problem persists.
- On one machine, I found that if I do not have permissions on the device files /dev/nvidia*, GL version 2.1.2 is reported instead of GL 4+, and the crash does not occur.
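The workaround from the notes (passing a pointer to non-dynamically allocated memory) might look like the sketch below, using a statically allocated array. It only mirrors the observation; it does not explain or fix the underlying crash. `MAX_VERTICES` and `copy_to_static()` are hypothetical names, the capacity is an assumption, and vertices are assumed to be 2D float pairs.

```c
#include <stddef.h>
#include <string.h>

/* Capacity in vertices; an assumption sized well above the 400-point default. */
#define MAX_VERTICES 4096

/* Statically allocated (non-heap) storage for 2D vertices as x, y pairs. */
static float static_vertices[MAX_VERTICES * 2];

/* Copy `count` 2D vertices from a heap buffer into static storage and return
 * the static pointer, to be handed to glVertexPointer() instead of the heap
 * pointer. Returns NULL if `count` exceeds the static capacity. */
const float *copy_to_static(const float *heap_vertices, size_t count)
{
    if (count > MAX_VERTICES)
        return NULL;
    memcpy(static_vertices, heap_vertices, count * 2 * sizeof(float));
    return static_vertices;
}
```

The returned pointer would then be passed as `glVertexPointer(2, GL_FLOAT, 0, ptr)` before drawing. A fixed static buffer trades flexibility for the non-heap placement; strips longer than `MAX_VERTICES` are rejected rather than truncated.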
SYSTEMS:
The crash has been reproduced on the following systems:
- Linux: Red Hat 5.9 (Tikanga) (Machine #1)
  - Memory: 16GB
  - Video: NVIDIA Quadro 2000
  - GL/Drivers: 4.3.0 NVIDIA 319.12
- Linux: Red Hat 5.9 (Tikanga) (Machine #2)
  - Memory: 96GB
  - Video: 2GB NVIDIA Quadro 4000
  - GL/Drivers: 4.3.0 NVIDIA 319.23
- Linux: SUSE Linux Enterprise Desktop 11 (x86_64)
  - Memory: 16GB
  - Video: NVIDIA Quadro 600
  - GL/Drivers: 4.3.0 NVIDIA 319.12
- Linux: Ubuntu 12.04.1 LTS (64-bit)
  - Memory: 16GB
  - Video: NVIDIA Quadro 600
  - GL/Drivers: 4.2.0 NVIDIA 304.64
The crash cannot be reproduced on the following systems:
- Linux: Red Hat 5.3 (Tikanga) (64-bit)
  - Memory: 16GB
  - Video: NVIDIA Quadro FX 3450/4000 SDI
  - GL/Drivers: 1.2 (1.5 Mesa 6.5.1)
- Linux: CentOS release 6.4
  - Memory: 16GB
  - Video: NVIDIA Quadro 600
  - GL/Drivers: 4.3.0 NVIDIA 319.60
SAMPLE APPLICATION:
To make things easier, I’ve attached a sample application that reproduces the issue using GLUT. My main application uses Qt, but the GLUT sample app shows the same behavior. Please note that the sample application is contrived, very simple, and quickly written. In my main application, the crash also occurs with primitives other than GL_LINE_STRIP.
The app contains two main pieces: 1) a simple GLUT app that draws lines and loads a shared library, and 2) a simple library. I’ve included source code for a “minimal” library that may be used for testing; however, there is nothing special about this library except that it is as minimal as possible. Another library should work as well.
The provided GLUT application is very simple. Line strips of 400 points (the default) are randomly generated and drawn. Given a shared library, you may load and unload that library. A combination of these two actions causes the crash.
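For readers without the attachment, a library of this kind could be as small as the following. This is a hypothetical stand-in, not the attached source; the file name, the function name, and the build command are assumptions.

```c
/* minimal.c: hypothetical stand-in for the minimal test library.
 * Build (assumed): gcc -shared -fPIC -o libminimal.so minimal.c */

/* A single exported function so the library has at least one symbol. */
int minimal_library_value(void)
{
    return 42;
}
```

Since the crash is reported to be independent of the library's contents, keeping it this small rules the library itself out as a source of heap corruption.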
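The line-strip generation can be sketched as below, assuming 2D float vertices in normalized device coordinates. The buffer is deliberately heap-allocated, since that is the condition under which the crash appears. `make_random_strip()` and `DEFAULT_STRIP_POINTS` are hypothetical names, not taken from the attached app.

```c
#include <stdlib.h>

/* Default number of points per strip, per the description above. */
#define DEFAULT_STRIP_POINTS 400

/* Allocate on the heap (matching the crash condition) and fill a line strip
 * of `count` random 2D points with coordinates in [-1, 1].
 * The caller frees the result; returns NULL on allocation failure. */
float *make_random_strip(size_t count)
{
    float *verts = malloc(count * 2 * sizeof *verts);
    if (verts == NULL)
        return NULL;
    for (size_t i = 0; i < count * 2; ++i)
        verts[i] = 2.0f * (float)rand() / (float)RAND_MAX - 1.0f;
    return verts;
}
```

Drawing would then follow the usual fixed-function pattern: `glEnableClientState(GL_VERTEX_ARRAY)`, `glVertexPointer(2, GL_FLOAT, 0, verts)`, and `glDrawArrays(GL_LINE_STRIP, 0, DEFAULT_STRIP_POINTS)`.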
I’ve included a README file in the attached sample app zip. Please see that for details on building & running the app.
CORE:
Here is a stack trace from a typical core dump. Notice the libGL frames inside dlopen(), even though dlopen() should only be loading the custom library:
#0 0x000000360247252b in _int_malloc () from /lib64/libc.so.6
#1 0x0000003602473f6e in malloc () from /lib64/libc.so.6
#2 0x0000003602475c28 in realloc () from /lib64/libc.so.6
#3 0x0000003e274abcdb in ?? () from /usr/lib64/libGL.so.1
#4 0x0000003e274a716a in ?? () from /usr/lib64/libGL.so.1
#5 0x0000003e274ae53d in ?? () from /usr/lib64/libGL.so.1
#6 0x0000003e274a6905 in ?? () from /usr/lib64/libGL.so.1
#7 0x0000003602010f69 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8 0x000000360200d136 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x00000036020108bc in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x0000003603000f9a in dlopen_doit () from /lib64/libdl.so.2
#11 0x000000360200d136 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x000000360300150d in _dlerror_run () from /lib64/libdl.so.2
#13 0x0000003603000f11 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#14 0x0000000000401691 in loadLibrary(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#15 0x0000000000401a17 in handleKey(unsigned char, int, int) ()
#16 0x00002b45e19a07ea in glutMainLoopEvent () from /usr/lib64/libglut.so.3
#17 0x00002b45e19a0f4a in glutMainLoop () from /usr/lib64/libglut.so.3
#18 0x0000000000401b1f in main ()
I’ll be attaching the sample app code as well as the nvidia-bug-report.sh log.
EDIT 7/24/2014
I updated the sample application to print GL version information to the console on startup. I also updated the README for clarity. Additionally, I attached a screenshot of the sample app in action. It isn’t pretty, but it should give you an idea of what it is supposed to look like.
nvidia-bug-report.log.gz (105 KB)
glutSampleApp.zip (5.53 KB)