Multithreaded graphics app, intermittent deadlock with Quadro FX3800, Driver 319.32, libGL.so.1 impl

Hi all, we’ve been experiencing frequent deadlocks on Red Hat 5.6 with the card and driver listed above.
From the stack traces we’ve recorded, a recurring backtrace from a child thread (2) seems to be:

  1. __lll_mutex_lock_wait
  2. _L_lock_4166
  3. __libc_free
  4. ?? () from /usr/lib/libGL.so.1
  5. ?? () from /usr/lib/libGL.so.1
  6. < signal handler called >
  7. _int_malloc()
  8. __libc_malloc(bytes=35)
  9. ?? () from /usr/lib/tls/libnvidia-tls.so.319.32
  10. in operator new (sz=53)

There are around 10 other threads that are deadlocked at the same time. (Note: for workplace security reasons, I had to type the stack trace above manually.)
My current thinking is that the signal handler in libGL.so.1 is what’s causing the other threads to deadlock, because it calls a function (free()) that is not async-signal-safe.

Any help on this greatly appreciated! Apologies again for being unable to paste stack traces directly.

Update: as a further check, I tried blocking all signals in our app. This resulted in zero deadlocks in an overnight test, which makes a pretty strong case that the signal handler above is indeed the cause of our deadlocks.

Although blocking signals stops the deadlocks, it’s not really a practical solution; it would be better to fix the underlying signal handling (in the NVIDIA drivers?) so that other clients don’t run into the same problem.

I will attempt to find out which signal is being raised. In the meantime, it’d be great to get some feedback from NVIDIA or elsewhere on whether my assessment of the deadlocks is reasonable.

Many thanks.

Take a look at https://devtalk.nvidia.com/default/topic/572073/cudagraphicsmapresources-returning-cudaerrormemoryallocation-error-/?offset=2#3891515, your assessment sounds reasonable to me.

papadeltagolf, do you have any sample code that we can compile to reproduce the issue in house?

Many thanks for the responses and apologies for the delayed reply! I had assumed that this thread had died, so I’m really pleased to see some feedback.

Unfortunately it’s proving difficult to package up a cut-down app that reproduces the issue (along with the scripts that repeatedly run the app and test for deadlock, etc.). I can reproduce the issue with a test app, but even the cut-down app requires access to a large volume of mapping and elevation data plus a lot of third-party libraries.

I will keep trying to package up a test app, but as I stated above, blocking the signals fixes the issue, and it does appear that the signal handling is being done in the NVIDIA libraries? It would be great if someone could look at the NVIDIA signal-handling code to make sure it makes no calls to functions that are not async-signal-safe.

Any further feedback most welcome.

Can you please get a backtrace from all threads by attaching GDB to the process and running “thread apply all backtrace”? Please also capture the output of “info sharedlibrary”.

Hi Aaron, thanks for the response, and sorry about the delayed reply (I don’t seem to get notified when the thread is updated). Please find the requested info below:

Please let me know if you require any further info. Many thanks.

Full backtrace

Thread 5 (Thread 0xb5a1db90 (LWP 30432)):
#0 0x004dd410 in __kernel_vsyscall ()
#1 0x0078dc05 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x020f815d in __pthread_cond_wait (cond=0x8b7713c, mutex=0x8b7711c) at forward.c:138
#3 0x00202865 in OpenThreads::Condition::wait(OpenThreads::Mutex*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#4 0x00c9a625 in OpenThreads::Block::block (this=0x8b7146c) at /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/include/OpenThreads/Block:42
#5 0x00c9d0cc in ossimPlanetActionRouterThreadQueue::run (this=0x8b77018) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/include/ossimPlanet/ossimPlanetActionRouter.h:42
#6 0x00201dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#7 0x00789852 in start_thread () from /lib/libpthread.so.0
#8 0x020eba8e in clone () from /lib/libc.so.6

Thread 4 (Thread 0xb3fa2b90 (LWP 30433)):
#0 0x004dd410 in __kernel_vsyscall ()
#1 0x020f8403 in __lll_lock_wait_private () from /lib/libc.so.6
#2 0x02085c56 in _L_lock_5396 () at malloc.c:6195
#3 0x02080e69 in _int_free (av=0x2170140, p=0x9afd180, have_lock=0) at malloc.c:4846
#4 0x020818e9 in __libc_free (mem=0x9afd188) at malloc.c:3670
#5 0x0669abf1 in operator delete (ptr=0xfffffffc) at /local_views/third_party/vobs/third_party/gcc/gcc-4.4.1/extract/gcc-4.4.1/libstdc++-v3/libsupc++/del_op.cc:44
#6 0x00ca5ae3 in ossimPlanetExtents::~ossimPlanetExtents (this=0x9afd188, __in_chrg=) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/include/ossimPlanet/ossimPlanetExtents.h:325
#7 0x08063a99 in osg::Referenced::unref (this=0x9afd188) at /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/include/osg/Referenced:183
#8 0x00cd7aca in osg::ref_ptr::~ref_ptr (this=0xb3fa1fd0, __in_chrg=) at /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/include/osg/ref_ptr:33
#9 0x00e0c1fb in ossimPlanetOssimImageLayer::getTexture (this=0x8bb4260, width=256, height=256, tileId=…, grid=…, padding=0) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOssimImageLayer.cpp:762
#10 0x00de6cd1 in ossimPlanetTextureLayerGroup::getTexture (this=0x8b80f40, width=256, height=256, tileId=…, grid=…, padding=0) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTextureLayerGroup.cpp:253
#11 0x00d325a1 in ossimPlanetTextureRequest::run (this=0x8e84b40) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:546
#12 0x00d25246 in ossimPlanetOperation::start (this=0x8e84b40) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/include/ossimPlanet/ossimPlanetOperation.h:255
#13 0x00d30e57 in ossimPlanetTileRequestThreadQueue::run (this=0x8b81728) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:228
#14 0x00201dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#15 0x00789852 in start_thread () from /lib/libpthread.so.0
#16 0x020eba8e in clone () from /lib/libc.so.6

Thread 3 (Thread 0xb35a1b90 (LWP 30434)):
#0 ossimFilename::ossimFilename (this=0xb35a08fc) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossim/src/ossim/base/ossimFilename.cpp:207
#1 0x0144ca23 in ossimDtedElevationDatabase::pointHasCoverage (this=0x8b6ed18, gpt=…) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossim/include/ossim/elevation/ossimDtedElevationDatabase.h:32
#2 0x00ce77d2 in ossimPlanetOssimElevationDatabase::getTextureCellDatabase (this=0x8b80d50, width=9, height=9, tileId=…, grid=…, padding=1) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOssimElevationDatabase.cpp:257
#3 0x00ce6a28 in ossimPlanetOssimElevationDatabase::getTexture (this=0x8b80d50, width=9, height=9, tileId=…, grid=…, padding=1) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOssimElevationDatabase.cpp:117
#4 0x00cd5845 in ossimPlanetElevationDatabaseGroup::getTexture (this=0x8b808c8, width=9, height=9, tileId=…, grid=…, padding=1) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetElevationDatabaseGroup.cpp:91
#5 0x00d33015 in ossimPlanetElevationRequest::run (this=0x92b83d8) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:657
#6 0x00d25246 in ossimPlanetOperation::start (this=0x92b83d8) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/include/ossimPlanet/ossimPlanetOperation.h:255
#7 0x00d30e57 in ossimPlanetTileRequestThreadQueue::run (this=0x8b81320) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:228
#8 0x00201dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#9 0x00789852 in start_thread () from /lib/libpthread.so.0
#10 0x020eba8e in clone () from /lib/libc.so.6

Thread 2 (Thread 0xb1ddbb90 (LWP 30435)):
#0 0x004dd410 in __kernel_vsyscall ()
#1 0x00790779 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x0078bddf in _L_lock_885 () from /lib/libpthread.so.0
#3 0x0078bca6 in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x020f82c6 in pthread_mutex_lock (mutex=0x8bb459c) at forward.c:181
#5 0x00202963 in OpenThreads::Mutex::lock() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#6 0x08066ea8 in OpenThreads::ScopedLock<OpenThreads::Mutex>::ScopedLock (this=0xb1ddb0ac, m=…) at /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/include/OpenThreads/ScopedLock:31
#7 0x00e0abc9 in ossimPlanetOssimImageLayer::hasTexture (this=0x8bb4260, width=256, height=256, tileId=…, grid=…) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOssimImageLayer.cpp:522
#8 0x00de6958 in ossimPlanetTextureLayerGroup::hasTexture (this=0x8b80f40, width=256, height=256, tileId=…, grid=…) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTextureLayerGroup.cpp:204
#9 0x00d31657 in ossimPlanetSplitRequest::run (this=0x8f6f270) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:352
#10 0x00d25246 in ossimPlanetOperation::start (this=0x8f6f270) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/include/ossimPlanet/ossimPlanetOperation.h:255
#11 0x00d30e57 in ossimPlanetTileRequestThreadQueue::run (this=0x8b81a00) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:228
#12 0x00201dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#13 0x00789852 in start_thread () from /lib/libpthread.so.0
#14 0x020eba8e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb5e1b920 (LWP 30430)):
#0 0x004dd410 in __kernel_vsyscall ()
#1 0x020f8403 in __lll_lock_wait_private () from /lib/libc.so.6
#2 0x02085c56 in _L_lock_5396 () at malloc.c:6195
#3 0x02080e69 in _int_free (av=0x2170140, p=0x8b1b910, have_lock=0) at malloc.c:4846
#4 0x020818e9 in __libc_free (mem=0x8b1b918) at malloc.c:3670
#5 0x0571793d in ?? () from /usr/lib/libGL.so.1
#6 0x0571f5a5 in ?? () from /usr/lib/libGL.so.1
#7 <signal handler called>
#8 malloc_consolidate (av=0x2170140) at malloc.c:5093
#9 0x02081c17 in _int_malloc (av=0x2170140, bytes=3827) at malloc.c:4313
#10 0x02083d97 in __libc_malloc (bytes=3827) at malloc.c:3605
#11 0x0571a4e9 in ?? () from /usr/lib/libGL.so.1
#12 0xb6df2204 in ?? () from /usr/lib/libnvidia-glcore.so.325.15
#13 0xb6de035a in ?? () from /usr/lib/libnvidia-glcore.so.325.15
#14 0xb6de0517 in ?? () from /usr/lib/libnvidia-glcore.so.325.15
#15 0xb6de0831 in ?? () from /usr/lib/libnvidia-glcore.so.325.15
#16 0x04bd7e51 in osg::BufferObject::Extensions::glBufferData(unsigned int, int, void const*, unsigned int) const () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
#17 0x04bd9824 in osg::VertexBufferObject::compileBuffer(osg::State&) const () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
#18 0x04c406d4 in osg::State::bindVertexBufferObject(osg::VertexBufferObject const*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
#19 0x04c3726e in osg::Geometry::drawImplementation(osg::RenderInfo&) const () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
#20 0x006cf647 in osgUtil::RenderLeaf::render(osg::RenderInfo&, osgUtil::RenderLeaf*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#21 0x006c94b3 in osgUtil::RenderBin::drawImplementation(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#22 0x006d2567 in osgUtil::RenderStage::drawImplementation(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#23 0x006c9140 in osgUtil::RenderBin::draw(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#24 0x006d28f6 in osgUtil::RenderStage::drawInner(osg::RenderInfo&, osgUtil::RenderLeaf*&, bool&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#25 0x006d56e5 in osgUtil::RenderStage::draw(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#26 0x006dfac9 in osgUtil::SceneView::draw() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#27 0x00149fe0 in osgViewer::Renderer::cull_draw() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#28 0x00147268 in osgViewer::Renderer::operator()(osg::GraphicsContext*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#29 0x04c5c3a8 in osg::GraphicsContext::runOperations() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
#30 0x0018b72a in osgViewer::ViewerBase::renderingTraversals() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#31 0x0018802d in osgViewer::ViewerBase::frame(double) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#32 0x080634a4 in main (argc=7, argv=0xbfc34f24) at /local_views/pdg_vxxxx_ev_slider/vobs/vxxxx/ossim/ossimPlanet/examples/ossimplanetviewer/ossimplanetviewer.cpp:745

Shared libs

0x002d6200 0x004c3a28 Yes /local_components/ossim/VXXXX_INT_CR40207_BL0012/lib/libossimPlanet.so.1
0x0124c3a0 0x0195ec08 Yes /local_components/ossim/VXXXX_INT_CR40207_BL0012/lib/libossim.so.1
0x00c2fde0 0x00cae148 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/tiff/tiff-4.0.0beta3/tiff-4.0.0beta3-bin/lib/libtiff.so.5
0x00724e18 0x00730a58 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/libgeotiff/libgeotiff-1.2.5/libgeotiff-1.2.5-bin/lib/libgeotiff.so
0x009cbb10 0x00a23ac4 Yes (*) /usr/lib/libfreetype.so.6
0x0077da70 0x0077eaa4 Yes (*) /lib/libdl.so.2
0x0061c130 0x0067fee8 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
0x008fe620 0x00953348 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgDB.so.55
0x00591470 0x005b7248 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgGA.so.55
0x006aaac0 0x006d6948 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgText.so.55
0x007d37b0 0x008b3198 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
0x02aff9d0 0x02c80ca8 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
0x00e22f30 0x00e24ce8 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
0x006e3c10 0x00719048 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/libjpeg/libjpeg-vis3d.7/libjpeg-vis3d.7-bin/lib/libjpeg.so.7
0x00981120 0x0099e448 Yes /local_components/ossim/VXXXX_INT_CR40207_BL0012/lib/libwms.so.1
0x00d899d0 0x00db3084 Yes (*) /usr/lib/libcurl.so.3
0x00744fb0 0x00763f18 Yes (*) /local_components/third_party/THIRD_PARTY_INT_BL0048-03/expat/expat-2.0.1/expat-2.0.1-bin/lib/libexpat.so.1
0x075ec2c0 0x077e1668 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/gpstk/gpstk-1.6/gpstk-1.6-bin/lib/libgpstk.so.16
0x009ab6a0 0x009b68d4 Yes (*) /usr/lib/libz.so.1
0x00a57e50 0x00ab09a4 Yes (*) /usr/lib/libGLU.so.1
0x00b01000 0x00b68910 Yes (*) /usr/lib/libGL.so.1
0x00d4ff70 0x00d54a94 Yes (*) /usr/lib/libSM.so.6
0x00d35600 0x00d45624 Yes (*) /usr/lib/libICE.so.6
0x01cc8f70 0x01d598a4 Yes (*) /usr/lib/libX11.so.6
0x005c76c0 0x005d13e4 Yes (*) /usr/lib/libXext.so.6
0x02d44050 0x02dbfb48 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/gcc/gcc-4.4.1/gcc-4.4.1-bin/lib/libstdc++.so.6
0x00bc9410 0x00be45a4 Yes (*) /lib/libm.so.6
0x00bf0e50 0x00c08ba8 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/gcc/gcc-4.4.1/gcc-4.4.1-bin/lib/libgcc_s.so.1
0x06efac80 0x06ff78d0 Yes (*) /lib/libc.so.6
0x00c10210 0x00c1bb04 Yes (*) /lib/libpthread.so.0
0x005d67f0 0x005ebfef Yes (*) /lib/ld-linux.so.2
0x060e73d0 0x061082d4 Yes (*) /usr/lib/libgssapi_krb5.so.2
0x06058270 0x060c7e14 Yes (*) /usr/lib/libkrb5.so.3
0x06021790 0x060395b4 Yes (*) /usr/lib/libk5crypto.so.3
0x00df17e0 0x00df1f94 Yes (*) /lib/libcom_err.so.2
0x05fc5d80 0x05fc9ed4 Yes (*) /usr/lib/libidn.so.11
0x06146c20 0x06174984 Yes (*) /lib/libssl.so.6
0x05eb4580 0x05f70e64 Yes (*) /lib/libcrypto.so.6
0x0076dd90 0x0076f843 Yes (*) /usr/lib/tls/libnvidia-tls.so.325.15
0x0330a6c0 0x044c6b00 Yes (*) /usr/lib/libnvidia-glcore.so.325.15
0x0071ca20 0x0071d6d4 Yes (*) /usr/lib/libXau.so.6
0x00771f10 0x00773b44 Yes (*) /usr/lib/libXdmcp.so.6
0x061119c0 0x06116274 Yes (*) /usr/lib/libkrb5support.so.0
0x00df68d0 0x00df7014 Yes (*) /lib/libkeyutils.so.1
0x00ddc130 0x00de7ae4 Yes (*) /lib/libresolv.so.2
0x00cf75a0 0x00d04524 Yes (*) /lib/libselinux.so.1
0x00e29f40 0x00e57e34 Yes (*) /lib/libsepol.so.1

Hmm, that’s strange. The backtrace shows libGL.so.1 code from the 0x0571793d range, but the “info sharedlibrary” output shows libGL.so.1 loaded from 0x00b01000 to 0x00b68910. Did you get the back trace and the “info sharedlibrary” output from the same run of the application?

Well spotted! I recorded the backtrace a while back, and later thought it would be useful to get a list of all the shared libraries (so basically I had the info you requested already saved but from different runs).
To reproduce the problem I have a script that automates running my test app (and takes over my machine), and it normally takes a while to reproduce the issue. If it’s a problem, though, I could re-run it and resend the output from the same run?

I had presumed that “info sharedlibrary” was just to list the libraries, not to reference their actual addresses.

Many thanks

Yes, please do capture both a backtrace and the output of “info sharedlibrary” from the same instance of the hang. This is necessary in order to line up the backtraces, because GDB (stupidly, IMO) doesn’t print the offsets of the frames into the libraries, only their absolute addresses and which library it thinks they’re in. On some systems, library load addresses are intentionally randomized as a security feature, which could explain why they change from run to run.

Hi Aaron, understood. Here’s the gdb output, all generated from the same run. (This deadlocked after the 40th run; each run takes about 30 seconds.) Fortunately the deadlock occurs without elevation data, which will make packaging up a test app easier. Please let me know if you require further info (I’m in the process of packaging up a test app and scripts for Sandip).

Thread 5 (Thread 0xb5991b90 (LWP 8105)):
#0 0x00660410 in __kernel_vsyscall ()
#1 0x085dac05 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x0226115d in __pthread_cond_wait (cond=0x8f5f104, mutex=0x8f5f0e4) at forward.c:138
#3 0x005aa865 in OpenThreads::Condition::wait(OpenThreads::Mutex*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#4 0x002de405 in OpenThreads::Block::block (this=0x8f59434) at /usr/local/include/OpenThreads/Block:42
#5 0x002e0eac in ossimPlanetActionRouterThreadQueue::run (this=0x8f5efe0) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/include/ossimPlanet/ossimPlanetActionRouter.h:42
#6 0x005a9dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#7 0x085d6852 in start_thread () from /lib/libpthread.so.0
#8 0x02254a8e in clone () from /lib/libc.so.6

Thread 4 (Thread 0xb3f1cb90 (LWP 8106)):
#0 0x00660410 in __kernel_vsyscall ()
#1 0x085dac05 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x0226115d in __pthread_cond_wait (cond=0x8f699f4, mutex=0x8f699d4) at forward.c:138
#3 0x005aa865 in OpenThreads::Condition::wait(OpenThreads::Mutex*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#4 0x002de405 in OpenThreads::Block::block (this=0x8f699a4) at /usr/local/include/OpenThreads/Block:42
#5 0x003f02cd in ossimPlanetOperationQueue::nextOperation (this=0x8f698e8, blockIfEmptyFlag=true) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOperation.cpp:228
#6 0x0037474f in ossimPlanetTileRequestQueue::nextOperation (this=0x8f698e8, blockIfEmptyFlag=true) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:155
#7 0x00374adb in ossimPlanetTileRequestThreadQueue::run (this=0x8f69770) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:212
#8 0x005a9dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#9 0x085d6852 in start_thread () from /lib/libpthread.so.0
#10 0x02254a8e in clone () from /lib/libc.so.6

Thread 3 (Thread 0xb351bb90 (LWP 8107)):
#0 0x00660410 in __kernel_vsyscall ()
#1 0x085dac05 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x0226115d in __pthread_cond_wait (cond=0x8f6971c, mutex=0x8f696fc) at forward.c:138
#3 0x005aa865 in OpenThreads::Condition::wait(OpenThreads::Mutex*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#4 0x002de405 in OpenThreads::Block::block (this=0x8f696cc) at /usr/local/include/OpenThreads/Block:42
#5 0x003f02cd in ossimPlanetOperationQueue::nextOperation (this=0x8f69610, blockIfEmptyFlag=true) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOperation.cpp:228
#6 0x0037474f in ossimPlanetTileRequestQueue::nextOperation (this=0x8f69610, blockIfEmptyFlag=true) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:155
#7 0x00374adb in ossimPlanetTileRequestThreadQueue::run (this=0x8f69368) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:212
#8 0x005a9dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#9 0x085d6852 in start_thread () from /lib/libpthread.so.0
#10 0x02254a8e in clone () from /lib/libc.so.6

Thread 2 (Thread 0xb1f57b90 (LWP 8108)):
#0 0x00660410 in __kernel_vsyscall ()
#1 0x085dac05 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x0226115d in __pthread_cond_wait (cond=0x8f69ccc, mutex=0x8f69cac) at forward.c:138
#3 0x005aa865 in OpenThreads::Condition::wait(OpenThreads::Mutex*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#4 0x002de405 in OpenThreads::Block::block (this=0x8f69c7c) at /usr/local/include/OpenThreads/Block:42
#5 0x003f02cd in ossimPlanetOperationQueue::nextOperation (this=0x8f69bc0, blockIfEmptyFlag=true) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetOperation.cpp:228
#6 0x0037474f in ossimPlanetTileRequestQueue::nextOperation (this=0x8f69bc0, blockIfEmptyFlag=true) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:155
#7 0x00374adb in ossimPlanetTileRequestThreadQueue::run (this=0x8f69a48) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/src/ossimPlanet/ossimPlanetTileRequest.cpp:212
#8 0x005a9dd4 in OpenThreads::ThreadPrivateActions::StartThread(void*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
#9 0x085d6852 in start_thread () from /lib/libpthread.so.0
#10 0x02254a8e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xb5d8f700 (LWP 8103)):
#0 0x00660410 in __kernel_vsyscall ()
#1 0x02261403 in __lll_lock_wait_private () from /lib/libc.so.6
#2 0x021eec56 in _L_lock_5396 () at malloc.c:6195
#3 0x021e9e69 in _int_free (av=0x22d9140, p=0x8f06bc0, have_lock=0) at malloc.c:4846
#4 0x021ea8e9 in __libc_free (mem=0x8f06bc8) at malloc.c:3670
#5 0x0720997d in ?? () from /usr/lib/libGL.so.1
#6 0x07211de5 in ?? () from /usr/lib/libGL.so.1
#7 <signal handler called>
#8 _int_malloc (av=0x22d9140, bytes=48) at malloc.c:4601
#9 0x021ecd97 in __libc_malloc (bytes=48) at malloc.c:3605
#10 0x0720c529 in ?? () from /usr/lib/libGL.so.1
#11 0xb7579814 in ?? () from /usr/lib/libnvidia-glcore.so.319.49
#12 0xb757a189 in ?? () from /usr/lib/libnvidia-glcore.so.319.49
#13 0xb75cd919 in ?? () from /usr/lib/libnvidia-glcore.so.319.49
#14 0xb75ebc3f in ?? () from /usr/lib/libnvidia-glcore.so.319.49
#15 0xb75d383a in ?? () from /usr/lib/libnvidia-glcore.so.319.49
#16 0x06b0eea8 in osgUtil::RenderLeaf::render(osg::RenderInfo&, osgUtil::RenderLeaf*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#17 0x06b094b3 in osgUtil::RenderBin::drawImplementation(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#18 0x06b12567 in osgUtil::RenderStage::drawImplementation(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#19 0x06b09140 in osgUtil::RenderBin::draw(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#20 0x06b128f6 in osgUtil::RenderStage::drawInner(osg::RenderInfo&, osgUtil::RenderLeaf*&, bool&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#21 0x06b156e5 in osgUtil::RenderStage::draw(osg::RenderInfo&, osgUtil::RenderLeaf*&) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#22 0x06b1fac9 in osgUtil::SceneView::draw() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
#23 0x053d0fe0 in osgViewer::Renderer::cull_draw() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#24 0x053ce268 in osgViewer::Renderer::operator()(osg::GraphicsContext*) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#25 0x05f9c3a8 in osg::GraphicsContext::runOperations() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
#26 0x0541272a in osgViewer::ViewerBase::renderingTraversals() () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#27 0x0540f02d in osgViewer::ViewerBase::frame(double) () from /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
#28 0x080632b3 in main (argc=7, argv=0xbfabf3e4) at /local_views/pdg_vis3d_ev_slider/vobs/vis3d/ossim/ossimPlanet/examples/ossimplanetviewer/ossimplanetviewer.cpp:738
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x002d5fe0 0x004c3808 Yes /local_components/ossim/VIS3D_INT_CR40207_BL0015/lib/libossimPlanet.so.1
0x00a30240 0x01142aa8 Yes /local_components/ossim/VIS3D_INT_CR40207_BL0015/lib/libossim.so.1
0x04d5ade0 0x04dd9148 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/tiff/tiff-4.0.0beta3/tiff-4.0.0beta3-bin/lib/libtiff.so.5
0x00582e18 0x0058ea58 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/libgeotiff/libgeotiff-1.2.5/libgeotiff-1.2.5-bin/lib/libgeotiff.so
0x03306b10 0x0335eac4 Yes /usr/lib/libfreetype.so.6
0x005a1a70 0x005a2aa4 Yes /lib/libdl.so.2
0x053c0130 0x05423ee8 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgViewer.so.55
0x059eb620 0x05a40348 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgDB.so.55
0x00607470 0x0062d248 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgGA.so.55
0x01a33ac0 0x01a5f948 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgText.so.55
0x06a847b0 0x06b64198 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosgUtil.so.55
0x05efa9d0 0x0607bca8 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libosg.so.55
0x005a8f30 0x005aace8 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/osg/osg-2.8.2/osg-2.8.2-bin/lib/libOpenThreads.so.11
0x07e10c10 0x07e46048 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/libjpeg/libjpeg-vis3d.7/libjpeg-vis3d.7-bin/lib/libjpeg.so.7
0x024460c0 0x024633e8 Yes /local_components/ossim/VIS3D_INT_CR40207_BL0015/lib/libwms.so.1
0x05d519d0 0x05d7b084 Yes /usr/lib/libcurl.so.3
0x005af0d0 0x005c4cc4 Yes /lib/libexpat.so.0
0x075342c0 0x07729668 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/gpstk/gpstk-1.6/gpstk-1.6-bin/lib/libgpstk.so.16
0x0063c6a0 0x006478d4 Yes /usr/lib/libz.so.1
0x07452e50 0x074ab9a4 Yes /usr/lib/libGLU.so.1
0x071b7ab0 0x07220dc8 Yes /usr/lib/libGL.so.1
0x0064ff70 0x00654a94 Yes /usr/lib/libSM.so.6
0x05299600 0x052a9624 Yes /usr/lib/libICE.so.6
0x0343af70 0x034cb8a4 Yes /usr/lib/libX11.so.6
0x015746c0 0x0157e3e4 Yes /usr/lib/libXext.so.6
0x0686a050 0x068e5b48 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/gcc/gcc-4.4.1/gcc-4.4.1-bin/lib/libstdc++.so.6
0x084a4410 0x084bf5a4 Yes /lib/libm.so.6
0x04f7ae50 0x04f92ba8 Yes /local_components/third_party/THIRD_PARTY_INT_BL0048-03/gcc/gcc-4.4.1/gcc-4.4.1-bin/lib/libgcc_s.so.1
0x02196c80 0x022938d0 Yes /lib/libc.so.6
0x085d5210 0x085e0b04 Yes /lib/libpthread.so.0
0x005d67f0 0x005ebfef Yes /lib/ld-linux.so.2
0x060e73d0 0x061082d4 Yes /usr/lib/libgssapi_krb5.so.2
0x06c49270 0x06cb8e14 Yes /usr/lib/libkrb5.so.3
0x072d1790 0x072e95b4 Yes /usr/lib/libk5crypto.so.3
0x005ce7e0 0x005cef94 Yes /lib/libcom_err.so.2
0x01499d80 0x0149ded4 Yes /usr/lib/libidn.so.11
0x06146c20 0x06174984 Yes /lib/libssl.so.6
0x02f73580 0x0302fe64 Yes /lib/libcrypto.so.6
0x005d2d90 0x005d4843 Yes /usr/lib/tls/libnvidia-tls.so.319.49
0xb6516340 0xb76c1dc0 Yes /usr/lib/libnvidia-glcore.so.319.49
0x00657a20 0x006586d4 Yes /usr/lib/libXau.so.6
0x0065af10 0x0065cb44 Yes /usr/lib/libXdmcp.so.6
0x061119c0 0x06116274 Yes /usr/lib/libkrb5support.so.0
0x017258d0 0x01726014 Yes /lib/libkeyutils.so.1
0x02e7b130 0x02e86ae4 Yes /lib/libresolv.so.2
0x051f45a0 0x05201524 Yes /lib/libselinux.so.1
0x08294f40 0x082c2e34 Yes /lib/libsepol.so.1

Thanks for the detailed report! With the updated backtrace, I was able to determine what is likely to be causing the hang and filed bug 1371255. However, after analyzing the signature of the hang, it looks likely that your application was about to crash anyway when the hang occurred. If you start the application in GDB and let it run with GDB attached, does it eventually crash?

Aaron, that’s great! Under normal circumstances, though, neither our test app nor the full application actually crashes; they just hang indefinitely. When I’ve run the apps under gdb from the start, a crash hasn’t occurred, although I could repeat this test. It is possible that there are a few problems at play here. We have found that we can get rid of the deadlocks either by masking all signals or by setting MALLOC_CHECK_=3. I don’t fully understand how these prevent the deadlocks, but it implies that some kind of timing effect might be involved.
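For reference, here is a minimal sketch of what "masking all signals" in worker threads usually looks like with POSIX threads: block everything in the creating thread, start the worker (which inherits the mask), then restore the creator's mask. The names `block_all_signals`, `start_masked_worker` and `worker` are hypothetical, and this is not necessarily how the test above was done. Note that synchronous faults such as SIGSEGV cannot usefully be blocked: on Linux, a fault raised while SIGSEGV is blocked terminates the process instead of being deferred.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Block every blockable signal in the calling thread and return the
 * previous mask in *old_mask. SIGKILL and SIGSTOP are silently left
 * unblocked by the kernel. Threads created afterwards inherit the mask. */
int block_all_signals(sigset_t *old_mask)
{
    sigset_t all;
    sigfillset(&all);
    return pthread_sigmask(SIG_BLOCK, &all, old_mask);
}

/* Hypothetical worker standing in for a rendering/loader thread.
 * It only reports whether SIGUSR1 is blocked in its own mask. */
static void *worker(void *arg)
{
    sigset_t current;
    pthread_sigmask(SIG_BLOCK, NULL, &current); /* NULL set: query only */
    *(int *)arg = sigismember(&current, SIGUSR1);
    return NULL;
}

/* Start a worker with all signals blocked, then restore the caller's mask
 * so only the new thread runs masked. */
int start_masked_worker(pthread_t *tid, int *usr1_blocked)
{
    sigset_t old;
    int rc = block_all_signals(&old);
    if (rc != 0)
        return rc;
    rc = pthread_create(tid, NULL, worker, usr1_blocked);
    pthread_sigmask(SIG_SETMASK, &old, NULL); /* caller is unaffected */
    return rc;
}
```

Masking per thread this way leaves at least one thread able to receive asynchronous signals, which is usually more practical than masking them for the whole process.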

Are you thinking that the result of the fix might be that our application will crash rather than hang? If so, then I guess it would still be a step in the right direction in establishing the root cause of the problem for us.

Is it possible for me to see the bug you logged (I tried searching for the number)? After working on this for so long, I’m curious about what is happening :)
Also, would you have any idea when the updated driver would be available for us to test?

Many thanks for your help on this.

The only paths through that code that I could see that call free() are either right when a debugger is being attached, or if the application is about to crash. You attached the debugger after the deadlock occurred, right? That rules out the debugger attach path which is why I was focusing on the your-app-is-about-to-crash path. It might help to try running your application in Valgrind, in case there’s some shoddy memory management going on that it might detect right away. You could also try setting MALLOC_CHECK_=3 in the environment.
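The general rule behind this diagnosis is that a signal handler may only call async-signal-safe functions. free() and malloc() are not on that list: in the original trace the signal arrives while the thread is inside _int_malloc, presumably holding its arena lock, so a free() from the handler on that same thread waits on a lock it already holds, and every other thread that needs that arena then blocks too. Below is a minimal sketch of the usual pattern for an application's own handlers, the self-pipe trick, assuming POSIX: the handler only write()s the signal number to a non-blocking pipe, and ordinary code reads it later, where allocating is safe. All names are hypothetical, and this does not change what the driver's own handler does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = {-1, -1};

/* Async-signal-safe handler: no malloc/free, no stdio, no locks.
 * write() is async-signal-safe; errno is saved because write() may
 * overwrite the value seen by the interrupted code. */
static void on_signal(int signo)
{
    int saved_errno = errno;
    unsigned char b = (unsigned char)signo;
    ssize_t r = write(sig_pipe[1], &b, 1); /* if the pipe is full, drop it */
    (void)r;
    errno = saved_errno;
}

/* Create the non-blocking pipe once, then route signo to on_signal. */
int install_deferred_handler(int signo)
{
    if (sig_pipe[0] < 0) {
        if (pipe(sig_pipe) != 0)
            return -1;
        for (int i = 0; i < 2; ++i) {
            int fl = fcntl(sig_pipe[i], F_GETFL);
            if (fl < 0 || fcntl(sig_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
                return -1;
        }
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Call from ordinary code (e.g. once per frame). Returns the next pending
 * signal number, or 0 if none. malloc/free and locking are safe here. */
int next_pending_signal(void)
{
    unsigned char b;
    ssize_t n = read(sig_pipe[0], &b, 1);
    return n == 1 ? (int)b : 0;
}
```

Because each signal is queued as a byte in the pipe, a signal that arrives while the main loop is draining earlier ones is not lost, which a single `sig_atomic_t` flag cannot guarantee.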

As for the bug, the bug tracker is not publicly accessible, but you can ask us for status updates. I only just filed it, so it hasn’t been assigned to an engineer yet, i.e. there’s no progress to report.

Hi Aaron, thanks for the update: yes, the debugger was attached after the deadlock occurred, which just leaves the app-about-to-crash path. Running the application with MALLOC_CHECK_=3 seems to mask the issue: we don’t get deadlocks (operationally, we actually use this as our workaround). The fact that setting MALLOC_CHECK_=3 appears to resolve the issue indicates some kind of timing dependency. Valgrind doesn’t report anything, probably because it slows the app down enough to change the timing.

I’ll try targeting Insure++ at specific areas of the code, in the hope that the ‘real-life’ timing will be maintained and that this may turn something up.

I’m guessing that with a patched driver, there’s a good chance the root cause will be exposed anyway.

Thanks again.

Paul

Hi all, could we please have a status update for bug 1371255? Also, an estimated date for an updated driver?

Many thanks

Paul

Hi papadeltagolf,

I’m afraid I have nothing to report and I can’t provide an estimate for when a fix will be available.

Hi Aaron, could we please have a status update for bug 1371255? Also, an estimated date for an updated driver? It’s been a fair while now since the bug was filed.

Many thanks

Paul

It’s still in the queue of things to look at.

Thanks for the update, Aaron. In the absence of any progress on resolving this, maybe you could share a high-level description of what is going wrong in the driver? From what you’ve seen in the stack traces, is there anything we can do on the client side to work around the issue?
You mentioned it looked like our app was about to crash (which is infinitely preferable to the application deadlocking). Does this mean that the driver is trapping a SIGSEGV or similar signal? If that is the case, would our app be able to intercept the same signal? As I mentioned previously, if I mask all signals in our app, the deadlock does not occur.
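On the question of intercepting the signal: signal dispositions are process-wide, so whichever code calls sigaction() last for a given signal owns it, and the previous handler is returned through sigaction()'s third argument. As a first step towards finding out which signal is involved, a sketch like the following, assuming glibc on Linux, lists every signal that currently has a handler installed; calling it before and after GL context creation should show which signals, if any, the driver hooks. The function names are hypothetical, and the sketch only inspects handlers rather than replacing one the driver may rely on.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>

/* Returns 1 if a handler function (not SIG_DFL or SIG_IGN) is installed
 * for signo, 0 if not, and -1 if sigaction() rejects the signal number. */
int has_custom_handler(int signo)
{
    struct sigaction cur;
    if (sigaction(signo, NULL, &cur) != 0) /* NULL act: query only */
        return -1;
    if (cur.sa_flags & SA_SIGINFO)
        return 1; /* an sa_sigaction-style handler is installed */
    return cur.sa_handler != SIG_DFL && cur.sa_handler != SIG_IGN;
}

/* Print every signal that currently has a handler installed.
 * Signals reserved internally by glibc make sigaction() fail and are skipped. */
void dump_signal_handlers(FILE *out)
{
    for (int s = 1; s < NSIG; ++s)
        if (has_custom_handler(s) == 1)
            fprintf(out, "signal %d has a handler installed\n", s);
}
```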

Thanks in advance for your help.

Paul