vkDestroyDevice hang on Linux (libnvidia-glcore.so.535.216.01)

Hi, I am a Google developer working on the Android Emulator. We are seeing occasional hangs when using Vulkan. The hang seems related to __GL_THREADED_OPTIMIZATIONS: we can’t reproduce it with that setting disabled. Disabling it is not a good workaround for us, though, because performance drops drastically. Is this a known issue?

Tue Jan  7 11:17:20 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P1000                   Off | 00000000:18:00.0  On |                  N/A |
| 40%   52C    P0              N/A /  N/A |    882MiB /  4096MiB |     22%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
Thread 244 (Thread 0x7ffe780006c0 (LWP 3234466) "MainLoopThread"): ****** TRY ACQUIRE mutex held by Thread 249 ******
#0  futex_wait (private=0, expected=2, futex_word=0x7fffa45f00a0) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fffa45f00a0, private=0) at ./nptl/lowlevellock.c:49  
#2  0x00007fffe9aa696a in lll_mutex_lock_optimized (mutex=0x7fffa45f00a0) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffa45f00a0) at ./nptl/pthread_mutex_lock.c:128                     
#4  0x00007fffa2ef94f0 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.216.01               
#5  0x00007fffe54e163a in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0                           
#6  0x00007fffe2688822 in ?? () from /lib/x86_64-linux-gnu/libGLX.so.0                                  
#7  0x00007fffe268a301 in ?? () from /lib/x86_64-linux-gnu/libGLX.so.0                              
#8  0x00007fffa7d73199 in (anonymous namespace)::GlxDisplay::makeCurrent (this=0x555558380900, read=0x55555bd79340, draw=0x55555bd79340, context=0x555557fead58) at /work/emu-master-dev/hardware/google/gfxstream/host/gl/glestranslator/EGL/EglOsApi_glx.cpp:557
#9  0x00007fffa7d5f1ec in translator::egl::eglMakeCurrent (display=<optimized out>, draw=<optimized out>, read=<optimized out>, context=<optimized out>) at /work/emu-master-dev/hardware/google/gfxstream/host/gl/glestranslator/EGL/EglImp.cpp:1176
#10 0x00007fffa7d428e3 in gfxstream::gl::(anonymous namespace)::DisplaySurfaceGlContextHelper::setupContext (this=0x55555bd797a0) at /work/emu-master-dev/hardware/google/gfxstream/host/gl/DisplaySurfaceGl.cpp:85 
#11 0x00007fffa7d1909d in gfxstream::RecursiveScopedContextBind::RecursiveScopedContextBind (helper=0x55555bd797a0, this=<optimized out>) at /work/emu-master-dev/external/qemu/../../hardware/google/gfxstream/host/ContextHelper.h:51
#12 std::__1::__optional_storage_base<gfxstream::RecursiveScopedContextBind, false>::__construct[abi:v170000]<gfxstream::ContextHelper*>(gfxstream::ContextHelper*&&) (this=<optimized out>, __args=<optimized out>) at /work/emu-master-dev/prebuilts/clang/host/linux-x86/clang-r487747c/bin/../include/c++/v1/optional:363
#13 std::__1::optional<gfxstream::RecursiveScopedContextBind>::emplace[abi:v170000]<gfxstream::ContextHelper*, void>(gfxstream::ContextHelper*&&) (this=<optimized out>, __args=<optimized out>) at /work/emu-master-dev/prebuilts/clang/host/linux-x86/clang-r487747c/bin/../include/c++/v1/optional:885
#14 gfxstream::FrameBuffer::postImpl(unsigned int, std::__1::function<void (std::__1::shared_future<void>)>, bool, bool) (this=this@entry=0x555557fbe800, p_colorbuffer=p_colorbuffer@entry=1, callback=..., needLockAndBind=true, repaint=false) at /work/emu-master-dev/hardware/google/gfxstream/host/FrameBuffer.cpp:1834
#15 0x00007fffa7d1e0dc in gfxstream::FrameBuffer::postImplSync (this=this@entry=0x555557fbe800, p_colorbuffer=p_colorbuffer@entry=1, needLockAndBind=true, repaint=false) at /work/emu-master-dev/hardware/google/gfxstream/host/FrameBuffer.cpp:1796
#16 0x00007fffa7d1dfa5 in gfxstream::FrameBuffer::post (this=this@entry=0x555557fbe800, p_colorbuffer=1, needLockAndBind=true) at /work/emu-master-dev/hardware/google/gfxstream/host/FrameBuffer.cpp:1767
#17 0x00007fffa7d1ee79 in gfxstream::FrameBuffer::compose (this=0x555557fbe800, bufferSize=<optimized out>, buffer=<optimized out>, needPost=true) at /work/emu-master-dev/hardware/google/gfxstream/host/FrameBuffer.cpp:2214
#18 0x00007fffa7d0b687 in gfxstream::rcCompose (bufferSize=72, buffer=0x5555627dfde0) at /work/emu-master-dev/hardware/google/gfxstream/host/RenderControl.cpp:1243
#19 0x00007fffa7d3d3f0 in gfxstream::renderControl_decoder_context_t::decode (this=0x7ffe77ffd540, buf=0x5555627dfdd0, len=<optimized out>, stream=0x7ffe77ffd0b8, checksumCalc=0x55556010b600) at /work/emu-master-dev/hardware/google/gfxstream/host/renderControl_dec/renderControl_dec.cpp:795
#20 0x00007fffa7d03ef4 in gfxstream::RenderThread::main (this=0x55556400eb40) at /work/emu-master-dev/hardware/google/gfxstream/host/RenderThread.cpp:548
#21 0x00007fffebfd41dd in android::base::Thread::thread_main (arg=0x55556400eb40) at /work/emu-master-dev/external/qemu/android/android-emu-base/android/base/threads/Thread_pthread.cpp:147
#22 0x00007fffe9aa36c2 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447             
#23 0x00007fffe9b1e128 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78


Thread 249 (Thread 0x7ffe740006c0 (LWP 3234473) "MainLoopThread"): ****** waiting on pthread_join Thread 252 ******
#0  0x00007fffe9aa01ce in __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=3234550, futex_word=0x7ffe71200990) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7ffe71200990, expected=3234550, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x00007fffe9aa024b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffe71200990, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#3  0x00007fffe9aa5193 in __pthread_clockjoin_ex (threadid=140730796345024, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:102
#4  0x00007fffa2efa205 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.216.01               
#5  0x00007fffa33b7276 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.216.01               
#6  0x00007fffa33fbc0d in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.216.01               
#7  0x00007fffe54fc820 in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0                           
#8  0x00007fffe55b0d2f in loader_layer_destroy_device (destroyFunction=0x7fffe54fc7f0, pAllocator=0x0, device=0x55555fe62050) at /vulkan-sdk/1.3.290.0/source/Vulkan-Loader/loader/loader.c:4510
#9  loader_layer_destroy_device (device=device@entry=0x55555fe62050, pAllocator=pAllocator@entry=0x0, destroyFunction=0x7fffe54fc7f0) at /vulkan-sdk/1.3.290.0/source/Vulkan-Loader/loader/loader.c:4500
#10 0x00007fffe55beafe in vkDestroyDevice (device=0x55555fe62050, pAllocator=0x0) at /vulkan-sdk/1.3.290.0/source/Vulkan-Loader/loader/trampoline.c:1037
#11 0x00007fffa7efdb59 in gfxstream::vk::VkDecoderGlobalState::Impl::destroyDeviceWithExclusiveInfo (this=0x55555c70b700, device=0x55555fe62050, deviceInfo=..., fenceInfos=..., queueInfos=..., pAllocator=0x0) at /work/emu-master-dev/hardware/google/gfxstream/host/vulkan/VkDecoderGlobalState.cpp:2308
#12 0x00007fffa7f04db5 in gfxstream::vk::VkDecoderGlobalState::Impl::destroyDeviceLocked (this=this@entry=0x55555c70b700, device=<optimized out>, device@entry=0x55555fe62050, pAllocator=pAllocator@entry=0x0) at /work/emu-master-dev/hardware/google/gfxstream/host/vulkan/VkDecoderGlobalState.cpp:2320
#13 0x00007fffa7ea5711 in gfxstream::vk::VkDecoderGlobalState::Impl::on_vkDestroyDevice (this=0x55555c70b700, boxed_device=<optimized out>, pool=<optimized out>, pAllocator=<optimized out>) at /work/emu-master-dev/hardware/google/gfxstream/host/vulkan/VkDecoderGlobalState.cpp:2334
#14 gfxstream::vk::VkDecoderGlobalState::on_vkDestroyDevice (this=<optimized out>, pool=<optimized out>, snapshotInfo=<optimized out>, device=<optimized out>, pAllocator=0x0) at /work/emu-master-dev/hardware/google/gfxstream/host/vulkan/VkDecoderGlobalState.cpp:9404
#15 0x00007fffa7e77afa in gfxstream::vk::VkDecoder::Impl::decode (this=0x555562e26000, buf=0x55557e7142dd, len=<optimized out>, ioStream=0x7ffe73ffd0b8, processResources=0x555562999b08, context=...) at /work/emu-master-dev/hardware/google/gfxstream/host/vulkan/VkDecoder.cpp:972
#16 0x00007fffa7d03cbe in gfxstream::RenderThread::main (this=0x55556400ed80) at /work/emu-master-dev/hardware/google/gfxstream/host/RenderThread.cpp:479
#17 0x00007fffebfd41dd in android::base::Thread::thread_main (arg=0x55556400ed80) at /work/emu-master-dev/external/qemu/android/android-emu-base/android/base/threads/Thread_pthread.cpp:147
#18 0x00007fffe9aa36c2 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447             
#19 0x00007fffe9b1e128 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 


Thread 252 (Thread 0x7ffe712006c0 (LWP 3234550) "MainLoopThread"): ****** TRY ACQUIRE mutex held by Thread 244 ******
#0  futex_wait (private=0, expected=2, futex_word=0x7fffa45f00c8) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x7fffa45f00c8, private=0) at ./nptl/lowlevellock.c:49  
#2  0x00007fffe9aa696a in lll_mutex_lock_optimized (mutex=0x7fffa45f00c8) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffa45f00c8) at ./nptl/pthread_mutex_lock.c:128                     
#4  0x00007fffa2ef94f0 in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.216.01               
#5  0x00007fffe54e4151 in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0                           
#6  0x00007fffa2f1570e in ?? () from /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.216.01           
#7  0x00007fffe54e43b0 in ?? () from /lib/x86_64-linux-gnu/libGLX_nvidia.so.0                       
#8  0x00007fffe9aa07a1 in __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:73          
#9  __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:22                                
#10 0x00007fffe9aa34cf in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:455         
#11 0x00007fffe9b1e128 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
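
The three annotations above describe a circular wait, which is why none of the threads can make progress. As a minimal sketch, the wait-for edges can be written down and checked for a cycle. The `WAITS_ON` table and the `find_cycle` helper are illustrative names only (not part of gfxstream or the driver), and the edges are copied from the annotations on the traces rather than derived from driver internals.

```python
# Wait-for edges transcribed from the annotations on the three traces above.
# Key: waiting thread; value: (thread it waits on, where it is blocked).
WAITS_ON = {
    244: (249, "pthread_mutex_lock(0x7fffa45f00a0) under GlxDisplay::makeCurrent"),
    249: (252, "pthread_join under vkDestroyDevice"),
    252: (244, "pthread_mutex_lock(0x7fffa45f00c8) under __nptl_deallocate_tsd"),
}


def find_cycle(waits_on, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    seen = set()
    current = start
    while current in waits_on and current not in seen:
        seen.add(current)
        path.append(current)
        current = waits_on[current][0]
    if current in seen:
        # Trim any lead-in so the list starts where the cycle starts.
        return path[path.index(current):] + [current]
    return None


cycle = find_cycle(WAITS_ON, 244)
print(" -> ".join(f"Thread {t}" for t in cycle))
```

Starting from any of the three threads leads back to the same thread, so the hang is a three-way deadlock: a GLX make-current call, the thread join inside vkDestroyDevice, and a thread-exit TLS destructor each wait on the next.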

Hi @joshuaduong,
Could you please share reliable repro steps and an NVIDIA bug report captured while the system is in the repro state?

Repro steps:

  1. Install Android Studio.
  2. In Android Studio > SDK Manager, install:
  • SDK Tools > Android Emulator
  • SDK Tools > Android SDK Platform-Tools (adb)
  • SDK Tools > Android SDK Build-Tools 36-rc3 (aapt)
  3. In Device Manager, create a virtual device. Use:
  • Pixel 9
  • API 35 x86_64 Google Play system image
  • AVD name: avd35
  4. On the command line, boot the emulator:

export ANDROID_SDK_ROOT=<path_to_android_sdk>   # typically ~/Android/Sdk
$ANDROID_SDK_ROOT/emulator/emulator -wipe-data -no-snapshot -avd avd35 -feature GuestAngle -gpu host

  5. In a different terminal, download and unzip Android CTS (for dEQP) from https://dl.google.com/dl/android/cts/android-cts-15_r2-linux_x86-x86.zip (Android 15 CTS for x86).
  6. Add aapt to the PATH:

export PATH=$ANDROID_SDK_ROOT/build-tools/36.0.0-rc3:$PATH

  7. Run dEQP:

cd <path_to_android_cts>
./tools/cts-tradefed run cts -l INFO -m CtsDeqpTestCases

Repro rate is close to 100%. The hang typically occurs after 10-40 minutes of running dEQP.

nvidia-bug-report.log.gz (1.2 MB)