Hi,
I am working on real-time video processing software that uses CUDA and DirectX 11. Both technologies access a shared 2D texture from distinct threads. The shared texture is protected and synchronized by means of DirectX synchronized shared surfaces (see "Surface sharing between Windows graphics APIs - Win32 apps | Microsoft Docs").
During development I ran into a deadlock that only occurs in conjunction with CUDA graphs. The sequence that reproduces it is as follows:
| Thread 1 | Thread 2 |
| --- | --- |
| cudaGraphLaunch() | IDXGIKeyedMutex::AcquireSync() |
| Eventually executes a host node: | cudaGraphicsMapResources() |
| IDXGIKeyedMutex::AcquireSync() * | cudaGraphicsUnmapResources() * |
| IDXGIKeyedMutex::ReleaseSync() | IDXGIKeyedMutex::ReleaseSync() |
The deadlock happens at the calls marked with *: AcquireSync() in Thread 1 and cudaGraphicsUnmapResources() in Thread 2. Their debug call stacks indicate that the threads are blocked in win32u.dll!NtGdiDdDDIAcquireKeyedMutex2 and nvcuda64.dll!00007ffd83ea99d0, respectively.
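The circular wait that these call stacks hint at can be modelled without CUDA or DirectX. The sketch below is only an illustration and rests on an assumption that nothing here confirms: that while the host node runs, the CUDA driver holds some internal resource which cudaGraphicsUnmapResources() also needs. The names `simulate`, `Outcome`, `keyedMutex` and `driverLock` are hypothetical stand-ins, and timed locks are used only so that the model terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Which of the two modelled blocking calls gave up after the timeout.
struct Outcome {
    bool thread1TimedOut;  // models AcquireSync() inside the host node
    bool thread2TimedOut;  // models cudaGraphicsUnmapResources()
};

// Portable model only: no CUDA or DirectX call is made here.
Outcome simulate(std::chrono::milliseconds timeout) {
    assert(timeout.count() > 0);

    std::timed_mutex keyedMutex;  // stand-in for the IDXGIKeyedMutex
    std::timed_mutex driverLock;  // ASSUMPTION: hypothetical driver-internal resource

    // One-shot signals that force the interleaving from the outline above.
    std::promise<void> t1Holds, t2Holds, t1Done, t2Done;
    std::future<void> t1HoldsF = t1Holds.get_future();
    std::future<void> t2HoldsF = t2Holds.get_future();
    std::future<void> t1DoneF = t1Done.get_future();
    std::future<void> t2DoneF = t2Done.get_future();

    Outcome out{false, false};

    std::thread thread1([&] {
        driverLock.lock();   // host node starts (assumed to hold driverLock)
        t1Holds.set_value();
        t2HoldsF.wait();     // Thread 2 now owns the keyed mutex
        out.thread1TimedOut = !keyedMutex.try_lock_for(timeout);  // "AcquireSync()"
        if (!out.thread1TimedOut) keyedMutex.unlock();            // "ReleaseSync()"
        t1Done.set_value();
        t2DoneF.wait();      // keep holding until Thread 2 has given up too
        driverLock.unlock();
    });

    std::thread thread2([&] {
        keyedMutex.lock();   // "AcquireSync()" succeeds
        t2Holds.set_value();
        t1HoldsF.wait();     // the host node is now running in Thread 1
        out.thread2TimedOut = !driverLock.try_lock_for(timeout);  // "Unmap"
        if (!out.thread2TimedOut) driverLock.unlock();
        t2Done.set_value();
        t1DoneF.wait();      // keep holding until Thread 1 has given up too
        keyedMutex.unlock(); // "ReleaseSync()"
    });

    thread1.join();
    thread2.join();
    return out;
}
```

Under that assumption, calling `simulate(std::chrono::milliseconds(200))` returns with both flags set: each thread waits for something the other one holds. Whether the driver really behaves this way is not established here; the model only shows the shape of the wait.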
However, without CUDA graphs there is no deadlock:
| Thread 1 | Thread 2 |
| --- | --- |
| IDXGIKeyedMutex::AcquireSync() | IDXGIKeyedMutex::AcquireSync() |
| IDXGIKeyedMutex::ReleaseSync() | cudaGraphicsMapResources() |
| | cudaGraphicsUnmapResources() |
| | IDXGIKeyedMutex::ReleaseSync() |
Each thread operates on its own (non-default, non-blocking) CUDA stream. I used the latest versions:
Microsoft Windows 10 Pro, Version 10.0.18362 Build 18362
d3d11.dll 10.0.18362.387
CUDA 11.0.182
Quadro GP100 with driver version 451.22
I attached a minimal reproducing example.
MinimalReproducer.cpp (5.9 KB)
Any helpful comments would be much appreciated.
Thank you.