D3D11 device context in a separate thread gets corrupted when CUDA graphics resource mapping is used

I have been chasing this bug for several weeks. More information is available in a GitHub issue report which I filed for a Unity plugin called AVPro Video, available here:

I have written a Media Foundation Media Source object (a DLL registered with the Media Foundation framework) which parses omnidirectional media files, decodes and composites them with CUDA, and delivers Direct3D 11 media samples stored on the GPU to the Media Foundation player chain.

At a high level, the system works like this: my code creates a pool of very large (10k or 12k pixels wide) D3D11 textures with RGBA pixels. Threads decode multiple HEVC streams using NVDEC/CUVID, where the output of each HEVC stream corresponds to a fixed smaller rectangle within the large output frame. When a new frame is started, a large output frame is taken from the D3D11 texture pool and mapped to a CUDA array using cuGraphicsMapResources(). As each sub-frame within the large output frame is decoded by the HEVC decoder, a CUDA function converts the output from NV12 to RGBA and composites the sub-frame into the big output frame. When all of the pieces of the big frame are complete, it is unmapped from CUDA, encapsulated in a Media Foundation media sample object, and sent to the player via Media Foundation. When the player has finished using a particular big frame and releases it, my code gets a callback and puts the frame back into the D3D11 texture pool for re-use.
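For reference, the per-frame interop flow described above follows this shape (a simplified sketch using the CUDA driver API, not my exact code; error handling, the registration caching, and the actual NV12-to-RGBA kernel launch are omitted, and `LaunchNv12ToRgbaKernel` is a placeholder name):

```cpp
#include <cuda.h>
#include <cudaD3D11.h>
#include <d3d11.h>

// One-time setup per pooled texture: register the D3D11 texture with CUDA.
CUgraphicsResource RegisterOutputTexture(ID3D11Texture2D* tex)
{
    CUgraphicsResource res = nullptr;
    cuGraphicsD3D11RegisterResource(&res, tex, CU_GRAPHICS_REGISTER_FLAGS_NONE);
    return res;
}

// Per-frame: map the texture, composite each decoded sub-frame into it,
// then unmap so the texture can be handed to Media Foundation.
void CompositeFrame(CUgraphicsResource res, CUstream stream)
{
    cuGraphicsMapResources(1, &res, stream);

    CUarray outputArray = nullptr;
    cuGraphicsSubResourceGetMappedArray(&outputArray, res,
                                        /*arrayIndex*/ 0, /*mipLevel*/ 0);

    // For each decoded HEVC sub-frame: convert NV12 to RGBA and write it
    // into that sub-frame's fixed rectangle of outputArray.
    // LaunchNv12ToRgbaKernel(outputArray, subRect, nv12Frame, stream);  // placeholder

    cuGraphicsUnmapResources(1, &res, stream);
    // After unmap, the texture is valid for D3D11 use downstream.
}
```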

So all of this works perfectly when my media source object runs in a simple Media Foundation test player application. But the final environment this was designed for includes Unity, the AVPro Video plugin, and SteamVR. In that environment there are many more threads using D3D and the GPU, and when my decoder is used with this software, the system will occasionally (every 100-10,000 frames rendered) encounter a D3D error ending in a GPU crash / device-removed scenario.

I used the D3D debug layer and observed that the first failures to occur are D3D validation errors, producing messages like these:

D3D11 ERROR: ID3D11DeviceContext::Draw: Current Primitive Topology value (0) is not valid. [ EXECUTION ERROR #365: DEVICE_DRAW_INVALID_PRIMITIVETOPOLOGY]
D3D11 ERROR: ID3D11DeviceContext::Draw: A Vertex Shader is always required when drawing, but none is currently bound. [ EXECUTION ERROR #341: DEVICE_DRAW_VERTEX_SHADER_NOT_SET]
D3D11 ERROR: ID3D11DeviceContext::Draw: Rasterization Unit is enabled (PixelShader is not NULL or Depth/Stencil test is enabled and RasterizedStream is not D3D11_SO_NO_RASTERIZED_STREAM) but position is not provided by the last shader before the Rasterization Unit. [ EXECUTION ERROR #362: DEVICE_DRAW_POSITION_NOT_PRESENT]
...
D3D11: Removing Device.
D3D11 ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT]

From these error messages, it appears that some race condition is causing the ID3D11DeviceContext to become corrupted. I don’t even create or use an ID3D11DeviceContext anywhere in my code, so this is a baffling problem. To debug it, I disabled different parts of my code to isolate the issue, and what I found was that the D3D device context failure does not happen if I don’t call cuGraphicsMapResources() / cuGraphicsUnmapResources(). Obviously that is not a viable work-around, since I need these functions to do what needs to be done, but for the sake of debugging these tests definitively demonstrated that these calls are what trigger the D3D failure.

I even ran an experiment to rule out some other bug in the downstream Media Foundation player, such as my D3D11 texture being used after release. I modified my D3D11 texture buffer pool to create two output textures (call them A and B) for each frame instead of one, did the CUDA resource mapping and unmapping on the A textures, and sent the B textures down the Media Foundation pipeline. Even in this case, the D3D device context errors sometimes happen if I call the CUDA graphics mapping functions, but never if I don’t call them.
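Abstractly, the A/B pooling from that experiment works like this (a purely illustrative sketch: plain int handles stand in for the real ID3D11Texture2D pointers, and the names are mine, not from my actual code):

```cpp
#include <deque>
#include <mutex>

// Each pool entry is a pair of textures created together: texA is the one
// CUDA maps/unmaps and composites into; texB is the one actually delivered
// down the Media Foundation pipeline. Plain ints stand in for textures.
struct FramePair { int texA; int texB; };

class FramePairPool {
public:
    explicit FramePairPool(int count) {
        for (int i = 0; i < count; ++i)
            free_.push_back({ /*texA*/ 2 * i, /*texB*/ 2 * i + 1 });
    }
    // Called when a new output frame is started.
    FramePair Acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        FramePair p = free_.front();
        free_.pop_front();
        return p;
    }
    // Called from the Media Foundation sample-release callback.
    void Release(FramePair p) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(p);
    }
private:
    std::mutex mutex_;
    std::deque<FramePair> free_;
};
```

Since only the B textures ever reach the player, any use-after-release bug downstream could not touch the A textures that CUDA maps, which is what isolates the mapping calls as the trigger.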

So from this evidence, it really looks like some side-effect of the cuGraphicsMapResources() / cuGraphicsUnmapResources() functions is causing the D3D11 device context to occasionally become corrupted in another thread. Is it possible that cuGraphicsMapResources() internally retrieves the immediate ID3D11DeviceContext corresponding to the ID3D11Device and uses it during the map/unmap operation? If so, that seems like a bug in CUDA, because the immediate D3D11 device context is not thread-safe and needs to be protected with some kind of mutex.
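To make that last point concrete: if CUDA really does touch the immediate context internally, the only application-side defense I can think of would be to route every immediate-context use and every map/unmap call through one shared mutex. A toy illustration of the idea (the two functions below are stand-ins I wrote for "render thread touches the context" and "CUDA maps/unmaps a frame"; they are not real API calls, and the two-field struct just models state that must be updated atomically):

```cpp
#include <mutex>
#include <thread>

// Shared guard: every path that touches the (non-thread-safe) immediate
// context takes this lock, including the CUDA map/unmap path.
std::mutex g_contextMutex;

// Toy "device context": two fields that must always be updated together.
// If two threads interleave inside an update, the invariant a == b breaks,
// which models the kind of state corruption the D3D errors above suggest.
struct ToyContext { long a = 0; long b = 0; };
ToyContext g_ctx;

void TouchContext()             // stands in for render-thread Draw calls
{
    std::lock_guard<std::mutex> lock(g_contextMutex);
    ++g_ctx.a;
    ++g_ctx.b;
}

void MapUnmapOutputFrame()      // stands in for cuGraphicsMap/UnmapResources
{
    std::lock_guard<std::mutex> lock(g_contextMutex);
    ++g_ctx.a;
    ++g_ctx.b;
}

bool RunThreads()
{
    std::thread t1([] { for (int i = 0; i < 100000; ++i) TouchContext(); });
    std::thread t2([] { for (int i = 0; i < 100000; ++i) MapUnmapOutputFrame(); });
    t1.join();
    t2.join();
    return g_ctx.a == g_ctx.b;  // holds only because both paths share the lock
}
```

Of course, I cannot apply this in practice, because the rendering threads belong to Unity/SteamVR and I have no way to make them take a lock that CUDA's internals would also honor, which is why this looks like something only the CUDA driver can fix.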