DXVA/D3D11VideoContext: decoding stalls rendering thread


We decode multiple H.264 streams (from 720p up to 4K) concurrently, using a mixture of DXVA/D3D11VideoDecoder-accelerated and software decoding. The D3D11VideoDecoder-based decoding runs in separate threads. The UI and video are rendered via DirectX/Direct2D on the main thread. The problem is that both D2D/DX and the decoder seem to lock the device while they operate, which makes concurrent decoding practically useless. ID3D11VideoContext::DecoderEndFrame takes up to 30 ms for the first few frames (not necessarily I-frames, since I-frames later in the stream show acceptable performance). While it runs, DecoderEndFrame blocks rendering to the D3D device on the main thread, which leads to stuttering in the UI.
Is there any way to get rid of the lock contention on the device? If not, what's the point of hardware-accelerated decoding if it competes for the hardware resources needed for normal rendering?
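
For reference, here is a minimal sketch of how a per-call stall like the one described above can be measured. The names `time_call_ms` and `fake_decoder_end_frame` are hypothetical, and the 5 ms sleep is only a stand-in for the real `ID3D11VideoContext::DecoderEndFrame` call, which is not shown here.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Returns how long `call` blocked the calling thread, in milliseconds.
// steady_clock is used because it never jumps backwards.
double time_call_ms(const std::function<void()>& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical stand-in for the decoder call (assumption: the real code
// would invoke DecoderEndFrame on the video context here instead).
void fake_decoder_end_frame() {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
}
```

Wrapping the real call on a decoder thread in such a helper is one way to obtain per-frame numbers like the 30 ms mentioned above; wrapping the render call on the main thread in the same way would show whether it stalls during the same interval.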

Edit: ID3D11VideoContext can only be obtained from the immediate context; deferred contexts don't seem to implement this interface.

We also support Intel Quick Sync via the Media SDK, and the H.264 decoders in Intel CPUs don't show this behavior. It only occurs with dedicated Nvidia (and AMD) GPUs when they are fed through the DXVA/D3D11VideoDecoder interfaces.

Thanks a lot in advance for any hints and tips!