Understanding NVDecodeD3d11 sample


I’m trying to understand the pipeline by which a decoded video frame is displayed.
My understanding is:

  1. cuvidMapVideoFrame uses pictureIndex of decoded frame to fetch device pointer (pDecodedFrame[active_field]) and pitch

  2. CUDA kernel is launched to convert NV12 pDecodedFrame[active_field] to pre-allocated g_pRgba device array

  3. cuMemcpy2D is used to copy g_pRgba to g_backBufferArray, which is mapped to pTexture_[active_field] (a 2D texture)

  4. context->CopyResource is used to copy from pTexture_[active_field] to pBackBuffer
    (which is buffer 0 of the swap chain).

  5. The swap chain then presents the next buffer
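
To check my understanding of step 2, here is a rough CPU-side sketch in plain C of what I think the conversion kernel computes for each pixel. This is not code from the sample: nv12_to_rgba and clamp_u8 are names I made up, and I am assuming BT.601 limited-range coefficients and an NV12 layout where the Y plane (height rows) is followed directly by the interleaved UV plane (height/2 rows), with every row padded to the pitch returned in step 1. The sample's kernel may use a different color matrix or layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of one 8-bit channel. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical helper (not part of the sample): convert one NV12 frame
 * to RGBA on the CPU.
 *
 * nv12       : Y plane, then interleaved U/V plane, rows `pitch` bytes apart
 * pitch      : bytes between row starts in the source (may exceed width)
 * rgba       : destination, 4 bytes per pixel
 * rgba_pitch : bytes between row starts in the destination
 *
 * Assumption: BT.601 limited-range coefficients in 8.8 fixed point.
 */
void nv12_to_rgba(const uint8_t *nv12, size_t pitch, int width, int height,
                  uint8_t *rgba, size_t rgba_pitch)
{
    const uint8_t *y_plane  = nv12;
    const uint8_t *uv_plane = nv12 + pitch * (size_t)height;

    for (int y = 0; y < height; ++y) {
        const uint8_t *y_row  = y_plane  + pitch * (size_t)y;
        /* One row of chroma is shared by two rows of luma. */
        const uint8_t *uv_row = uv_plane + pitch * (size_t)(y / 2);
        uint8_t *out = rgba + rgba_pitch * (size_t)y;

        for (int x = 0; x < width; ++x) {
            /* One U/V pair is shared by two horizontally adjacent pixels. */
            int c = y_row[x] - 16;
            int d = uv_row[(x / 2) * 2]     - 128;  /* U */
            int e = uv_row[(x / 2) * 2 + 1] - 128;  /* V */

            out[4 * x + 0] = clamp_u8((298 * c + 409 * e + 128) / 256);           /* R */
            out[4 * x + 1] = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256); /* G */
            out[4 * x + 2] = clamp_u8((298 * c + 516 * d + 128) / 256);           /* B */
            out[4 * x + 3] = 255;                                                 /* A */
        }
    }
}
```

The point of carrying pitch and rgba_pitch separately is that neither buffer is guaranteed to be tightly packed, which I assume is also why step 3 uses a 2D (pitched) copy rather than a flat one.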

I don’t understand why steps 3 & 4 are necessary. Why can’t the CUDA kernel write directly to the target back buffer/texture? Seems like unnecessary copying. I’m sure there’s a good reason, I’m just new to CUDA and Direct3D.

It’d be great if there were a bare-bones C example of video decode with D3D11.
All the C++ object orientation makes it difficult to see the essential sequence and flow of data between the key API calls.

Cheers, Wayne.