CUDA-DX11 interop and DXGI_FORMAT_NV12 surfaces

Folks,

I ran into a problem while trying to feed NVENC using ID3D11Texture2D surfaces with DXGI_FORMAT_NV12.

I can create the DXGI_FORMAT_NV12 surface, register it for CUDA interop (cudaGraphicsD3D11RegisterResource) and I’m able to register the DX11 resource for NVENC input (nvEncRegisterResource) without errors. A CUDA kernel does RGA->NV12 color space conversion of source frames into a temporary CUDA memory buffer which is then copied over the DX11 resource (see code below). The surface is finally sent for encoding by NVENC (nvEncMapInputResource / nvEncEncodePicture / nvEncUnmapInputResource), which returns a valid bitstream that is written to a MP4 file. Everything works great but the resulting video lacks chrominance data, most likely because UV plane is lacking when copying the CUDA memory buffer into the mapped NV12 surface.

The code I’m using to do the copy is very simple:

// m_pBuffer_CUDA is the DX11 surface with format DXGI_FORMAT_NV12
// registered with cudaGraphicsD3D11RegisterResource

bool bResult = false;
cudaArray_t pBuffer = nullptr;
cudaGraphicsResource_t dxResource = m_pBuffer_CUDA;
cudaRes = cudaGraphicsMapResources(1, &dxResource);
if (cudaRes == cudaSuccess)
{
cudaRes = cudaGraphicsSubResourceGetMappedArray((cudaArray_t*)&pBuffer, dxResource, 0, 0);
if (cudaRes == cudaSuccess)
{
// m_cuBuffer holds the NV12 surface and its content is guaranteed
// to have a valid NV12 surface.
cudaRes = cudaMemcpy2DToArray(pBuffer, 0, 0, (void*)m_cuBuffer, m_Pitch, m_Width, m_Height, cudaMemcpyDeviceToDevice);
if (cudaRes == cudaSuccess)
{
// Copy is OK but gets only Y plane… how to get UV plane copied into NV12 mapped surface?
bResult = true;
}
}
cudaRes = cudaGraphicsUnmapResources(1, &dxResource);
if (cudaRes != cudaSuccess)
{
printf(“cudaGraphicsUnmapResources failed. cudaRe=%d”, cudaRes);
}
}

Result video: https://www.youtube.com/watch?v=yuq3uBrLu10

Question is: how to copy the UV plane into the mapped NV12 surface?