NVDEC decoded frame: trying a zero-copy to an NV12 D3D11 texture

Hi all, new dedicated thread.

I’m receiving CUVIDPARSERDISPINFO packets from NVDEC. What I want to do now is a zero-copy transfer into a D3D11 texture for rendering, so that the CPU is not overloaded.

Test scenario: 40 independent H.264 streams, 704x576 @ 15 fps.

What I’ve managed to do so far: I registered two separate textures (R8_UNorm + R8G8_UNorm, created at startup) with cuGraphicsD3D11RegisterResource. Then, for each of them, I call cuGraphicsMapResources, cuGraphicsSubResourceGetMappedArray (only once), cuMemcpy2DAsync and cuGraphicsUnmapResources. This way I get the complete frame on screen, but average CPU usage is about 38%.

If I get rid of the chroma texture, CPU usage drops to 25%.

What I’m trying to do now: I’ve created a single NV12 texture, registered it with cuGraphicsD3D11RegisterResource, and I’m trying to copy the decoded NV12 frame into the mapped array in a single shot.

If I leave CUDA_MEMCPY2D unchanged, I obviously get only the luma plane, and the frame looks greenish.
Every attempt to modify these parameters leads to a CUDA_ERROR_INVALID_VALUE from the subsequent cuMemcpy2DAsync.

How should I modify the parameters to perform a single-shot copy?

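```cpp
CUDA_MEMCPY2D m = { 0 };
// Source: the decoded NV12 surface in device memory (pitched)
m.srcMemoryType = CU_MEMORYTYPE_DEVICE;
m.srcDevice = dpSrcFrame;
m.srcPitch = nSrcPitch;
// Destination: the CUDA array mapped from the registered D3D11 texture
m.dstMemoryType = CU_MEMORYTYPE_ARRAY;
m.dstArray = dstArray;
// Copies the luma plane only: m_nWidth bytes x m_nLumaHeight rows
m.WidthInBytes = m_nWidth;
m.Height = m_nLumaHeight;
```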


Could anyone help me with this? Is it possible in some way?