Decoder_unit_sample in MMAPI, extracting nv12 data is inefficient

The dump_raw_dmabuf() function uses a for loop to extract the data. In my testing, this code is very inefficient.

What is the efficient way to extract data from nvbuf_surf->surfaceList?
How can the memory in nvbuf_surf->surfaceList[0].mappedAddr.addr[plane] be converted into CUDA device memory?

Hardware DMA buffers are aligned, so each row can carry extra padding pixels. To dump frame data to a file, we have to copy line by line to eliminate those additional pixels.
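
As a rough illustration of the line-by-line copy, here is a minimal C++ sketch. It assumes the plane is already mapped for CPU access and that you know its pitch (bytes between row starts), its valid row width in bytes, and its row count. The name pack_plane and its parameters are hypothetical and not part of MMAPI. When the pitch equals the row width there is no padding, so the sketch falls back to a single memcpy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not an MMAPI function): copy one pitched plane
// into a tightly packed buffer, dropping the padding at the end of each row.
//   src       - first byte of the mapped plane
//   pitch     - bytes between the starts of consecutive rows (>= row_bytes)
//   row_bytes - valid bytes per row
//   rows      - number of rows in the plane
std::vector<std::uint8_t> pack_plane(const std::uint8_t* src, std::size_t pitch,
                                     std::size_t row_bytes, std::size_t rows) {
    std::vector<std::uint8_t> dst(row_bytes * rows);
    if (dst.empty()) {
        return dst;
    }
    if (pitch == row_bytes) {
        // No padding: the plane is already contiguous, so one copy is enough.
        std::memcpy(dst.data(), src, dst.size());
        return dst;
    }
    for (std::size_t y = 0; y < rows; ++y) {
        // Copy only the valid part of each row and skip the padding.
        std::memcpy(dst.data() + y * row_bytes, src + y * pitch, row_bytes);
    }
    return dst;
}
```

For an NV12 frame with even dimensions, this would be called once for the Y plane (width bytes per row, height rows) and once for the interleaved UV plane (width bytes per row, height / 2 rows).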

For getting CUDA buffer of a NvBufSurface, please refer to cuda_postprocess() in


That’s what I’m doing now.

But I find it very costly in both performance and time. Is there a high-performance method to extract DMA buffer data?

The memory type of the DMA buffer is NVBUF_MEM_SURFACE_ARRAY, which cannot be used by [_device]-type functions.

How can I make NVBUF_MEM_SURFACE_ARRAY memory usable by [_device]-type functions without copying the data?

With the following function calls:


You can get a CUDA pointer to the buffer with no additional memory copy. It is the optimal method on Jetson platforms.

Can you provide a specific usage example?

It is demonstrated in HandleEGLImage(). The code of the function is in

