Most efficient way to transfer a captured video frame to NvBufSurface

I use a Magewell Eco Capture M.2 capture card to capture video frames into malloc'd memory using MWCaptureVideoFrameToVirtualAddress.

I then transfer the data to an NvBufSurface for encoding using nvenc on the Jetson.

I use NvBufSurfaceAllocate with memType NVBUF_MEM_SURFACE_ARRAY to create an NvBufSurface and use a pooling system to re-use these surfaces as required.

I use the functions NvBufSurfaceMap, memcpy (to copy the video data), NvBufSurfaceSyncForDevice and NvBufSurfaceUnMap to transfer the captured video data.
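The copy step described above can be sketched roughly as follows. This is a hedged illustration, not the poster's code: `dst_plane`, `copy_plane` and `copy_semiplanar_frame` are hypothetical helpers, and the sketch assumes the capture writes a full Y plane immediately followed by the interleaved UV plane, both with the same pitch. In real code the destination pointers and pitches would come from `surfaceList[0].mappedAddr.addr[i]` and `surfaceList[0].planeParams.pitch[i]` after `NvBufSurfaceMap(surf, 0, -1, NVBUF_MAP_WRITE)`, followed by `NvBufSurfaceSyncForDevice(surf, 0, -1)` and `NvBufSurfaceUnMap(surf, 0, -1)` (assuming the usual nvbufsurface.h signatures). It does not remove the copy; it only shows that when the capture pitch matches the surface pitch, each plane collapses to a single memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one mapped destination plane. */
typedef struct {
    uint8_t *addr;   /* CPU address of the plane (e.g. mappedAddr.addr[i]) */
    size_t   pitch;  /* bytes per row (e.g. planeParams.pitch[i]) */
} dst_plane;

/* Copy `rows` rows of `row_bytes` meaningful bytes each.
 * row_bytes must not exceed either pitch. */
static void copy_plane(dst_plane dst, const uint8_t *src, size_t src_pitch,
                       size_t row_bytes, size_t rows)
{
    if (dst.pitch == src_pitch) {
        /* Same pitch on both sides: one memcpy covers the whole plane. */
        memcpy(dst.addr, src, src_pitch * rows);
        return;
    }
    /* Different pitches: copy row by row, skipping each side's padding. */
    for (size_t r = 0; r < rows; ++r)
        memcpy(dst.addr + r * dst.pitch, src + r * src_pitch, row_bytes);
}

/* Copy a contiguous semi-planar 4:2:0 frame (Y rows, then UV rows, both with
 * src_pitch) into two separately mapped planes. The UV plane has the same
 * row_bytes as Y and half the rows. */
void copy_semiplanar_frame(dst_plane y, dst_plane uv, const uint8_t *src,
                           size_t src_pitch, size_t row_bytes, size_t height)
{
    const uint8_t *src_uv = src + src_pitch * height; /* UV follows Y directly */
    copy_plane(y, src, src_pitch, row_bytes, height);
    copy_plane(uv, src_uv, src_pitch, row_bytes, height / 2);
}
```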

Although it works in real-time, I get high CPU usage from the memcpy.

I suspect (hope?) there is a more efficient way, such as:

  1. capturing directly to a mapped NvBufSurface.
  2. something else?

How are others dealing with this? What is the optimal path I should be taking?

If the device has a V4L2 driver and can capture frame data through V4L2, you can try the sample:


If it does not support v4l2, your solution should be optimal.

So the problem I'm having with capturing directly to a mapped NvBufSurface is that, when a surface is mapped, the planes are not contiguous in memory: there is some padding between the Y and UV planes. Yet the only function I can use to capture the video frame expects a single pointer to the start of the planes.

I suppose I could allocate a surface using NVBUF_MEM_HANDLE for the capture, but how do I then efficiently transfer that to a surface allocated using NVBUF_MEM_SURFACE_ARRAY for transforming/encoding?

NvBufSurface is a hardware DMA buffer, and each plane has its own data alignment. Please check whether you can change the data layout of the source to match that alignment.

As for allocating a surface with NVBUF_MEM_HANDLE for the capture and then transferring it to a surface allocated with NVBUF_MEM_SURFACE_ARRAY for transforming/encoding: we don't support this, since the surface array has to account for alignment.

Thanks for confirming.

At the current time it isn’t possible to change the data layout of the source; it expects the Y and UV planes to be joined as a contiguous block. I can specify the pitch, but not any gap between the planes.
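The constraint described here can be written as a small predicate. This is a hypothetical helper, not part of the Magewell or NvBufSurface APIs; its parameters only mirror `planeParams.pitch[]`, `planeParams.height[]` and `planeParams.offset[]`, and it assumes plane 0 starts at offset 0. A capture call that takes one start pointer and one pitch can only fill the surface if both planes use that pitch and the UV plane begins exactly where the last Y row ends.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: can a single-pointer, single-pitch capture that writes
 * Y immediately followed by UV land correctly in a two-plane surface with
 * this layout? (Assumes planeParams.offset[0] == 0.) */
bool single_block_capture_fits(size_t capture_pitch,
                               size_t pitch0, size_t height0,
                               size_t pitch1, size_t offset1)
{
    /* Both planes must use the pitch the capture API was given. */
    if (pitch0 != capture_pitch || pitch1 != capture_pitch)
        return false;
    /* UV must start right after the last Y row; padding between the planes
     * cannot be expressed with a pitch alone. */
    return offset1 == pitch0 * height0;
}
```

With the values in the log that follows in this thread (pitch 7680, height 2160, offset[1] 16646144) the check fails, because 7680 × 2160 = 16,588,800 rather than 16,646,144.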

NvBufSurfaceMap() maps the surface's DMA buffer and fills in two pointers, accessible via surfaceList[0].mappedAddr.addr[0] and surfaceList[0].mappedAddr.addr[1].

I’m not really too sure how memory-mapping works, but from my experiments these two pointers can be anywhere, for example:

```
30/01/24 13:19:42.823791873 [default] DEBUG : VIDEO_FRAME: ----------------------------------------------------------------------
30/01/24 13:19:42.823861541 [default] DEBUG : VIDEO_FRAME: Mapped buffer surface
30/01/24 13:19:42.823896679 [default] DEBUG : VIDEO_FRAME: ----------------------------------------------------------------------
30/01/24 13:19:42.823955883 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].width = 3840
30/01/24 13:19:42.823986157 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].height = 2160
30/01/24 13:19:42.824015823 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].pitch = 7680
30/01/24 13:19:42.824059601 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].dataSize = 25034752
30/01/24 13:19:42.824093683 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.num_planes = 2
VIDEO_FRAME: buffer_surface->surfaceList[0].mappedAddr.addr[0] = 0xffff08088000
30/01/24 13:19:42.824130710 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.width[0] = 3840
30/01/24 13:19:42.824158807 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.height[0] = 2160
30/01/24 13:19:42.824184345 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.pitch[0] = 7680
30/01/24 13:19:42.824209498 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.offset[0] = 0
30/01/24 13:19:42.824235836 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.psize[0] = 16646144
VIDEO_FRAME: buffer_surface->surfaceList[0].mappedAddr.addr[1] = 0xffff18402000
30/01/24 13:19:42.824266430 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.width[1] = 1920
30/01/24 13:19:42.824293791 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.height[1] = 1080
30/01/24 13:19:42.824320801 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.pitch[1] = 7680
30/01/24 13:19:42.824347043 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.offset[1] = 16646144
30/01/24 13:19:42.824374468 [default] DEBUG : VIDEO_FRAME: buffer_surface->surfaceList[0].planeParams.psize[1] = 8388608
```
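Working the numbers in this log shows where the gap comes from. The Y data occupies pitch[0] × height[0] = 7680 × 2160 = 16,588,800 bytes, but offset[1] is 16,646,144, so there are 57,344 bytes of padding before the UV plane. In this particular log both psize values also happen to equal pitch × rows rounded up to a multiple of 128 KiB, and dataSize = psize[0] + psize[1]. That 128 KiB figure is only an observation from one log, not a documented NvBufSurface rule, so real code should read offset[] and psize[] from planeParams rather than assume it. The helper names below are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align > 0). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Padding bytes between the end of the last Y row and the start of the UV
 * plane, from planeParams.pitch[0], height[0] and offset[1].
 * Assumes offset1 >= pitch0 * height0. */
size_t inter_plane_gap(size_t pitch0, size_t height0, size_t offset1)
{
    return offset1 - pitch0 * height0;
}
```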

Sometimes the addr[1] pointer is lower than the addr[0] pointer; there seems to be no relationship between them.
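One way to confirm this from the mapped pointers is to check whether plane 1's mapping starts exactly offset[1] bytes after plane 0's. The sketch below is a hypothetical diagnostic, not an NvBufSurface call; it compares the addresses as `uintptr_t` because they come from separate mappings. For the log above, addr[1] − addr[0] = 0x1037A000 (272,080,896 bytes) rather than offset[1] = 16,646,144, so the two planes are not one contiguous CPU-visible range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the plane-1 mapping begins exactly offset1 bytes after the
 * plane-0 mapping, i.e. both planes could be written through one pointer.
 * Addresses are compared as integers since they belong to different
 * mappings. */
bool mappings_contiguous(const void *addr0, const void *addr1, size_t offset1)
{
    uintptr_t a0 = (uintptr_t)addr0;
    uintptr_t a1 = (uintptr_t)addr1;
    return a1 > a0 && a1 - a0 == offset1;
}
```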

I wonder whether it is possible, somehow, to have these planes mapped contiguously so that the capture card driver can write to them as one plane, perhaps via a different memory-mapping call?

This is a hard requirement for multi-plane formats and cannot be changed. We're not sure whether your source can generate single-plane formats such as YUV422 (YUYV, UYVY, etc.). If it can, we would suggest switching to one of those formats and trying it.

Hmm, the source can generate single-plane formats, but all of those packed formats are 4:2:2, whereas the source is 4:2:0 and I want to end up encoding 4:2:0, so I'd be going through an undesirable conversion:

4:2:0 source → 4:2:2 capture → 4:2:0 conversion → encode

rather than the current path:

4:2:0 source → 4:2:0 capture → encode
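To put a rough number on the cost of the 4:2:2 detour, the per-frame payload of the two layouts can be compared. This is a back-of-the-envelope sketch with hypothetical helper names that ignores pitch and plane padding: semi-planar 4:2:0 carries 1.5 samples per pixel and packed 4:2:2 carries 2. For a 3840×2160 frame at 1 byte per sample that is 12,441,600 bytes versus 16,588,800, about a third more data to capture and then convert back to 4:2:0.

```c
#include <stddef.h>

/* Approximate bytes per frame for semi-planar 4:2:0 (e.g. NV12): full-size
 * luma plus a chroma plane with half as many rows. Ignores pitch padding. */
size_t frame_bytes_420(size_t width, size_t height, size_t bytes_per_sample)
{
    return width * height * bytes_per_sample * 3 / 2;
}

/* Approximate bytes per frame for packed 4:2:2 (e.g. YUYV, UYVY): two
 * samples per pixel in a single plane. Ignores pitch padding. */
size_t frame_bytes_422(size_t width, size_t height, size_t bytes_per_sample)
{
    return width * height * bytes_per_sample * 2;
}
```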

It turns out that the wonderful folk at Magewell are working on enhancements to their SDK that allow both zero-copy and multi-planar capture, which should solve all of this.

If anyone is researching capture cards, Magewell's support is really good, by the way.

