Encode frames using v4l2 and NvBufSurface

Hasa · June 10, 2024, 8:08am

My system:

Jetson Xavier NX
Jetson Linux 35.2.1

For my application I have some frames in CUDA memory that I want to encode. Using the MMAPI video encoding sample I could encode my video correctly.
However, according to the sample, before enqueuing the frames using v4l2_buf, I have to transfer the frames to an NvBuffer* array, which is on the CPU, so technically I need a slow copy between GPU and CPU.
I also checked the possibility of using NvBufSurface directly (without NvBuffer), but that also seems to be a CPU-accessible structure (NvBufSurfaceMappedAddr is described as “holds planewise pointers to a CPU mapped buffer”).
Now my question is: how should I transfer my data to v4l2 without touching CPU memory, relying only on a DeviceToDevice copy? My data is a YUV image kept in 3 separate uchar* arrays.

In summary this is the MMAPI sample workflow:

file_read(cpu) → NvBuffer(cpu) → NvBufSurfaceSyncForDevice (cpu->gpu) → v4l2_buf (gpu)

But my initial data is on GPU memory, so how should I change the above pipeline?
Thank you.

DaneLLL · June 11, 2024, 3:03am

Hi,
You may refer to the sample:

/usr/src/jetson_multimedia_api/samples/03_video_cuda_enc

NvBufSurface can be mapped to the GPU, and you can do a GPU-to-GPU memory copy to put the frame data into NvBufSurface.

We also have later JetPack releases. We suggest upgrading to the latest, 5.1.3 (r35.5.0).

Hasa · June 11, 2024, 3:27am

Thank you, I am checking that sample as well. In that one, too, the initial data is read by the CPU and then mapped to the GPU.
I am wondering how I can load the NvBuffer on the GPU even before mapping via NvBufSurface.
Can I map NvBufSurface to the GPU in advance, and then copy my GPU data to NvBufSurfaceMappedAddr directly (and omit the NvBuffer)?

DaneLLL · June 11, 2024, 4:22am

Hi,
The 03 sample demonstrates how to access NvBufSurface through GPU and CPU. You may ignore the CPU part and focus on GPU part.

Hasa · June 11, 2024, 5:49am

Thank you, I am trying to get the key point here.
This is the main part of the 03 sample that I am looking at:

struct v4l2_buffer v4l2_buf;
struct v4l2_plane planes[MAX_PLANES];
NvBuffer *buffer = ctx.enc->output_plane.getNthBuffer(i);
memset(&v4l2_buf, 0, sizeof(v4l2_buf));
memset(planes, 0, MAX_PLANES * sizeof(struct v4l2_plane));
v4l2_buf.index = i;
v4l2_buf.m.planes = planes;

read_video_frame(ctx.in_file, *buffer); // This gets the data on CPU?!

ret = sync_buf (buffer); // This is for CPU so I ignore!

ret = render_rect (&ctx, buffer); // This maps the data on EGL

ret = ctx.enc->output_plane.qBuffer(v4l2_buf, NULL);

I understand ignoring sync_buf (cpu) and going to render_rect,
but what should I do with NvBuffer? In a function similar to read_video_frame I have tried:

void cuda_read_video_frame(
    unsigned char* y_arr,
    unsigned char* u_arr,
    unsigned char* v_arr,
    NvBuffer & buffer)
{
    // Copy each tightly packed source plane (Y, U, V) into the pitched NvBuffer plane.
    for (uint32_t i = 0; i < buffer.n_planes; i++)
    {
        NvBuffer::NvBufferPlane &plane = buffer.planes[i];
        plane.bytesused = 0;

             if(i==0)  cudaMemcpy2D(plane.data, plane.fmt.stride, y_arr, plane.fmt.width, plane.fmt.width, plane.fmt.height, cudaMemcpyDeviceToDevice);
        else if(i==1)  cudaMemcpy2D(plane.data, plane.fmt.stride, u_arr, plane.fmt.width, plane.fmt.width, plane.fmt.height, cudaMemcpyDeviceToDevice);
        else if(i==2)  cudaMemcpy2D(plane.data, plane.fmt.stride, v_arr, plane.fmt.width, plane.fmt.width, plane.fmt.height, cudaMemcpyDeviceToDevice);

        plane.bytesused = plane.fmt.stride * plane.fmt.height;
    }
}

but it doesn’t work.
If I change it to cudaMemcpyDeviceToHost everything works fine! (although very slow, since the DeviceToHost copy takes more time)

So it is still not clear to me where I can inject my cuda_data. Should I copy into nvbuf_surf->surfaceList[0].mappedAddr.eglImage? Then how should I fill the NvBuffer planes?
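
Note on the copy arguments: cudaMemcpy2D takes (dst, dpitch, src, spitch, width in bytes, height, kind), where each pitch is the byte distance between the starts of consecutive rows. The observation above that only cudaMemcpyDeviceToHost works is consistent with plane.data being a CPU-mapped address rather than a device pointer. The following is a minimal host-only sketch of the same row-by-row pitched copy, so the pitch/width arithmetic can be checked without a GPU. The names copy_plane_2d, PlaneDims and yuv420_plane_dims are hypothetical, and the 4:2:0 chroma subsampling is an assumption, since the thread only says the image is YUV in three arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU stand-in (hypothetical helper) that mirrors the argument order of
// cudaMemcpy2D: dst, dst_pitch, src, src_pitch, width_bytes, height.
// Pitches are byte distances between row starts; only width_bytes of each
// row are copied, so any padding in the destination rows is left untouched.
void copy_plane_2d(uint8_t *dst, size_t dst_pitch,
                   const uint8_t *src, size_t src_pitch,
                   size_t width_bytes, size_t height)
{
    assert(dst_pitch >= width_bytes && src_pitch >= width_bytes);
    for (size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dst_pitch, src + row * src_pitch, width_bytes);
}

// Plane geometry for planar YUV, ASSUMING 4:2:0 subsampling
// (plane 0 = Y at full size, planes 1 and 2 = U and V at half size).
struct PlaneDims { size_t width; size_t height; };

PlaneDims yuv420_plane_dims(size_t frame_w, size_t frame_h, int plane)
{
    if (plane == 0)
        return {frame_w, frame_h};
    return {(frame_w + 1) / 2, (frame_h + 1) / 2};
}
```

With a tightly packed source, the source pitch equals the plane width in bytes, while the destination pitch is whatever the surface reports (stride or pitch), which may be larger than the width.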

DaneLLL · June 11, 2024, 7:08am

Hi,
Please check HandleEGLImage() in

/usr/src/jetson_multimedia_api/samples/common/algorithm/cuda/NvCudaProc.cpp

You would need to call the CUDA functions to get the pointer, and then you can copy data to the NvBufSurface.

Hasa · June 12, 2024, 5:44am

Hi,
Now I am getting some ideas.
It seems that with the code below I can transfer my data to the NvBufSurface eglImage:

for (uint32_t j = 0 ; j < buffer->n_planes; j++)
{
     // Look up the NvBufSurface behind this plane's fd and map it as an EGLImage.
     NvBufSurface *nvbuf_surf = 0;
     NvBufSurfaceFromFd(buffer->planes[j].fd, (void**)(&nvbuf_surf));
     NvBufSurfaceMapEglImage(nvbuf_surf, -1);
     NvBufSurfaceParams *sfparams = &nvbuf_surf->surfaceList[0];
     NvBufSurfacePlaneParams *nvspp = &nvbuf_surf->surfaceList[0].planeParams;

     // Register the EGLImage with CUDA to obtain device pointers to its planes.
     CUgraphicsResource pResource;
     cuGraphicsEGLRegisterImage(&pResource, sfparams->mappedAddr.eglImage, CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD);
     CUeglFrame eglFrame;
     cuGraphicsResourceGetMappedEglFrame( &eglFrame, pResource, 0, 0 );

          if(j==0)  cudaMemcpy2D(eglFrame.frame.pPitch[0], nvspp->pitch[0], y_cuda, nvspp->width[0], nvspp->width[0], nvspp->height[0], cudaMemcpyDeviceToDevice);
     else if(j==1)  cudaMemcpy2D(eglFrame.frame.pPitch[1], nvspp->pitch[1], u_cuda, nvspp->width[1], nvspp->width[1], nvspp->height[1], cudaMemcpyDeviceToDevice);
     else if(j==2)  cudaMemcpy2D(eglFrame.frame.pPitch[2], nvspp->pitch[2], v_cuda, nvspp->width[2], nvspp->width[2], nvspp->height[2], cudaMemcpyDeviceToDevice);
}

but what about the NvBuffer* now? Should I do another copy to buffer->planes[j].data?

The encoder thread:

encoder_capture_plane_dq_callback(struct v4l2_buffer *v4l2_buf, NvBuffer * buffer, NvBuffer * shared_buffer, void *arg)

strictly needs the NvBuffer* data, but I still couldn’t find a way to transfer the data to NvBuffer using a single cudaMemcpyDeviceToDevice copy.

DaneLLL · June 12, 2024, 8:03am

Hi,
To feed NvBufSurface to the output plane, you don’t need to fill in NvBuffer. Please also refer to this patch:
How to use v4l2 to capture videos in Jetson Orin r35 Jetpack 5.0 and encode them using a hardware encoding chip - #8 by DaneLLL

And please check write_video_frame() for getting encoded stream in capture plane.
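
Note: the linked patch is not reproduced in this thread, so the following is only a generic sketch of one way a frame can be queued by file descriptor instead of through a CPU-filled NvBuffer, using the standard V4L2 multi-planar structures with the DMABUF memory type. It only fills in the descriptor and does not call the encoder. The helper name describe_dmabuf_frame and the constant kMaxPlanes are hypothetical, and whether the Jetson encoder expects an fd on every plane or only on plane 0 is an assumption to check against the patch and the samples.

```cpp
#include <linux/videodev2.h>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint32_t kMaxPlanes = 3;  // hypothetical cap, enough for planar YUV

// Describe output-plane buffer `index` as a DMABUF-backed frame, so that a
// later queue call hands the driver file descriptors rather than a CPU copy.
// `plane_fds` and `plane_bytes` come from the caller (ASSUMED to be the
// dmabuf fd of the surface and the pitch * height of each plane).
void describe_dmabuf_frame(v4l2_buffer &buf, v4l2_plane *planes,
                           uint32_t index,
                           const int *plane_fds,
                           const uint32_t *plane_bytes,
                           uint32_t n_planes)
{
    assert(n_planes >= 1 && n_planes <= kMaxPlanes);

    std::memset(&buf, 0, sizeof(buf));
    std::memset(planes, 0, n_planes * sizeof(v4l2_plane));

    buf.index = index;
    buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;  // encoder input side
    buf.memory = V4L2_MEMORY_DMABUF;               // buffers passed by fd
    buf.m.planes = planes;
    buf.length = n_planes;                         // number of planes

    for (uint32_t i = 0; i < n_planes; ++i)
    {
        planes[i].m.fd = plane_fds[i];
        planes[i].bytesused = plane_bytes[i];
    }
}
```

In this arrangement the pixel data would already be in the surface (for example after the device-to-device copies into the mapped EGL frame), and the v4l2_buffer only tells the driver which buffer to consume.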

system · July 3, 2024, 6:20am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
