Encoding from OpenCV GpuMat and Writing Output to File

joseph.zc · November 14, 2023, 4:28am

Hi All,
I am new to encoding in general, but I am trying to take some manually generated OpenCV cv::cuda::GpuMat NV12M frames, encode them to H265 format, and then write out as a video file. I have been using the jetson_multimedia_api samples as reference, but they all seem to use file input/output. For the time being I want to avoid downloading the GpuMat frames for the sake of speed.

It looks like read_video_frame is where I should be looking to get the raw data into the buffers, but I’m not sure how I could perform the same operations with the CUdeviceptr .data component of the GpuMat rather than a file stream. Besides that, it seems that the file output functionality doesn’t need to change much. Is 03_video_cuda_enc the closest match to what I am trying to accomplish?

If I understand the functionality correctly:

1. Make the GpuMat frames
2. Perform any necessary NvBuffer GPU preparations/conversions (?)
3. Enqueue the frames onto the NvBuffers in the output plane
4. Dequeue the buffers from the capture plane
5. Write the encoded frame to the file

DaneLLL · November 14, 2023, 6:19am

Hi,
If your source data is in cv::cuda::GpuMat, you would need to copy the data from the GpuMat to an NvBufSurface and then send it to the encoder. The copy is done on the GPU, so it should not add much overhead.

joseph.zc · November 14, 2023, 7:13am

Thank you for the quick reply. If it is that simple, then it would certainly be helpful. However, I still do not understand how to use it in conjunction with NvBuffers like how the jetson_multimedia_api samples function.

I found this thread previously which seems to be doing something similar. Are there significant differences between the two methods that I am not understanding?

I should also add that this is JetPack 4.6.1 if it changes anything.

DaneLLL · November 14, 2023, 7:35am

Hi,
On Jetpack 4.6.1, please use NvBuffer APIs. And this method should work:
Copy OpenCV GpuMat data to an NvBuffer - #9 by sanatmharolkar

An NvBuffer in NV12 has two planes: a Y plane and an interleaved UV plane. You need to copy data to each plane individually.
Here is a post about mapping an NV12 NvBuffer to a GpuMat:
Real-time CLAHE processing of video, framerate issue. Gstreamer + nvivafilter + OpenCV - #5 by Honey_Patouceul
In your use case you are copying in the other direction, from GpuMat to NvBuffer. Please refer to the post to see how to handle alignment.

You can base your implementation on 01_video_encode.
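
The per-plane copy described above can be sketched on the host as follows. This is only an illustration under stated assumptions: PlaneView, copy_plane and copy_nv12 are hypothetical names, not part of NvBuffer or OpenCV, and std::memcpy stands in for the device-side copy (on the GPU, cudaMemcpy2D takes the same pair of pitches). Even width and height are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical description of one plane of a pitched image (not an NvBuffer type).
struct PlaneView {
    std::uint8_t* data;  // first byte of the plane
    std::size_t pitch;   // bytes from the start of one row to the start of the next
};

// Copy `rows` rows of `row_bytes` payload each, honoring both pitches.
// Host analog of cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind).
inline void copy_plane(PlaneView dst, PlaneView src,
                       std::size_t row_bytes, std::size_t rows) {
    assert(row_bytes <= dst.pitch && row_bytes <= src.pitch);
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(dst.data + r * dst.pitch, src.data + r * src.pitch, row_bytes);
    }
}

// NV12: a full-resolution Y plane plus a half-height plane of interleaved U,V bytes.
// Each UV row holds width/2 (U,V) pairs, which is `width` bytes.
inline void copy_nv12(PlaneView dst_y, PlaneView dst_uv,
                      PlaneView src_y, PlaneView src_uv,
                      std::size_t width, std::size_t height) {
    copy_plane(dst_y, src_y, width, height);
    copy_plane(dst_uv, src_uv, width, height / 2);
}
```

Copying row by row leaves the padding bytes at the end of each destination row untouched, which is what keeps the image from shearing when the source and destination pitches differ.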

joseph.zc · November 14, 2023, 8:14am

Thank you for the quick reply again. I hadn’t realized that about the planes, but it makes sense and seems simple enough to split with OpenCV. I will try an implementation similar to 01_video_encode and see how it goes. Thank you for the instructions.

Honey_Patouceul · November 22, 2023, 9:12pm

You may see: OpenCV CUDA processing from gstreamer pipeline [JP4, JP5]

joseph.zc · November 28, 2023, 3:57am

Quoting DaneLLL:
On Jetpack 4.6.1, please use NvBuffer APIs. And this method should work:
Copy OpenCV GpuMat data to an NvBuffer - #9 by sanatmharolkar

This was helpful with getting the code to work for CPU encoding, thank you.

Quoting Honey_Patouceul:
You may see: OpenCV CUDA processing from gstreamer pipeline [JP4, JP5]

This was helpful in figuring out the process for GPU encoding, thank you. I hadn’t been using your GetPitch function and was getting a video with the right colors but bad alignment until I switched.
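
A small sketch of why ignoring the pitch produces misaligned rows. The 4504 width and 4608 pitch are the values that appear in the code later in this thread; the 256-byte alignment is only an assumption that happens to reproduce 4608, and the real pitch should always be queried from the buffer rather than computed. align_up, row_offset and row_drift are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (alignment must be > 0).
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Byte offset of the first pixel of `row` when rows are `stride` bytes apart.
inline std::size_t row_offset(std::size_t row, std::size_t stride) {
    return row * stride;
}

// How many bytes short of the true row start a writer lands
// if it uses `width` as the stride instead of `pitch`.
inline std::size_t row_drift(std::size_t row, std::size_t width, std::size_t pitch) {
    assert(pitch >= width);
    return row_offset(row, pitch) - row_offset(row, width);
}
```

With these numbers each row slips a further 104 bytes (4608 - 4504), so the colors stay correct while the picture is progressively skewed.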

Although it works well now, there doesn’t seem to be much of a speedup compared to the CPU method with cudaMemcpyDefault: approximately 0.2 seconds per 500 frames or so (in total, 14.9 seconds for GPU encoding and 15.1 seconds for CPU encoding, including setup). For 40 frames, it is only about 0.1 seconds faster.

I noticed that the time between successive frame writes to the output file did not change. Are there other measures I need to take for the capture plane as well to utilize GPU encoding?

DaneLLL · November 28, 2023, 4:59am

Hi,
Please run sudo tegrastats to check whether the GPU is fully loaded. If it is, the GPU engine is already delivering its maximum throughput for this use case.

For the capabilities of the hardware encoder, please check
https://developer.nvidia.com/jetson-xavier-nx-data-sheet

joseph.zc · November 29, 2023, 1:24am

Unless my math is incorrect, it does seem that we are reaching the maximum throughput for HEVC encoding. For the 4504x4504 images that we use, 40 frames in about 1.2 s works out to a throughput of about 676 MP/s, which is close to the maximum listed.
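
The throughput estimate above can be reproduced with a few lines of arithmetic. megapixels_per_second is a hypothetical helper written for this illustration; the inputs are the figures quoted in this post.

```cpp
#include <cassert>
#include <cstdint>

// Megapixels per second for `frames` frames of width x height processed in `seconds`.
inline double megapixels_per_second(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t frames, double seconds) {
    assert(seconds > 0.0);
    const double pixels = static_cast<double>(width * height * frames);
    return pixels / seconds / 1.0e6;
}
```

For example, megapixels_per_second(4504, 4504, 40, 1.2) evaluates to roughly 676.2, since 4504 x 4504 x 40 = 811,440,640 pixels.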

It seems the switch to GPU encoding only sped up the copy into the buffer, which helps but may be the extent of the gain. Thank you for clarifying.

joseph.zc · November 30, 2023, 9:42am

I’ve actually run into another issue in trying to get this to work with standard uchar pointers instead of GpuMats. I had assumed setting eglFrame.frame.pPitch[0] and [1] to the Y and UV planes respectively would work, but the buffer ends up not copying anything.

Is this order of function calls for handling the EGL objects correct?

cv::cuda::GpuMat d_frame_rgb(4504, 4504, CV_8UC3);
EGLImageKHR eglimage;
eglimage = NvEGLImageFromFd(ctx.eglDisplay, buffer->planes[0].fd);
CUresult status;
CUeglFrame eglFrame;
CUgraphicsResource pResource = NULL;
cudaFree(0);
status = cuGraphicsEGLRegisterImage(&pResource, eglimage, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
if(status != CUDA_SUCCESS)
    cerr << "cuGraphicsEGLRegisterImage failed\n";
status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
if (status != CUDA_SUCCESS)
    cerr << "cuGraphicsResourceGetMappedEglFrame failed\n";

status = cuCtxSynchronize();
if (status != CUDA_SUCCESS)
    cerr << "cuCtxSynchronize failed\n";

uchar* d_frame_y_uchar;
uchar2* d_frame_uv_uchar;
cudaMalloc(&d_frame_y_uchar, 4504*4608*sizeof(uchar));
cudaMalloc(&d_frame_uv_uchar, (4504/2)*(4608/2)*sizeof(uchar2));
eglFrame.frame.pPitch[0] = (void*)d_frame_y_uchar;
eglFrame.frame.pPitch[1] = (void*)d_frame_uv_uchar;
if(d_frame_y_uchar != eglFrame.frame.pPitch[0])
    cerr << "ERROR copying y frame to EGLFRame object\n";
if(d_frame_uv_uchar != eglFrame.frame.pPitch[1])
    cerr << "ERROR copying uv frame to EGLFRame object\n";

make_pattern(d_frame_rgb);
convertRGBtoNV12M(d_frame_rgb, d_frame_y_uchar, d_frame_uv_uchar);
  
read_video_frame(d_frame_y_uchar, d_frame_uv_uchar, *buffer);

status = cuCtxSynchronize();
if (status != CUDA_SUCCESS)
    cerr << "cuCtxSynchronize 2 failed\n";
status = cuGraphicsUnregisterResource(pResource);
if (status != CUDA_SUCCESS)
    cerr << "cuGraphicsUnregisterResource failed\n";
NvDestroyEGLImage(ctx.eglDisplay, eglimage);

int read_video_frame(uchar* yframe, uchar2* uvframe,  NvBuffer & buffer)
{
    for(unsigned int i = 0; i < buffer.n_planes; i++){

        NvBuffer::NvBufferPlane &plane = buffer.planes[i];
        if(i == 0){
            cudaMemcpy(plane.data, yframe, plane.fmt.bytesperpixel * plane.fmt.width * plane.fmt.height, cudaMemcpyDeviceToDevice);
        }else{
            cudaMemcpy(plane.data, uvframe, plane.fmt.bytesperpixel * plane.fmt.width * plane.fmt.height, cudaMemcpyDeviceToDevice);
        }
        plane.bytesused = plane.fmt.bytesperpixel * plane.fmt.width * plane.fmt.height;
    }
    
    return 0;
}

I can get it to work without issues with GpuMats like in OpenCV CUDA processing from gstreamer pipeline [JP4, JP5], but something about the uchar pointers results in the capture plane buffers writing nothing to the file. Using uchar pointers alone, without EGLFrames, works, so there is something regarding EGLFrames/EGLImages that I don’t seem to understand yet.

rajupadhyay59 · December 1, 2023, 2:27am

I do not know if this even applies to your case, but I once created a uchar4* pointer from a GpuMat (which came from EGL) and then used a CUDA kernel to do some processing on my image.

joseph.zc · December 1, 2023, 4:12am

That is what I was doing to copy the data into the buffer, but I wanted to avoid having to use OpenCV if I could. In writing this, I realized my issue and have fixed it. It was actually just a simple mistake involving pointer usage that I overlooked.

// Staging buffers for the Y and interleaved UV planes, allocated once.
uchar* d_frame_y_uchar;
uchar2* d_frame_uv_uchar;
cudaMalloc(&d_frame_y_uchar, height*pitch*sizeof(uchar));
cudaMalloc(&d_frame_uv_uchar, (height/2)*(pitch/2)*sizeof(uchar2));
//Start loop here
// Copy the frame data into the planes that eglFrame points at,
// instead of overwriting the pPitch pointers themselves.
cudaMemcpy(eglFrame.frame.pPitch[0], d_frame_y_uchar, height*pitch*sizeof(uchar), cudaMemcpyDeviceToDevice);
cudaMemcpy(eglFrame.frame.pPitch[1], d_frame_uv_uchar, (height/2)*(pitch/2)*sizeof(uchar2), cudaMemcpyDeviceToDevice);
// new_y and new_uv now refer to the mapped planes.
uchar* new_y = (uchar*)eglFrame.frame.pPitch[0];
uchar2* new_uv = (uchar2*)eglFrame.frame.pPitch[1];
//End loop

The cudaMalloc calls are placed before the loop to avoid allocating on every iteration.
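
A host-only sketch of the difference between the two attempts, as one reading of the fix above. MappedFrame, repoint and copy_in are hypothetical stand-ins (this is not the CUeglFrame type), and std::memcpy takes the place of cudaMemcpy. The assumption is that the mapped frame only describes where the encoder's planes live, so changing its pointers does not move any data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a mapped frame descriptor: it records where
// the encoder's planes are, but does not own or define that memory.
struct MappedFrame {
    void* pPitch[2];
};

// First attempt: repoint the local descriptor at the source buffer.
// The encoder's memory is never written.
inline void repoint(MappedFrame& frame, std::uint8_t* src_y) {
    frame.pPitch[0] = src_y;
}

// Fix: copy the bytes into the memory the descriptor already points at.
inline void copy_in(const MappedFrame& frame, const std::uint8_t* src_y,
                    std::size_t bytes) {
    assert(frame.pPitch[0] != nullptr);
    std::memcpy(frame.pPitch[0], src_y, bytes);
}
```

After repoint the descriptor and the source compare equal, which is why the pointer checks in the earlier snippet passed even though the encoder's buffer stayed empty.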

rajupadhyay59 · December 1, 2023, 5:30am

This looks good, I’ll give it a try in my use case too.
Until now, what I was doing was buffer → egl → gpu → uchar4 → preprocessing (a CUDA kernel that creates black rectangles).

But if I try your method, I do not need to create a GpuMat.

Thanks for sharing!

system · December 15, 2023, 5:30am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
