NvVideoEncoder video looping issue on Jetpack 5.X.X

Up to JetPack 4.6, the following code worked reliably for encoding H.264/H.265 video from DMA buffer FDs obtained from the Argus API.

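static int index = 0;
NvBuffer *nvBuffer = nullptr;
if(index < encoder->output_plane.getNumBuffers())
{
    // First pass: take each output-plane buffer in turn.
    nvBuffer = encoder->output_plane.getNthBuffer(index);
    v4l2_buffer.index = index++;
}
else
{
    // Afterwards: reuse a buffer the encoder has finished with.
    // dqBuffer() returns an int status, so the buffer pointer is taken
    // from the output argument rather than from the return value.
    if(encoder->output_plane.dqBuffer(v4l2_buffer, nullptr, &nvBuffer, 10) < 0)
    {
        nvBuffer = nullptr;
    }
}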

if(!nvBuffer)
{
    return -1;
}

nvBuffer->planes[0].fd = fd;
nvBuffer->planes[0].bytesused = 1;

encoder->output_plane.qBuffer(v4l2_buffer, nvBuffer);

On newer JetPack versions from 5.X.X onward (tested primarily on 5.1.2), the video keeps looping over the first buffers that were passed to the encoder. It seems that the encoder maps the file descriptor given for each input buffer (the encoder was configured with 10 buffers on the output plane) and expects the same FDs to come back with new data, which is not the use case in our software.

I have reviewed multiple Jetson Multimedia API samples, including 10_camera_recording, but the logic in that sample is different, as the same FD is used repeatedly for both Argus and the video encoder. The only workaround I have found is to map both the image FD and the NvBuffer FD with NvBufSurfaceMap and copy the data with NvBufSurfaceCopy, which is highly inefficient and should be unnecessary.

Is this a bug, or are extra steps required on newer JetPack versions to get the encoder working properly?
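To make the suspected behaviour concrete, here is a toy model of the hypothesis described above. It is not the real encoder or driver: `ToyEncoder`, `queue`, and `encoded_sequence` are hypothetical names, and the model simply assumes that each output-plane slot remembers the first FD queued on it and that slots are reused in round-robin order.

```cpp
#include <cstddef>
#include <vector>

// Toy model of the hypothesis only, NOT the real driver: each output-plane
// slot caches the first fd queued on it and ignores any later fd.
class ToyEncoder {
public:
    explicit ToyEncoder(std::size_t slots) : cached_fd_(slots, -1) {}

    // Queue `fd` on slot `index`; returns the fd the toy encoder actually reads.
    int queue(std::size_t index, int fd) {
        if (cached_fd_[index] < 0) {
            cached_fd_[index] = fd;  // mapped once, on first use of the slot
        }
        return cached_fd_[index];
    }

private:
    std::vector<int> cached_fd_;
};

// Feed `frames` distinct fds (100, 101, ...) through `slots` slots in
// round-robin order and return the fds the toy encoder ends up reading.
std::vector<int> encoded_sequence(std::size_t slots, int frames) {
    ToyEncoder enc(slots);
    std::vector<int> out;
    for (int i = 0; i < frames; ++i) {
        out.push_back(enc.queue(static_cast<std::size_t>(i) % slots, 100 + i));
    }
    return out;
}
```

With 10 slots, the model reads the 11th frame from the FD of the 1st frame again, which matches the "looping on the first buffers" symptom. Whether the real encoder behaves this way is exactly the open question.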

Hi,
We have deprecated NvBuffer APIs on Jetpack 5 releases. Please check the migration guide:

https://developer.nvidia.com/embedded/jetson-linux-r3541
nvbuf_utils to NvUtils Migration Guide

The 01_video_encode sample in JetPack 5 has been updated accordingly, so please refer to it. You may also refer to the patch for 12_v4l2_camera_cuda:
How to use v4l2 to capture videos in Jetson Orin r35 Jetpack 5.0 and encode them using a hardware encoding chip - #8 by DaneLLL

I have reviewed the patch for using NvVideoEncoder in the camera_v4l2_cuda example and implemented it in our application. The NvTransform function basically copies the input NvBuffer to the encoder NvBuffer, which is as inefficient as NvBufSurfaceCopy. On Jetson Xavier NX we are getting only ~18 FPS at 3840 x 2160 resolution, which is nowhere near the specified 2x 4K60 (H.265) encoding performance for this chip. I am aware that the limitation is the DMA buffer copy and not the encoding process itself, which is why I am trying to find a way to skip the copy altogether. The older NvBuffer API did not require any extra copying. Is there any way to achieve this?

Hi,
Please try the following methods and see if performance improves:

  1. Run the script to enable hardware converter at maximum clock:
    VPI - Vision Programming Interface: Performance Benchmark
  2. Create NvBufSurfTransformSession for each v4l2 source, call NvBufSurfTransformSetSessionParams() and then NvBufSurfTransform().

I have tried the steps you suggested, and neither of them improved the performance. It seems that the copy is necessary since JetPack 5, but the time it takes can be reduced by passing an empty NvBufSurfTransformParams structure, which brought the latency down to about 12 ms on Xavier NX, so we are finally able to reach 30 FPS encoding at ~4K resolution. The underlying problem has not been solved, as the copy is in fact still necessary, but the performance has improved to the point that it is usable in our application.
