Formatting images to feed into NvVideoEncoder (Tegra multimedia API)

I am using a TX2 to receive video from a gigE vision camera that does not support V4L2. The gigE vision camera driver sends me a raw array of bytes containing the pixel values every time a new frame is generated by the camera.

As new frames are received from the camera, I would like to use NvVideoEncoder to compress the streaming video in real time.

Which example in the Tegra Multimedia API should I use as a starting point?

How do I modify the format of the byte array containing my pixel values that I receive from my gigE camera in order to transform it into a data type that NvVideoEncoder will accept as input?
My camera is 1920x1080 pixels with 8 bit gray values (no color / chrominance). So each time
the gigE vision camera sends me a frame, I get a 2073600 byte buffer of pixels.

Do I need to fill in the v4l2_buffer and v4l2_plane with the raw pixel bytes coming from my camera?

struct v4l2_buffer v4l2_buf;
struct v4l2_plane planes[MAX_PLANES];

How do the bytes need to be reordered in order to be compatible with NvVideoEncoder?
I am guessing that I need to append some fake chrominance pixels to the pixel data that gets
put into either the v4l2_buffer or the v4l2_plane. How should this be done?

In the initialization code below, how many capture buffers need to be set up?

    // Enqueue all the empty capture plane buffers
    for (uint32_t i = 0; i < m_VideoEncoder->capture_plane.getNumBuffers(); i++)
    {
        struct v4l2_buffer v4l2_buf;
        struct v4l2_plane planes[MAX_PLANES];

        memset(&v4l2_buf, 0, sizeof(v4l2_buf));
        memset(planes, 0, MAX_PLANES * sizeof(struct v4l2_plane));

        v4l2_buf.index = i;
        v4l2_buf.m.planes = planes;

        CHECK_ERROR(m_VideoEncoder->capture_plane.qBuffer(v4l2_buf, NULL));
    }

In other words, how is m_VideoEncoder->capture_plane.getNumBuffers() being determined?

In this method call:

ret = m_VideoEncoder->setCapturePlaneFormat(ENCODER_PIXFMT, STREAM_SIZE.width,
                                    STREAM_SIZE.height, 2 * 1024 * 1024);

what are the legal values for the ENCODER_PIXFMT argument?
Also, looking at NvVideoEncoder.h, the last argument is named sizeimage
and is described as,

“Maximum size of the encoded buffers on the capture plane in bytes”

What should this number be for my 1920x1080 image, given however I have to transform
my raw pixel bytes in order to put them inside a v4l2_buffer struct?

Hi ceres.imaging,
The output plane to encoder has to be I420 or NV12. In 01_video_encode, it is configured to I420:

// Set encoder output plane format
ret =
    ctx.enc->setOutputPlaneFormat(V4L2_PIX_FMT_YUV420M, ctx.width,
                                  ctx.height);

For conversion, you can put grey into Y plane and set 0x80 to U and V planes.
Here are steps for your reference:
1 Generate I420s

$ gst-launch-1.0 videotestsrc num-buffers=150 ! video/x-raw,width=640,height=480,format=I420 ! filesink location=a.yuv

2 Apply the following change to NvUtils.cpp and rebuild 01_video_encode

int
read_video_frame(std::ifstream * stream, NvBuffer & buffer)
{
    uint32_t i, j;
    char *data;

    for (i = 0; i < buffer.n_planes; i++)
    {
        NvBuffer::NvBufferPlane &plane = buffer.planes[i];
        std::streamsize bytes_to_read =
            plane.fmt.bytesperpixel * plane.fmt.width;
        data = (char *) plane.data;
        plane.bytesused = 0;
        for (j = 0; j < plane.fmt.height; j++)
        {
            if (i == 0) { // Y: read the luma row from the file
                stream->read(data, bytes_to_read);
            } else { // U and V: skip the chroma row in the file, write neutral chroma
                stream->seekg(bytes_to_read, stream->cur);
                memset(data, 0x80, bytes_to_read);
            }
            if (stream->gcount() < bytes_to_read)
                return -1;
            data += plane.fmt.stride;
        }
        plane.bytesused = plane.fmt.stride * plane.fmt.height;
    }
    return 0;
}

3 Run

$ ./video_encode a.yuv 640 480 H264 ~/a.h264

4 You can play it via 00_video_decode

$ export DISPLAY=:0
$ ../00_video_decode/video_decode ~/a.h264 H264
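
For a camera that delivers grey frames in memory rather than from a file, the same idea (grey into the Y plane, 0x80 into the U and V planes) can be sketched without the file I/O. This is only an illustration: `Plane`, `make_i420_planes` and `gray_to_i420` are hypothetical names, `Plane` is a stand-in for the width/height/stride fields used in read_video_frame above, and even frame dimensions are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one plane of an encoder input buffer.
struct Plane {
    std::vector<std::uint8_t> data;
    std::uint32_t width;   // visible bytes per row
    std::uint32_t height;  // number of rows
    std::uint32_t stride;  // allocated bytes per row (stride >= width)
};

// Allocate Y, U and V planes for an I420 frame (even width/height assumed).
// row_pad is extra padding per row, to mimic a stride larger than the width.
std::vector<Plane> make_i420_planes(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t row_pad)
{
    std::vector<Plane> planes(3);
    for (std::size_t i = 0; i < planes.size(); i++) {
        planes[i].width  = (i == 0) ? width : width / 2;
        planes[i].height = (i == 0) ? height : height / 2;
        planes[i].stride = planes[i].width + row_pad;
        planes[i].data.assign(
            static_cast<std::size_t>(planes[i].stride) * planes[i].height, 0);
    }
    return planes;
}

// Copy a tightly packed 8-bit grey image (width * height bytes) into the
// Y plane row by row, and fill the U and V planes with 0x80 (neutral chroma).
void gray_to_i420(const std::uint8_t *gray, std::vector<Plane> &planes)
{
    for (std::size_t i = 0; i < planes.size(); i++) {
        Plane &p = planes[i];
        std::uint8_t *dst = p.data.data();
        for (std::uint32_t row = 0; row < p.height; row++) {
            if (i == 0)
                std::memcpy(dst, gray + static_cast<std::size_t>(row) * p.width,
                            p.width);
            else
                std::memset(dst, 0x80, p.width);
            dst += p.stride;  // advance by stride, not width
        }
    }
}
```

With a real encoder buffer, the destination pointer, width, height and stride would come from the buffer's own plane description instead of `make_i420_planes`.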

Dear DaneLLL,

Is it possible to provide more documentation or a tutorial that shows, step by step, the image processing pipeline for the HW encoding example?
This threaded way of calling the HW encoder is quite new to video encoding people like us. We are used to feeding in raw images and getting the encoded byte stream back through one function call, such as this one for x264:

int frame_size = x264_encoder_encode(encoder, &nals, &num_nals, &pic_in, &pic_out);

It would be great if you are able to provide a code example on this issue.

Could you also please explain why the buffer initialization given in the first question, with the comment “// Enqueue all the empty capture plane buffers”, is needed? I am curious whether this buffering causes a visual delay in a real-time video streaming application.

Thanks for your support!

Hi bcizmeci,
Please refer to

All APIs and samples are documented in.
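
As a rough mental model of the two-plane flow asked about above (raw frames queued on the output plane, encoded data returned in buffers on the capture plane), here is a toy, single-threaded sketch. It is not the NvVideoEncoder API: `ToyEncoder` and all of its methods are hypothetical, and the "compression" is faked. It only illustrates why empty capture buffers are enqueued up front: the encoder needs somewhere to put its result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Toy model of a two-queue encoder. All names are hypothetical.
class ToyEncoder {
public:
    struct Encoded {
        std::uint32_t index;  // which capture buffer holds the result
        std::size_t bytes;    // size of the (fake) encoded data
    };

    // "Output plane": the application queues a raw frame for encoding.
    void queueRawFrame(std::vector<std::uint8_t> frame) {
        raw_.push_back(std::move(frame));
    }

    // "Capture plane": the application hands an empty buffer to the encoder.
    void queueEmptyCaptureBuffer(std::uint32_t index) {
        empty_.push_back(index);
    }

    // One unit of encoder work. A raw frame is consumed only when an empty
    // capture buffer is available to receive the encoded result.
    bool step() {
        if (raw_.empty() || empty_.empty())
            return false;
        Encoded e;
        e.index = empty_.front();
        e.bytes = raw_.front().size() / 10;  // fake compression ratio
        empty_.pop_front();
        raw_.pop_front();
        done_.push_back(e);
        return true;
    }

    // The application collects a filled capture buffer, if one is ready.
    bool dequeueEncoded(Encoded &out) {
        if (done_.empty())
            return false;
        out = done_.front();
        done_.pop_front();
        return true;
    }

    std::size_t pendingRawFrames() const { return raw_.size(); }

private:
    std::deque<std::vector<std::uint8_t>> raw_;
    std::deque<std::uint32_t> empty_;
    std::deque<Encoded> done_;
};
```

In this model a queued raw frame simply waits until a capture buffer is handed back, which is the situation the "enqueue all the empty capture plane buffers" loop avoids at start-up. Whether and how much the real buffer count affects latency is not something this sketch can show.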

Dear DaneLLL,

Thanks for the recent documentation! Similar to ceres.imaging, I am feeding the raw video stream from another camera source. While implementing my application, I am taking the “01_video_encode” sample as example.
I would like to avoid frame buffering to achieve the lowest possible latency. Therefore, I make the plane buffer settings as follows:

ret = ctx.enc->output_plane.setupPlane(V4L2_MEMORY_MMAP, 1, true, false); // set 1 instead of 10
ret = ctx.enc->capture_plane.setupPlane(V4L2_MEMORY_MMAP, 1, true, false); // set 1 instead of 10

printf("DEBUG: Number of buffers: %d \n", ctx.enc->output_plane.getNumBuffers());

However, the printf statement above shows me : “DEBUG: Number of buffers: 10”



What is the meaning of the last parameter in this function call, used when setting up the video encoder?

ret = ctx.enc->setCapturePlaneFormat(ctx.encoder_pixfmt, ctx.width,
                                         ctx.height, 2 * 1024 * 1024);


What is the significance of 2 * 1024 * 1024?

Does this value need to change based on the bit rate or the dimensions of the video fed to the encoder?


Please refer to NvVideoEncoder.h

    /**
     * Sets the format on the converter capture plane.
     * Calls \c VIDIOC_S_FMT IOCTL internally on the capture plane.
     * @param[in] pixfmt One of the coded V4L2 pixel formats.
     * @param[in] width Width of the input buffers in pixels.
     * @param[in] height Height of the input buffers in pixels.
     * @param[in] sizeimage Maximum size of the encoded buffers on the capture
     *                      plane in bytes.
     * @return 0 for success, -1 otherwise.
     */
    int setCapturePlaneFormat(uint32_t pixfmt, uint32_t width,
                              uint32_t height, uint32_t sizeimage);


Let’s say my image is 1920 x 1080 at 1.5 bytes per pixel (V4L2_PIX_FMT_YUV420M).
Should the value of sizeimage be 1920 * 1080 * 3 / 2?
If not, what should it be?


Yes, please set it as width * height * 1.5
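
To make the arithmetic concrete, here is a small sketch of the width * height * 1.5 figure for 8-bit 4:2:0 data. The helper name `yuv420_frame_bytes` is hypothetical, and even dimensions are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one tightly packed 8-bit 4:2:0 frame: a full-resolution Y plane
// plus two chroma planes at half width and half height.
constexpr std::uint32_t yuv420_frame_bytes(std::uint32_t width,
                                           std::uint32_t height)
{
    return width * height                      // Y
         + 2 * ((width / 2) * (height / 2));   // U + V
}

// 1920 * 1080 = 2073600 luma bytes, plus 1036800 chroma bytes.
static_assert(yuv420_frame_bytes(1920, 1080) == 3110400,
              "1920 * 1080 * 3 / 2");
```

For comparison, the sample's 2 * 1024 * 1024 is 2097152 bytes.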

I wrote a class and test fixture which takes in raw luminance video frames and appends blank UV data to them and then shoves them into the video encoder. Hopefully it is useful to somebody. The code is based on the 01_video_encode example from MM API code samples. (25.8 KB)

Thanks for sharing. I am sure someone will find it useful.

Thanks ceres.imaging for the attachment. I am exploring it as a solution for my problem.

Like bcizmeci, I am trying to encode a raw video stream.

Today I use an FFmpeg function to encode to H.264 and write to MP4 without hardware encoding. I would like to use the hardware HEVC encoder instead.
I want to take the latest H.264 buffer from the encoder and inject it into the MP4 encapsulation in my existing FFmpeg code. That should be possible, no?

Hi Syd, we don’t support ffmpeg with hardware acceleration on TX2.

I know that … I am trying to use the hardware HEVC encoder in place of the FFmpeg H.264 encoding, without changing the FFmpeg conversion (YUV) and encapsulation (MP4) functions.

I would only push the H.264 buffer from MMAPI into my code.

My idea is to use the “encode_frames” function with “MAX_PLANES = 1” and push data from a list (external input) into this loop, then pull the resulting H.264 buffers into another list (output).


I found something for testing … thanks ceres.imaging for the example.

I built a test version that compiles on TX1 / TX2 with MMAPI 24.2.1 and 28.1.


  • Fix the thread with the callback and abort functions.
  • Maybe find a way to convert between YUV420 and RGBx.

So, in my application with FFmpeg, I have:

1 ) The FFmpeg library works with YUV420P for H.264.
2 ) A camera gives BGRA buffers, which are converted to YUV420P buffers.
3 ) I draw a “datetime” overlay on the video (written onto the YUV420P buffers).
4 ) Encode YUV420P to H.264.
5 ) Write H.264 to an MP4 file.

I can change step (4) to use H.264 from NvVideoEncoder (like the git example), but the buffer format is wrong, so the video is wrong … :/

But how can I convert YUV420P to YUV420M for the H.264 NvVideoEncoder?

I haven’t found how to use NvVideoConvert (sample 07) to set up an “NvVideoContext” between YUV420P (FFmpeg) and YUV420M from MMAPI.

Not sure what YUV420P is. Other users may share experience.

The input to NvVideoEncoder is YUV420M or NV12M.
YUV420M has 3 planes: a Y plane, a U plane and a V plane.
NV12M has 2 planes: a Y plane and a UV plane.
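
To illustrate the two layouts, here is a minimal sketch that takes a tightly packed planar 4:2:0 frame (all Y bytes, then all U, then all V, which is assumed here to be what FFmpeg's YUV420P provides) and splits it into three separate planes, then interleaves U and V into a single UV plane. `ThreePlanes`, `TwoPlanes`, `split_packed_i420` and `interleave_to_nv12` are hypothetical names, not MMAPI types, and even dimensions and no row padding are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical containers for illustration; not MMAPI types.
struct ThreePlanes { std::vector<std::uint8_t> y, u, v; };  // Y, U, V planes
struct TwoPlanes   { std::vector<std::uint8_t> y, uv; };    // Y and UV planes

// Split a packed planar 4:2:0 frame (Y, then U, then V) into three buffers.
ThreePlanes split_packed_i420(const std::vector<std::uint8_t> &packed,
                              std::uint32_t width, std::uint32_t height)
{
    const std::size_t y_size = static_cast<std::size_t>(width) * height;
    const std::size_t c_size =
        static_cast<std::size_t>(width / 2) * (height / 2);
    if (packed.size() < y_size + 2 * c_size)
        throw std::invalid_argument("packed buffer too small");

    const std::uint8_t *p = packed.data();
    ThreePlanes out;
    out.y.assign(p, p + y_size);
    out.u.assign(p + y_size, p + y_size + c_size);
    out.v.assign(p + y_size + c_size, p + y_size + 2 * c_size);
    return out;
}

// Build the two-plane layout: Y unchanged, U and V interleaved as UVUV...
TwoPlanes interleave_to_nv12(const ThreePlanes &in)
{
    if (in.u.size() != in.v.size())
        throw std::invalid_argument("U and V planes differ in size");

    TwoPlanes out;
    out.y = in.y;
    out.uv.resize(in.u.size() * 2);
    for (std::size_t k = 0; k < in.u.size(); k++) {
        out.uv[2 * k]     = in.u[k];
        out.uv[2 * k + 1] = in.v[k];
    }
    return out;
}
```

When the destination is a real encoder buffer, each plane would be copied row by row using that plane's stride rather than assigned as one block.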

I found:

  • The YUV420P format is YU12 planar.
  • The YUV420M format is I420.

In FFmpeg, hardware encoding and conversion are not supported.

So I can write the time overlay on YUV420M just as on YUV420P, but I convert BGR to YUV420M without hardware.

I think I will gain FPS if I use NvVideoConvert, so I am looking at using 07_video_convert for the BGR buffers.

Now it works like this:

1 ) A camera gives BGRA buffers.
2 ) Convert BGR to YUV420M for H.264 (NvVideoConvert).
3 ) Write the "datetime" overlay on YUV420M.
4 ) Encode YUV420M to H.264 (NvVideoEncoder).
5 ) Write H.264 to an MP4 file with AVPacket (FFmpeg struct).

I will be back … ASAP.

I found a solution for writing to MP4:

bool write_Nvidia_Frame ( NvEncoderHEVC::DataHEVC * _buffer ) {
    bool lret = false;
    if( _buffer != nullptr ) {
        uint64_t lNow = av_gettime() / 1000.0;
        uint64_t lcount = 0;
        if ( 1 < mStartTm ) {
            if ( mStartTm < lNow   ) {
                //timestamps in libav are based on the time_scale set in FormatContext (ie mRecordFps).
                //for example for 10fps, timescale is 10. So frames coming at 0, 100ms, 200ms (perfect 10ms)
                //will have timestamps 0, 1, 2,
                //so the millisecond timestamp is calculated as = (timestamp * 1000)/timescale
                //and so the timestamp is calculated as = (time diff * timescale)/1000
                //note that time diff should be calculated from a start time and not just relative
                //to last frame to avoid long term rounding off problem

                mVideoTimestamp = (( lNow - mStartTm ) *  mRecordFps);
            }
        } else {
            mStartTm = lNow;
            mVideoTimestamp = 0;
        }
        mCountFrame += 1 ;

        AVPacket lAvpkt;
        av_init_packet( &lAvpkt );
        lAvpkt.size = 0;
        lAvpkt.data = NULL;
        int lGotPacketPtr = 0;
        lAvpkt.pts = mVideoTimestamp;
        lAvpkt.dts = mVideoTimestamp;
        lAvpkt.duration = 0 ; //1000.0 / mRecordFps;
//      printf( "WriteFrameNVIDIA av_write_frame = %li \n" , mCountFrame);
        if(_buffer->iframe) {
            lAvpkt.flags |= AV_PKT_FLAG_KEY;
//          lAvpkt.pts = lAvpkt.dts = 0;
        }
//      lAvpkt.pts = av_rescale_q(lcount, mPictureStream->codec->time_base, mPictureStream->time_base);
// //   lAvpkt.dts = av_rescale_q(mCountFrame, mPictureStream->codec->time_base, mFormatContext->streams[0]->time_base);// m_dts
//      lAvpkt.dts = lAvpkt.pts ; //av_rescale_q(mPictureEncoded->pkt_dts, mPictureStream->codec->time_base, mPictureStream->time_base);

        lAvpkt.data = (uint8_t * )malloc ( _buffer->size * sizeof( uint8_t )  );
        memcpy( lAvpkt.data, _buffer->buffer, _buffer->size );
        lAvpkt.size = _buffer->size;
        if( _buffer->buffer != nullptr ) {
            delete _buffer->buffer ;
            _buffer->buffer = nullptr;
        }
        delete _buffer ;
        _buffer = nullptr;
        if ( lAvpkt.size ) {
            lAvpkt.stream_index = mFormatContext->streams[0]->index;

//          if ( av_interleaved_write_frame( mFormatContext, &lAvpkt ) )
            if ( av_write_frame( mFormatContext, &lAvpkt ) ) {
                printf( "ERROR WriteFrameNVIDIA av_write_frame \n");
                lret = false;
                mCountError -=1 ;
            }
            lAvpkt.data = nullptr;
        } else {
            printf( "ERROR lAvpkt size \n");
            lret = false;
        }
    } else {
        printf( "ERROR Buffer is NULL\n");
        lret = true;
    }
    return lret;
}

But there are a few problems:

  • The elapsed time shown in a player (like VLC) is wrong: it is about 5 seconds short compared with a stopwatch, whether I use mVideoTimestamp or the av_rescale_q() function.
  • There are a few freezes in the video.
  • I can't delete ctx->enc or call "ctx->enc->capture_plane.waitForDQThread(-1);" (the program blocks).
  • So I can't close and reopen the stream to write into another MP4 file.
  • There is a memory leak with av_interleaved_write_frame.
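
The timestamp rule spelled out in the code comments above, timestamp = (time diff * timescale) / 1000 with the time diff measured in milliseconds from a fixed start time, can be checked in isolation with a small sketch. `pts_from_elapsed_ms` is a hypothetical helper, and it assumes the stream's time base really is 1/timescale seconds, which the thread does not confirm for the code above.

```cpp
#include <cassert>
#include <cstdint>

// Presentation timestamp in units of 1/timescale seconds, computed from a
// fixed start time so that rounding errors do not accumulate frame to frame.
std::int64_t pts_from_elapsed_ms(std::int64_t start_ms, std::int64_t now_ms,
                                 std::int64_t timescale)
{
    if (now_ms <= start_ms)
        return 0;
    return ((now_ms - start_ms) * timescale) / 1000;
}
```

With a timescale of 10 (10 fps), frames arriving 0, 100 and 200 ms after the start get timestamps 0, 1 and 2, matching the example in the comments.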

I found a solution in the forum for closing and re-opening an H.264 video:

The few freezes in the video, the memory leak (av_interleaved_write_frame) and the timestamps (av_rescale_q) are FFmpeg issues.

I have a serious problem … ^^’

I set the CPU/EMC clocks to the maximum rate with a script.

When I run my application, htop shows each CPU core at 40% to 60% load, and tegrastats prints:

Mar 13 16:30:33 tegra-ubuntu ubuntu: RAM 678/3995MB (lfb 2x4MB) cpu [38%,44%,53%,19%]@1734 EMC 11%@1600 AVP 0%@80 NVDEC 192 MSENC 192 GR3D 0%@38 EDP limit 1734

But after 25 to 30 s, htop shows each CPU core under 20% load for 3 to 4 s, and tegrastats reports 100% on one CPU core, like this:

Mar 13 16:27:37 tegra-ubuntu ubuntu: RAM 676/3995MB (lfb 1x4MB) cpu [14%,100%,23%,10%]@1734 EMC 2%@1600 AVP 0%@80 NVDEC 192 MSENC 192 GR3D 0%@38 EDP limit 1734

In the video, there are freezes without any lost buffers (the std::list shows no errors in the log).