I want to decode H264 NAL unit data into still image data


I want to create an application like the following using the Jetson Multimedia API.

  • Pass H264 NAL unit data to the application in order.
  • The application accumulates the passed NAL unit data.
  • Every time NAL unit data is passed, the application attempts to decode the accumulated NAL unit data into “RGB still image data”.
  • If decoding is not possible yet, return “Unable to decode”.
    (For example, when only SPS and PPS are accumulated)
  • If decoding is possible, return the decoding result “RGB still image data”.
    (For example, when SPS, PPS, and I frames are accumulated)

In other words, I want to feed the H264 NAL unit data in order and obtain still image data at the first moment decoding becomes possible.
I also looked at the sample code for “00_video_decode”, but it seems to be a little different from what I want to do.
Is it possible to create such an application using Jetson Multimedia API?
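To make the “decodable yet?” idea concrete, here is a minimal standalone sketch (plain C++, no Jetson API involved; the helper names are hypothetical) that classifies NAL units by the nal_unit_type field in the header byte and tracks when a first frame could in principle be decoded:

```cpp
#include <cassert>
#include <cstdint>

// nal_unit_type is the low 5 bits of the first payload byte after the
// start code (values per ITU-T H.264: 7 = SPS, 8 = PPS, 5 = IDR slice).
enum class NalKind { Sps, Pps, IdrSlice, Sei, NonIdrSlice, Other };

NalKind classify_nal(uint8_t header_byte) {
    switch (header_byte & 0x1F) {
        case 7:  return NalKind::Sps;
        case 8:  return NalKind::Pps;
        case 5:  return NalKind::IdrSlice;
        case 6:  return NalKind::Sei;
        case 1:  return NalKind::NonIdrSlice;
        default: return NalKind::Other;
    }
}

// A first frame becomes decodable in principle once SPS, PPS and an IDR
// slice have all been seen (the decoder may still buffer output internally).
struct DecodableTracker {
    bool sps = false, pps = false, idr = false;
    void feed(uint8_t header_byte) {
        switch (classify_nal(header_byte)) {
            case NalKind::Sps:      sps = true; break;
            case NalKind::Pps:      pps = true; break;
            case NalKind::IdrSlice: idr = true; break;
            default: break;
        }
    }
    bool decodable() const { return sps && pps && idr; }
};
```

A real application would feed the decoder regardless and report “Unable to decode” until the first decoded buffer actually arrives; this sketch only shows the bookkeeping idea.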

thank you.

It is possible to achieve this with jetson_multimedia_api. A lot of customization has to be done by referring to the default samples, as we have suggested in
I want to use "H264 NAL unit data" as Input and obtain "RGB data"

For a quick solution, please consider using GStreamer:
Accelerated GStreamer — Jetson Linux Developer Guide documentation

Some plugins are implemented and you can use them directly.


“00_video_decode” takes an H264 file as input.
I want to input “NAL unit data”.
(I know that the contents of an H264 file are a series of NAL units.)

The options for “00_video_decode” include “--input-nalu”.
It seems very close to what I want to do.
However, I don’t know how to use this option.

./video_decode H264 --disable-rendering -o chunk.nv12 ../../data/Video/sample_outdoor_car_1080p_10fps.h264
./video_decode H264 --disable-rendering --input-nalu -o nalu.nv12 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

These two commands output exactly the same file.
Please tell me how the processing differs depending on the presence or absence of “--input-nalu”.
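For context, my understanding is that “--input-nalu” changes how the input file is sliced: instead of fixed-size chunks, the sample scans for Annex B start codes and feeds one NAL unit per output plane buffer. A standalone sketch of that scan (plain C++; the helper name is hypothetical, and rare trailing cabac_zero_words would also be trimmed by the zero-stripping step):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an Annex B byte stream into individual NAL units by scanning for
// 00 00 01 / 00 00 00 01 start codes. Each returned element would go into
// one output plane buffer in a one-NAL-per-buffer scheme.
std::vector<std::vector<uint8_t>> split_nalus(const std::vector<uint8_t>& s) {
    std::vector<size_t> starts;  // offsets where NAL payloads begin
    for (size_t i = 0; i + 3 <= s.size(); ++i) {
        if (s[i] == 0 && s[i + 1] == 0 && s[i + 2] == 1) {
            starts.push_back(i + 3);
            i += 2;  // skip past the start code
        }
    }
    std::vector<std::vector<uint8_t>> nalus;
    for (size_t n = 0; n < starts.size(); ++n) {
        size_t end = (n + 1 < starts.size()) ? starts[n + 1] - 3 : s.size();
        // A 4-byte start code leaves a stray 0x00 on the previous unit.
        while (end > starts[n] && s[end - 1] == 0) --end;
        nalus.emplace_back(s.begin() + starts[n], s.begin() + end);
    }
    return nalus;
}
```

Since the decoder produces the same decoded frames either way, the two commands writing identical output files would be the expected result; the difference is in how the input is delivered, not in what is decoded.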

thank you


I would like to ask an additional question regarding “--input-nalu”.
I am currently reading through the video_decode source.
I have a question about the decode_proc function.

    /* Read encoded data and enqueue all the output plane buffers.
       Exit loop in case file read is complete. */
    i = 0;
    current_loop = 1;
    while (!eos && !ctx.got_error && !ctx.dec->isInError() &&
           i < ctx.dec->output_plane.getNumBuffers())
    {
        struct v4l2_buffer v4l2_buf;
        struct v4l2_plane planes[MAX_PLANES];
        NvBuffer *buffer;

        memset(&v4l2_buf, 0, sizeof(v4l2_buf));
        memset(planes, 0, sizeof(planes));

        buffer = ctx.dec->output_plane.getNthBuffer(i);
        if ((ctx.decoder_pixfmt == V4L2_PIX_FMT_H264) ||
                (ctx.decoder_pixfmt == V4L2_PIX_FMT_H265) ||
                (ctx.decoder_pixfmt == V4L2_PIX_FMT_MPEG2) ||
                (ctx.decoder_pixfmt == V4L2_PIX_FMT_MPEG4))
        {
            if (ctx.input_nalu)
            {
                /* Read the next input NAL unit. */
                read_decoder_input_nalu(ctx.in_file[current_file], buffer,
                        nalu_parse_buffer, CHUNK_SIZE, &ctx);
            }
            else
            {
                /* Read a fixed-size input chunk. */
                read_decoder_input_chunk(ctx.in_file[current_file], buffer);
            }
        }
        /* ... (rest of loop body omitted) ... */

This loop terminates under the following conditions.

  • End of stream.
  • An error has occurred.
  • The loop body has executed getNumBuffers() times.

The read_decoder_input_nalu method stores all the NAL unit data of the input file in nalu_parse_buffer in a single call.
Therefore, when “--input-nalu” is specified, executing this loop once should be sufficient.

Shouldn’t something like “if input_nalu == true, exit the loop unconditionally after one iteration” be added to the loop termination condition?
Otherwise, unnecessary iterations will occur.

Of course, decoding completes without any problem, so the unnecessary processing is harmless.
However, I would like to correctly understand the processing when “--input-nalu” is specified, so I would like to resolve this question.

thank you.

In the 00_video_decode sample, the h264/h265 stream is read from a file, so it keeps reading data until end of file. If you feed only one NAL unit in your use case, please customize the sample to fit your use case.


I analyzed the video_decode sample program.
I was able to modify it to do what I wanted.
  • Input is NAL unit data.
  • Output is RGB data.

I have a question regarding this decoding process.

The decoding process is performed by the NvVideoDecoder object.
Here’s how:

  • Enqueue NAL unit data from the main thread to the “output plane buffer”.
  • Repeat the above.
  • The NvVideoDecoder object performs decoding processing when it becomes decodable.
  • The decoded result is enqueued into the “capture plane buffer”.
  • In a subthread, wait for it to be queued to the capture plane’s buffer.
  • Once queued, dequeue.

As mentioned above, when NAL unit data is queued to the “output plane”, the decoding results are automatically queued to the “capture plane”, so all I have to do is dequeue them.
Is this understanding correct?

thank you.

The working flow looks correct.


Thank you for checking the work flow.
So, I have a question about this work flow.

I queued the “H264 NAL unit data” sequentially from the main thread.
This is because I wanted to understand at what timing the decoding process is performed.

The queued NAL unit data is as follows.
SPS->PPS->SEI->I frame->SEI->P frame->SEI->P frame->SEI->P frame->...

My guess was that decoding would be performed once the I-frame was enqueued.
With SPS, PPS, and an I-frame, decoding should be possible.
In other words, the first decoded result should be available before the first P frame is enqueued.

However, the results were different.
I was able to obtain the first decoding result at the timing of enqueuing the third P frame.

Why is it necessary to pass three “P frames” for decoding to occur?

thank you.

The number of buffered frames is related to the number of reference frames, so it differs for each h264 stream. You may check the number of reference frames of the stream.

SEI is not required, so you can discard it instead of feeding it into the decoder.
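For reference, the number of reference frames (max_num_ref_frames) is stored in the SPS as an unsigned Exp-Golomb value, so inspecting it requires a ue(v) reader. A minimal sketch (plain C++; note that real SPS parsing must first strip 0x03 emulation-prevention bytes, and several other fields precede max_num_ref_frames):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal MSB-first bit reader over a byte buffer.
struct BitReader {
    const std::vector<uint8_t>& data;
    size_t pos = 0;  // bit position
    int bit() {
        int b = (data[pos >> 3] >> (7 - (pos & 7))) & 1;
        ++pos;
        return b;
    }
};

// Decode one unsigned Exp-Golomb value ue(v), the coding used for most
// SPS fields including max_num_ref_frames: count leading zero bits, then
// read that many bits after the stop '1' and subtract 1.
uint32_t read_ue(BitReader& br) {
    int leading_zeros = 0;
    while (br.bit() == 0) ++leading_zeros;
    uint32_t value = 1;  // the stop '1' bit
    for (int i = 0; i < leading_zeros; ++i)
        value = (value << 1) | br.bit();
    return value - 1;
}
```

With this, codes “1”, “010”, “011”, “00100” decode to 0, 1, 2, 3 respectively, which is how the per-stream reference frame count could be read out of the SPS.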


Sorry, I didn’t understand the answer, so I’d like to ask the question again.
(I made the question simpler)

Why do I not get any decoding result when I pass the three NAL units SPS, PPS, and I-frame to the NvVideoDecoder object?
Normally, these three NAL units should be decodable into one still image.

thank you.

If a decoded frame may be referenced by later frames, it is buffered inside the decoder for decoding those later frames. Once it is no longer needed as a reference frame, it is passed to the upper application layer.


Thank you for answering.
Is the following understanding correct?

  • Queue “SPS, PPS, I-frame” from the main thread to the output plane of the decoder object in this order.
  • The decoder object decodes at this point, but does not yet queue the decoding results to the capture plane.
  • Since the “reference frame number information” in SPS is 3, the decoding result is queued to the capture plane when three P frames are queued after this.
    (That the number of reference frames is 3 is my guess; I did not actually verify it from the contents of the SPS.)

thank you.

It looks correct. Once the decoded frame data is no longer needed by the decoder, it is passed to the upper layer.


Thank you for checking.
My understanding of the video_decode sample program has improved.

By the way, what is the license for this sample program?
Can I use the source code of “/usr/src/jetson_multimedia_api/samples/common/” as is in my application?
Our application may become commercial in the future.

thank you.


Please let me know your answer to the question above regarding the license of the sample program.

Apart from the above, I have a question.

Currently, by modifying the video_decode sample program, it is now possible to perform the following processing.

  • In the main thread, pass H264 NAL unit data to the decode object in sequence.
  • Wait until the decoding result (RGB data) is obtained in the subthread.
  • Once a decoding result is obtained, save it as a .rgb file.

I would like to implement these processes using GStreamer instead of the Jetson Multimedia API.
I don’t know the relationship between GStreamer and the Jetson Multimedia API.
Perhaps GStreamer sits underneath the Jetson Multimedia API.

Is it possible to implement the same using Gstreamer API without using Jetson Multimedia API?
If you have any helpful source code or anything similar, please let me know.

I have a Gstreamer development environment and have confirmed that I can do the following.

  • Pass multiple decodable NAL unit data to Gstreamer at once
  • Decode results are returned from Gstreamer

However, I could not implement a version in which NAL unit data is passed sequentially and the decoding result is obtained as soon as decoding becomes possible.
Every time I passed one piece of NAL unit data, the following message appeared.

0:00:11.524342000 6012 0000018BC0BDE300 WARN h264parse gsth264parse.c:1525:gst_h264_parse_handle_frame:<h264parse0> broken/invalid nal Type: 5 Slice IDR, Size: 28 will be dropped

thank you.

We are checking the software license. Will get back to you.

GStreamer support is implemented on top of jetson_multimedia_api. For video decoding, we have the nvv4l2decoder plugin. It is open source; please download the package:

Driver Package (BSP) Sources

You can download it and follow the README to build the plugin.


I have incorporated “context_t” from the video_decode sample program into the application I am creating.
However, it doesn’t work properly.
Specifically, even when NAL unit data (SPS, PPS, I frame, P frame, ...) is queued to context_t.dec in order, the resolution change event (V4L2_EVENT_RESOLUTION_CHANGE) does not occur.
(Even with a breakpoint set in the code that runs when a resolution change event occurs, the debugger never breaks there.)

The call “ret = dec->dqEvent(ev, 50000);” in the “dec_capture_loop_fcn” function times out.

The major differences between “video_decode” and “my application” are as follows.

  • video_decode: context_t is a local variable declared in the main function. A resolution change event occurs when SPS->PPS->I frame are enqueued.
  • My application: context_t is a class member. No matter how much NAL unit data is enqueued, the resolution change event does not occur.

However, the processing performed on “context_t” in the sample program is also performed without exception in my application.
Of course, “ctx.dec->subscribeEvent(V4L2_EVENT_RESOLUTION_CHANGE, 0, 0)” is also executed.
The NAL unit data used is also exactly the same.

What are the possible reasons why the resolution change event is not occurring?
Are there any items I should check?

thank you.

Please discard the NALs before the first SPS/PPS. Decoding starts from SPS/PPS, so the SPS/PPS NALs have to come at the very beginning.
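A minimal sketch of such a filter (plain C++; hypothetical helper, not part of the samples, operating on raw NAL payloads where the first byte is the NAL header):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Drop NAL units until the first SPS (nal_unit_type == 7) is seen, so the
// decoder always starts on SPS/PPS. Everything from the first SPS onward
// is kept in order.
std::vector<std::vector<uint8_t>> drop_until_sps(
        const std::vector<std::vector<uint8_t>>& nalus) {
    std::vector<std::vector<uint8_t>> out;
    bool seen_sps = false;
    for (const auto& n : nalus) {
        if (!seen_sps && !n.empty() && (n[0] & 0x1F) == 7)
            seen_sps = true;  // first SPS found; start keeping from here
        if (seen_sps)
            out.push_back(n);
    }
    return out;
}
```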


Thanks for your advice.
As a result of various modifications, the decoding process is now working on my application.

The following is a question regarding the implementation of the video_decode sample program.
The sample queues NAL unit data to the “output plane buffer” of NvVideoDecoder (the V4L2 video decoder object).
At that time, it always dequeues a previously used buffer before queuing.
What is this dequeuing for?
Is it simply to prevent the buffer queue from becoming full?

On the “capture plane buffer” side, there is a process that queues the buffer back after dequeuing it:

                /* If not writing to file, Queue the buffer back once it has been used. */
                if(ctx->capture_plane_mem_type == V4L2_MEMORY_DMABUF)
                    v4l2_buf.m.planes[0].m.fd = ctx->dmabuff_fd[v4l2_buf.index];
                if (dec->capture_plane.qBuffer(v4l2_buf, NULL) < 0)
                    cerr <<
                        "Error while queueing buffer at decoder capture plane"
                        << endl;

What is this done for?
Is there no need to worry about the buffer becoming full?

thank you.

We enqueue buffers to the capture plane to receive the decoded YUVs. After feeding the H264 stream to the output plane, we dequeue the decoded YUVs from the capture plane.
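The re-queue exists because the capture plane works from a fixed pool of buffers: every decoded frame lands in one of them, and unless the application gives each buffer back after use, the decoder eventually has nowhere to write and stalls. A toy model of that recycling (a simulation only, not V4L2 code):

```cpp
#include <cassert>
#include <deque>

// Toy model of the capture plane buffer pool. decode_one() stands in for
// the decoder filling a buffer with YUV; consume_and_requeue() stands in
// for the application's dqBuffer + use + qBuffer cycle.
struct CapturePlaneModel {
    std::deque<int> free_buffers;   // indices the decoder may fill
    std::deque<int> ready_buffers;  // filled, waiting for the app

    explicit CapturePlaneModel(int n) {
        for (int i = 0; i < n; ++i) free_buffers.push_back(i);
    }
    // Decoder side: output one decoded frame if a free buffer exists.
    bool decode_one() {
        if (free_buffers.empty()) return false;  // stalled: pool exhausted
        ready_buffers.push_back(free_buffers.front());
        free_buffers.pop_front();
        return true;
    }
    // App side: dequeue a decoded frame, use it, queue the buffer back.
    bool consume_and_requeue() {
        if (ready_buffers.empty()) return false;
        free_buffers.push_back(ready_buffers.front());  // the re-queue step
        ready_buffers.pop_front();
        return true;
    }
};
```

Without the re-queue step, decoding stops as soon as the pool is exhausted, which is why the sample queues each capture plane buffer back once it has been used.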