H.264 encoder capture plane concepts

The Jetson Nano H.264 encoder takes input from what is called the Output Plane, with pixel encoding and (width, height) defined, and generates encoded data onto the “Capture Plane”.

My question involves the Capture Plane. Apparently in addition to specifying pixel encoding and (width, height) for the Output Plane, we ALSO need to specify a pixel encoding and (width, height) for the Capture plane, BUT in theory this Capture Plane is a compressed bytestream, which would not be accessible as normal pixels until eventual decoding. Indeed the amount of data could wildly vary, between IDR frames versus incremental changes.

Can a kind developer explain the concepts of what data goes into this Capture Plane, how to calculate the (width, height) and pixel type for this, how to determine the length of the presumably variable length data to pull out of the Capture Plane (i.e. for network streaming), and ideally does this encoder output already have Annex B prefix bytes and the non-emulation bytes present in this “signal”, or do I need to add prefix bytes and non-emulation?

Intent here is to build in C a low latency pipeline from camera to H.264 encode, to network streaming (whether via RTP or otherwise). So, understanding concepts of what is IN the Capture Plane and how to deal with it, would be very helpful!

Thanks to any/all in advance!

For video encoding, please take a look at the samples:


In the samples, the capture plane buffer size is set to 2 MB:

    ret =
        ctx.enc->setCapturePlaneFormat(ctx.encoder_pixfmt, ctx.width,
                                      ctx.height, 2 * 1024 * 1024);

For high resolutions this may be too small. If the requested size is too small, the low-level code raises it to width x height x 1.5 bytes (the size of one YUV420 frame), so you should not need to change the value. Alternatively, you can set it to width x height x 1.5 bytes yourself.
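To make the arithmetic concrete, here is a minimal sketch in plain C of the sizing rule described above. The function names are hypothetical and this is not the actual low-level driver code. It only illustrates that 2 MB covers 1280x720 but falls below one YUV420 frame at 1920x1080.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one YUV420 frame: a full-size luma plane plus two
   quarter-size chroma planes, i.e. width x height x 1.5. */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Hypothetical helper: keep the requested size unless it is smaller
   than one YUV420 frame, in which case use the frame size instead. */
size_t capture_buffer_bytes(size_t width, size_t height, size_t requested)
{
    size_t floor_bytes = yuv420_frame_bytes(width, height);
    assert(floor_bytes > 0); /* width and height must be non-zero */
    return requested < floor_bytes ? floor_bytes : requested;
}
```

For example, capture_buffer_bytes(1920, 1080, 2 * 1024 * 1024) gives 3110400 rather than 2097152.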

Thank you DaneLLL!

Now, should I assume that this is simply a means to allocate a buffer, but should expect that the actual data (being compressed) will be of varying lengths per the NALU (not always filling this buffer full)?

Subquestion 1: is the capture plane data encoded by Nvidia to have Annex B prefix bytes and emulation prevention bytes?

Subquestion 2: can there be more than one NALU in a returned Capture Plane buffer? (i.e. back-to-back runs of bytes, each being a NALU)

Subquestion 3: does a NALU always exactly begin at this buffer’s start point (can the buffer be considered an alignment point for the start-of-NALU at byte0)?

By the way, thank you for mentioning ‘encoder_unit_sample’ which has helpful comments on usage. I’m tracing through that code now and might have followup questions later.



No, the buffer is not filled completely. Only as many bytes as the encoded NAL units occupy are written.

Subquestion 1: Yes. You can inspect or parse the bitstream with a software decoder such as the JM decoder or ffmpeg.

Subquestion 2: Yes, it is possible. Buffers for I/IDR frames may also contain SPS and PPS NALUs. Buffers for P/B frames generally contain only encoded slices.

Subquestion 3: Yes. The buffer begins with a start code, either 0x00000001 or 0x000001.
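Since one buffer can hold several NAL units separated by start codes, a packetizer (for RTP, say) usually needs to split them. Below is a rough sketch in plain C of one way to do that. The nal_span type and both function names are made up for illustration. It assumes the buffer is Annex B as described above and that you pass only the valid byte count, not the full allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One NAL unit located inside an Annex B buffer (hypothetical type). */
typedef struct {
    size_t offset;  /* index of the first byte after the start code */
    size_t length;  /* NAL unit bytes, start code excluded */
    uint8_t type;   /* nal_unit_type: low 5 bits of the NAL header byte */
} nal_span;

/* Index of the next 00 00 01 pattern at or after pos, or len if none.
   A 4-byte start code (00 00 00 01) is matched at its second byte. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t pos)
{
    for (; pos + 3 <= len; ++pos) {
        if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
            return pos;
    }
    return len;
}

/* Fill out[] with up to max_out NAL units found in buf[0..len).
   Returns the number of entries written. */
size_t split_annexb(const uint8_t *buf, size_t len,
                    nal_span *out, size_t max_out)
{
    assert(buf != NULL || len == 0);
    size_t count = 0;
    size_t sc = find_start_code(buf, len, 0);

    while (sc < len && count < max_out) {
        size_t begin = sc + 3;
        size_t next = find_start_code(buf, len, begin);
        size_t end = next;

        /* A NAL unit never ends in 0x00, so trailing zeros are either
           padding or the leading zero of a 4-byte start code. */
        while (end > begin && buf[end - 1] == 0)
            --end;

        if (end > begin) {
            out[count].offset = begin;
            out[count].length = end - begin;
            out[count].type = buf[begin] & 0x1F;
            ++count;
        }
        sc = next;
    }
    return count;
}
```

Emulation prevention bytes (00 00 03) inside a NAL unit never match the 00 00 01 pattern, so the scan does not need to remove them. In H.264, type 7 is SPS, 8 is PPS, 5 is an IDR slice and 1 is a non-IDR slice.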

Thank you again DaneLLL!

How would we know if we have under-sized the capture buffer for the bitstream being written to it (i.e. the encoder ran out of space)? Is there an error thrown, and will it tell us how much room would have been needed?

Also, do we receive some value of “bytes written” for each buffer emitted by the encoder?

The compressed stream size is stored in buffer->planes[0].bytesused. Please check the code of encoder_capture_plane_dq_callback() to see how the compressed stream is handled.
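As a rough illustration of that pattern, here is a self-contained sketch in plain C. The encoded_plane and byte_sink types are hypothetical stand-ins, not the Multimedia API's buffer class, and treating a zero-length buffer as end of stream is an assumption about how the callback behaves. The point is only that bytesused bytes are forwarded, not the full allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields of a dequeued capture buffer. */
typedef struct {
    const uint8_t *data;  /* start of the mapped buffer */
    size_t bytesused;     /* bytes the encoder actually wrote */
    size_t capacity;      /* allocated size of the buffer */
} encoded_plane;

/* Hypothetical sink: a flat byte array standing in for a file or socket. */
typedef struct {
    uint8_t *bytes;
    size_t size;
    size_t capacity;
} byte_sink;

/* Returns true to keep dequeuing, false to stop (end of stream or error). */
bool on_capture_buffer(const encoded_plane *plane, byte_sink *sink)
{
    assert(plane != NULL && sink != NULL);

    if (plane->bytesused == 0)
        return false;  /* assumed end-of-stream marker */
    if (plane->bytesused > plane->capacity)
        return false;  /* inconsistent buffer, refuse to read past it */
    if (plane->bytesused > sink->capacity - sink->size)
        return false;  /* sink is full */

    /* Forward only the valid bytes, never the whole allocation. */
    memcpy(sink->bytes + sink->size, plane->data, plane->bytesused);
    sink->size += plane->bytesused;
    return true;
}
```

In a real pipeline the memcpy would be replaced by a write to a file or a send on a socket, using the same byte count.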

A YUV420 frame is width x height x 1.5 bytes and the buffer is allocated at that size, so the compressed stream size will not exceed the buffer size.

DaneLLL, thank you again for the info, and your patience with all my questions. I hope this post also serves others who are interested in H.264 encoding.
