Embedding user metadata into an H.264/H.265 ES and extracting it

I’m trying to embed our own data into an H.264 elementary stream (ES).
I’m considering using the Jetson Multimedia API for this.
Does anyone know a good way to do it?

My environment is as follows.
HW: AGX Xavier
SW: JetPack 4.4

Thank you.

To carry additional metadata, you can implement a supplemental enhancement information (SEI) NAL unit and send it along with the encoded H.264 stream.
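As an illustration of the SEI suggestion above, here is a minimal sketch in C that builds an H.264 SEI NAL unit carrying a user_data_unregistered message (payload type 5) in byte stream form. The names build_h264_sei_user_data and sei_writer are hypothetical and are not part of the Jetson Multimedia API, and the caller is assumed to supply its own 16-byte UUID. The layout used is: start code, NAL header 0x06, payload type, payload size, UUID, user bytes, and a trailing 0x80, with emulation prevention bytes inserted where needed. H.265 uses a different, two-byte NAL header, so this sketch does not cover it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bounded output buffer with emulation prevention state. */
typedef struct {
    uint8_t *buf;
    size_t cap;
    size_t len;
    int zeros;     /* consecutive 0x00 bytes written so far inside the NAL unit */
    int overflow;  /* set when the output buffer is too small */
} sei_writer;

static void put_raw(sei_writer *w, uint8_t b)
{
    if (w->len < w->cap)
        w->buf[w->len++] = b;
    else
        w->overflow = 1;
}

/* Write one NAL payload byte, inserting 0x03 after two zero bytes
 * whenever the next byte is 0x00..0x03 (emulation prevention). */
static void put_escaped(sei_writer *w, uint8_t b)
{
    if (w->zeros >= 2 && b <= 0x03) {
        put_raw(w, 0x03);
        w->zeros = 0;
    }
    put_raw(w, b);
    w->zeros = (b == 0x00) ? w->zeros + 1 : 0;
}

/* Build an H.264 SEI NAL unit (user_data_unregistered) into out.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_h264_sei_user_data(const uint8_t uuid[16],
                                const uint8_t *data, size_t data_len,
                                uint8_t *out, size_t out_cap)
{
    sei_writer w = { out, out_cap, 0, 0, 0 };
    size_t remaining = 16 + data_len;  /* payload size = UUID + user bytes */
    size_t i;

    /* 4-byte start code, then NAL header: nal_ref_idc = 0, nal_unit_type = 6 (SEI) */
    put_raw(&w, 0x00); put_raw(&w, 0x00); put_raw(&w, 0x00); put_raw(&w, 0x01);
    put_raw(&w, 0x06);

    put_escaped(&w, 0x05);             /* payload type 5: user_data_unregistered */
    for (; remaining >= 255; remaining -= 255)
        put_escaped(&w, 0xFF);         /* payload size is coded in 0xFF steps */
    put_escaped(&w, (uint8_t)remaining);

    for (i = 0; i < 16; i++)
        put_escaped(&w, uuid[i]);
    for (i = 0; i < data_len; i++)
        put_escaped(&w, data[i]);

    put_escaped(&w, 0x80);             /* RBSP trailing bits */
    return w.overflow ? 0 : w.len;
}
```

One possible use, not confirmed in this thread, is to write the returned bytes to the output just before the encoded frame when the encoder hands back a bitstream buffer. Strictly, an SEI NAL unit belongs after any AUD/SPS/PPS NAL units of the access unit and before its first slice, so the insertion point may need adjusting.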

Yes, I think so too.
I’m reading the sample apps (01_video_encode, 03_video_cuda_enc).
I’m looking for where the encoding happens and where the encoded data is held.

Please check encoder_capture_plane_dq_callback(). It is called whenever an encoded bitstream buffer is available.

Sorry for the late reply.
I’m checking the callback function.
Do you have any good examples for the case where a camera device is used?

If you use a Bayer sensor, it goes through the Argus framework, and you can take a look at 10_camera_recording.

I’m checking 01_video_encode.
When is encoder_capture_plane_dq_callback called?
Below is my understanding of encoder_proc_blocking().

  1. Dequeue a buffer from the output plane (is the dq callback called in the backend at this point?)
  2. Read one video frame
  3. Set some fields of v4l2_buf
  4. Queue v4l2_buf into the output plane

I would like to understand the data flow in more detail.

The implementation is in


The callback function is called in

void *
NvV4l2ElementPlane::dqThread(void *data);

Thank you for your reply.
I have checked the implementation.
Maybe I don’t correctly understand dqBuffer in encoder_proc_blocking().
I’ll read its implementation.

I checked the entire source code.
However, I still can’t understand the data flow.
Specifically, I don’t know how data is exchanged between output_plane and capture_plane. The same applies to dequeue and enqueue.

The implementation is based on the V4L2 API spec. If 01_video_encode is not easy to read, you may also refer to


OK, I’ll try it.

I have read “encoder_unit_sample”.
I now understand most of the data flow.
However, I still don’t know when the encoding is actually done.
When does the encoding start?
Is it done internally?

It starts after calling:

    /* Set streaming on both planes.
    ** Start stream processing on output plane and capture
    ** plane by setting the streaming status ON.
    */
    ret = v4l2_ioctl(ctx.fd, VIDIOC_STREAMON, &ctx.outplane_buf_type);
    CHECK_ERROR(ret, "Error in setting streaming status ON output plane", cleanup);
    ret = v4l2_ioctl (ctx.fd, VIDIOC_STREAMON, &ctx.capplane_buf_type);
    CHECK_ERROR(ret, "Error in setting streaming status ON capture plane", cleanup);

To end the encoding task, send EoS (a v4l2_buffer with bytesused=0) on the output plane and then wait for EoS (buffer->planes[0].bytesused=0) on the capture plane.

Sorry for the late reply.
I now understand most of encoder_unit_sample.
I have an additional question.
Is the output from the encoder byte-aligned?

It should be byte-aligned. The stream is in byte stream format as described in
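For the extraction side, a byte stream of this kind separates NAL units with 00 00 01 start codes (optionally preceded by an extra 00), so SEI units can be located by scanning for them. The following is a minimal sketch in C, assuming H.264 (where the NAL unit type is the low 5 bits of the first byte after the start code). The function names next_nal_payload and count_h264_nal_type are hypothetical and only illustrate the scan; a real extractor would also strip emulation prevention bytes and parse the SEI payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first byte after the next 00 00 01 start code
 * found at or after from, or len if there is no further start code. */
static size_t next_nal_payload(const uint8_t *buf, size_t len, size_t from)
{
    size_t i;
    for (i = from; i + 3 <= len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return i + 3;
    }
    return len;
}

/* Count the NAL units of a given H.264 nal_unit_type in a byte stream buffer
 * (for example 6 for SEI, 7 for SPS, 5 for an IDR slice). */
size_t count_h264_nal_type(const uint8_t *buf, size_t len, uint8_t nal_type)
{
    size_t count = 0;
    size_t pos = next_nal_payload(buf, len, 0);

    while (pos < len) {
        /* buf[pos] is the NAL header byte; its low 5 bits are the type */
        if ((buf[pos] & 0x1F) == nal_type)
            count++;
        pos = next_nal_payload(buf, len, pos);
    }
    return count;
}
```

Because the encoder inserts emulation prevention bytes, the pattern 00 00 01 should not occur inside a NAL unit, which is what makes this simple scan workable.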