TX2 decode H264 with tegra_multimedia_api

Hi:

We want to use the TX2 with tegra_multimedia_api to decode H264 frame by frame. We receive H264 frame data from a specific network source and want to leverage the TX2 HW codec to decode these standard H264 frames. We referred to 00_video_decode in tegra_multimedia_api, but this example derives too many parameters from opening an H264 file, as in the excerpt below; even after we set the resolution and crop resolution, other parameters are still obtained from the opened file.

Does the Nvidia team have an example showing how to use the HW codec without opening a media file, so that I can configure these parameters directly, the way ffmpeg's h264 decoder does?

======================
static void *
dec_capture_loop_fcn(void *arg)
{
    context_t *ctx = (context_t *) arg;
    NvVideoDecoder *dec = ctx->dec;
    struct v4l2_event ev;
    int ret;

    cout << "Starting decoder capture loop thread" << endl;
    // Need to wait for the first Resolution change event, so that
    // the decoder knows the stream resolution and can allocate appropriate
    // buffers when we call REQBUFS
    do
    {
        ret = dec->dqEvent(ev, 50000);
        if (ret < 0)
        {
            if (errno == EAGAIN)
            {
                cerr <<
                    "Timed out waiting for first V4L2_EVENT_RESOLUTION_CHANGE"
                    << endl;
            }
            else
            {
                cerr << "Error in dequeueing decoder event" << endl;
            }
            abort(ctx);
            break;
        }
    }
    while ((ev.type != V4L2_EVENT_RESOLUTION_CHANGE) && !ctx->got_error);

================================

Hi,
As we have suggested in the comment, you would need to customize 00_video_decode to your use case.

You need to input SPS/PPS in the beginning and then the first IDR frame.

Hi DaneLLL:

In this example, how do I make sure a queued frame has already been decoded by the HW so that I can dequeue it for the next processing step? As written, the example queues all the frames, and only afterwards dequeues and renders them.

Hi,
For mapping frames between the output plane and the capture plane, please check v4l2_buf.timestamp.

Hi DaneLLL:

When I enable the input_nalu flag with an H264 file as input, I cannot see the decoded result on my screen, but if I disable it and use the read_decoder_input_chunk function instead, it works normally with file input.

Am I missing anything when enabling input_nalu with a file? I want to double-check that this function works with a file first, then rewrite it for our data input.

BTW, I already checked the NALU data; it parses fine and the data is already filled into the buffer.

Hi,
We verify the function by running:

$ ./video_decode H264 --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps.h264

You can check whether the first feed contains the SPS, PPS, and first slice.

Hi DaneLLL:

I can read the NALUs from my network packets now, and I use non-blocking mode. I want to double-confirm the following:
  1. CHUNK_SIZE: in the example it is 4000000. Can I change it? (I tried to reduce it, but it does not work with other values on my side.)
  2. Can I send a buffer smaller than CHUNK_SIZE?

Hi,
I tried modifying
#define CHUNK_SIZE 2000000

and can run the command:

$ ./video_decode H264 --blocking-mode 0 ../../data/Video/sample_outdoor_car_1080p_10fps.h264

Usually IDR frames are large. As long as you ensure CHUNK_SIZE is greater than the IDR frame size, it should work just fine.

Hi DaneLLL:

Now I can read the NALU data correctly, but I am blocked on qbuffer/dqbuffer in the non-blocking function. I can dqBuffer without error, so ret is always 0; in this situation the code never reaches check_capture_buffers to do the next step. PS: I receive the network NALU data continuously and pass it to read_decoder_input_nalu whenever it requests NALU data.

Do you have any idea about this part?

        if (allow_DQ)
        {
            ret = ctx.dec->output_plane.dqBuffer(v4l2_output_buf, &output_buffer, NULL, 0);
            if (ret < 0)
            {
                if (errno == EAGAIN)
                {
                    goto check_capture_buffers;
                }
                else
                {
                    cerr << "Error DQing buffer at output plane" << endl;
                    abort(&ctx);
                    break;
                }
            }
        }

Hi DaneLLL:

I got this message: "Capture plane not ON, skipping capture plane". Do you have any idea what causes it?

Hi,
The print is not an error. You would see it before you get the V4L2_EVENT_RESOLUTION_CHANGE event. The log looks like:

nvidia@nvidia-desktop:/usr/src/jetson_multimedia_api/samples/00_video_decode$ ./video_decode H264 --blocking-mode 0 ../../data/Video/sample_outdoor_car_1080p_10fps.h264
Set governor to performance before enabling profiler
Creating decoder in non-blocking mode
Opening in O_NONBLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Setting frame input mode to 1
Created the PollThread and Decoder Thread
Starting Device Poll Thread
Capture plane not ON, skipping capture plane
(---skip...)
Capture plane not ON, skipping capture plane
Got V4L2_EVENT_RESOLUTION_CHANGE EVENT
Video Resolution: 1920x1080
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 1920 height 1080
Decoder colorspace ITU-R BT.601 with standard range luma (16-235)
Query and set capture successful
Input file read complete
Done processing all the buffers returning
Decoder got eos, exiting poll thread
App run was successful

Do you register the event V4L2_EVENT_RESOLUTION_CHANGE?

Hi DaneLLL:

If by "register" you mean subscribeEvent, then yes, I call this function within decode_proc().
However, I cannot get the V4L2_EVENT_RESOLUTION_CHANGE event to execute query_and_set_capture(); without this step, I think the decoder does not work. Am I missing anything?


// Subscribe to Resolution change event
printf("+++ set V4L2_EVENT_RESOLUTION_CHANGE \r\n");
ret = ctx.dec->subscribeEvent(V4L2_EVENT_RESOLUTION_CHANGE, 0, 0);
TEST_ERROR(ret < 0, "Could not subscribe to V4L2_EVENT_RESOLUTION_CHANGE",
           cleanup);

Hi,
Please check

  1. If you create the semaphores and polling thread:
sem_init(&ctx.pollthread_sema, 0, 0);
sem_init(&ctx.decoderthread_sema, 0, 0);
pthread_create(&ctx.dec_pollthread, NULL, decoder_pollthread_fcn, &ctx);
  2. If you feed H264 frames to the output plane.
  3. If you call SetPollInterrupt() and wait for the semaphores:
/* Call for SetPollInterrupt.
   Refer V4L2_CID_MPEG_SET_POLL_INTERRUPT */
ctx.dec->SetPollInterrupt();

/* Since buffers have been queued, issue a post to start polling and
   then wait here. */
sem_post(&ctx.pollthread_sema);
sem_wait(&ctx.decoderthread_sema);

/* Call for dequeuing an event.
   Refer ioctl VIDIOC_DQEVENT */
ret = ctx.dec->dqEvent(ev, 0);

Hi DaneLLL:

I can decode and show it on screen now, but the display fps is not good. My camera fps is 24, and this is the result I got: https://drive.google.com/file/d/1zlSO0HImz40kgTNOBDGejbSLiV-c7pIV/view?usp=sharing

I set both the display and decode fps to 24; is there anything else I need to take care of?

thank you

Hi DaneLLL:

Update: I found something interesting but I can't figure out how to handle it.

My camera output is 25fps. If I set the decode fps to 25, I encounter jitter or stalls while watching the screen; but if I set the decode fps to 24, the screen looks okay, yet I get a queue buffer issue and the display is delayed.

My question is: how do I get the correct setting?

Hi,
Please check if your video output supports 25fps or 50fps:

nvidia@nvidia-desktop:~$ export DISPLAY=:0
nvidia@nvidia-desktop:~$ xrandr

And switch to the mode that fits your source.
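
For example, assuming the panel is on connector HDMI-0 and advertises a 50 Hz mode (check the connector name and supported rates in your own xrandr output first; both are assumptions here), a 25fps source maps cleanly onto 50 Hz:

```shell
export DISPLAY=:0
# 50 Hz is an integer multiple of 25 fps, so each decoded frame
# is shown for exactly two refresh cycles, avoiding judder.
xrandr --output HDMI-0 --mode 1920x1080 --rate 50
```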

Hi DaneLLL:

My video source output is 24fps, and I can save it as an H264 file and decode/display it with sample 00. But when I use NALU input, the jitter and queue buffer issues occur. BTW, the display setting is the same.
Any suggestion about it?

this is my display setting

Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
HDMI-0 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 600mm x 340mm
1920x1080 60.00*+ 59.95 50.00
1680x1050 59.96
1440x900 59.89
1440x576 50.00
1440x480 59.94
1280x1024 75.03 60.00
1280x960 60.00
1280x720 60.00 59.94 50.00
1152x864 75.00
1024x768 75.03 70.07 60.01
832x624 75.05
800x600 75.00 72.19 60.32 56.25
720x576 50.00
720x480 59.94
720x400 70.04
640x480 75.00 67.06 59.94 59.94

Hi,
Please configure more buffers in output plane. By default it is 2.

    /* Query, Export and Map the output plane buffers so can read
       encoded data into the buffers. */
    if (ctx.output_plane_mem_type == V4L2_MEMORY_MMAP) {
        /* configure decoder output plane for MMAP io-mode.
           Refer ioctl VIDIOC_REQBUFS, VIDIOC_QUERYBUF and VIDIOC_EXPBUF */
        ret = ctx.dec->output_plane.setupPlane(V4L2_MEMORY_MMAP, 2, true, false);

Hi DaneLLL:

Thanks for that. One more question: how do I get the decoded output buffer? I saw in the dump_dmabuf function that it syncs the memory to the CPU and then writes the data to the file line by line. Does Nvidia have another example that syncs the memory and moves the data in one block directly?

thank you

Hi,
The output of the hardware decoder has hardware alignment, so you need to check pitch, width, and height to extract the valid data. This is a hardware limit. If you would like to avoid it, you can convert to RGBA format, which does not have hardware alignment.