TX2 decode H264 with tegra_multimedia_api

Hi DaneLLL:

Update: I found something interesting, but I can't work out how to handle it.

My camera outputs 25fps. If I set the decode fps to 25, I get jitter or freezes while watching the screen, but if I set the decode fps to 24, the screen looks fine yet I hit a queue buffer issue and the display lags behind.

My question is: what is the correct setting here?

Please check if your video output supports 25fps or 50fps:

nvidia@nvidia-desktop:~$ export DISPLAY=:0
nvidia@nvidia-desktop:~$ xrandr

And switch to the mode to fit your source.

Hi DaneLLL:

My video source outputs 24fps, and I can save it as an H264 file and decode/display it with sample 00. But when I feed NAL units directly, the jitter and queue buffer issues occur, even though the display settings are the same.
Any suggestion about it?

this is my display setting

Screen 0: minimum 8 x 8, current 1920 x 1080, maximum 32767 x 32767
HDMI-0 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 600mm x 340mm
1920x1080 60.00*+ 59.95 50.00
1680x1050 59.96
1440x900 59.89
1440x576 50.00
1440x480 59.94
1280x1024 75.03 60.00
1280x960 60.00
1280x720 60.00 59.94 50.00
1152x864 75.00
1024x768 75.03 70.07 60.01
832x624 75.05
800x600 75.00 72.19 60.32 56.25
720x576 50.00
720x480 59.94
720x400 70.04
640x480 75.00 67.06 59.94 59.94

Please configure more buffers in output plane. By default it is 2.

    /* Query, Export and Map the output plane buffers so that we can
       read encoded data into the buffers. */
    if (ctx.output_plane_mem_type == V4L2_MEMORY_MMAP) {
        /* Configure decoder output plane for MMAP io-mode. */
        ret = ctx.dec->output_plane.setupPlane(V4L2_MEMORY_MMAP, 2, true, false);

Hi DaneLLL:

Thanks for that. One more question: how do I get the decoded output buffer? I saw that the dump_dmabuf() function syncs the memory to the CPU and then writes the data to a file line by line. Does NVIDIA have another example that syncs the memory and moves the data by block directly?

thank you

The output of the hardware decoder has hardware alignment, so you need to check pitch, width, and height to get the valid data. This is a hardware limitation. If you would like to avoid it, you can convert to RGBA format, which does not have hardware alignment.

Hi Dane:

Thanks, I can get it and convert it to RGBA with OpenCV, but I want to know if NVIDIA has any CUDA function that could merge the 2 planes into an RGBA frame without copying it out and converting it with the CPU?

thank you

You can call NvBufferTransform() to convert to RGBA, and NvBufferMemMap() to get CPU address.

A sample for reference:
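
The referenced sample is not reproduced here; the following is an untested sketch of the described flow, assuming the JetPack 4.x nvbuf_utils API (NvBufferCreateEx, NvBufferTransform, NvBufferMemMap). Field names and flags should be checked against nvbuf_utils.h; error handling is omitted for brevity.

```cpp
#include <nvbuf_utils.h>

// Sketch (untested, hardware-dependent): convert a decoded dma-buf to
// RGBA with the hardware converter, then map the result for CPU access.
int convert_and_map(int src_dma_fd, int width, int height)
{
    // Create a pitch-linear RGBA destination buffer.
    NvBufferCreateParams cParams = {0};
    cParams.width = width;
    cParams.height = height;
    cParams.layout = NvBufferLayout_Pitch;
    cParams.colorFormat = NvBufferColorFormat_ABGR32;
    cParams.payloadType = NvBufferPayload_SurfArray;
    cParams.nvbuf_tag = NvBufferTag_NONE;

    int dst_dma_fd = -1;
    NvBufferCreateEx(&dst_dma_fd, &cParams);

    // Hardware format conversion (e.g. NV12 -> RGBA).
    NvBufferTransformParams transform_params = {0};
    transform_params.transform_flag = NVBUFFER_TRANSFORM_FILTER;
    transform_params.transform_filter = NvBufferTransform_Filter_Smart;
    NvBufferTransform(src_dma_fd, dst_dma_fd, &transform_params);

    // Map plane 0 and sync so the CPU sees the converted pixels.
    void *vaddr = nullptr;
    NvBufferMemMap(dst_dma_fd, 0, NvBufferMem_Read, &vaddr);
    NvBufferMemSyncForCpu(dst_dma_fd, 0, &vaddr);

    // ... read RGBA rows here, honouring the pitch from NvBufferGetParams() ...

    NvBufferMemUnMap(dst_dma_fd, 0, &vaddr);
    return dst_dma_fd;  // caller releases it with NvBufferDestroy()
}
```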

Hi Dane:

I ran into an issue while using this method: I don't have this API, so I use NvBufferCreateEx and keep everything else the same. But the image I get has the wrong width and height; my input is 3840x2160 but the output looks like 960x540. I can't figure out what happened, could you?

                   m_dmabufs[i] = iNativeBuffer->createNvBuffer(iEglOutputStreams[i]->getResolution(),
                                                        NvBufferColorFormat_YUV420,   // replaced with NvBufferColorFormat_ABGR32
                                                        NvBufferLayout_BlockLinear);  // replaced with NvBufferLayout_Pitch
               Use this method to allocate a HW buffer (deprecated; use the NvBufferCreateEx API instead).

Please call NvBufferGetParams() to get information of the buffer and check if it is correct. You may check pixel_format, num_planes, width, height.

Hi Dane:

I dumped the info with the following code and got the result below. I don't understand why the two plane sizes are different; at initialization, all planes are 3840x2160.

[dump code]

                  ret = NvBufferGetParams(ctx.dst_dma_fd, &params);
                  printf("pixel_format =%d \r\n", params.pixel_format);
                  printf("num_planes =%d \r\n", params.num_planes);
                  for (int i = 0; i < params.num_planes; i++) {
                          printf("width[%d] =%d height[%d]=%d pitch[%d]=%d\r\n",
                                 i, params.width[i], i, params.height[i], i, params.pitch[i]);
                  }


[get result]
num_planes =2
width[0] =3840 height[0]=2160 pitch[0]=3840
width[1] =1920 height[1]=1080 pitch[1]=3840


[create NV buffer code]
on query_and_set_capture() function.
printf("+++++++ NvBufferCreateEx w=%d h=%d\r\n",input_params.width, input_params.height);
ret = NvBufferCreateEx (&ctx->dst_dma_fd, &input_params);

for (int index = 0; index < ctx->numCapBuffers; index++) {
    cParams.width = crop.c.width;
    cParams.height = crop.c.height;
    cParams.layout = NvBufferLayout_BlockLinear;
    cParams.payloadType = NvBufferPayload_SurfArray;
    cParams.nvbuf_tag = NvBufferTag_VIDEO_DEC;
    printf("+++++++ NvBufferCreateEx %d w=%d h=%d\r\n", index, input_params.width, input_params.height);
    ret = NvBufferCreateEx(&ctx->dmabuff_fd[index], &cParams);
    TEST_ERROR(ret < 0, "Failed to create buffers", error);
}

Video Resolution: 3840x2160
+++++++ NvBufferCreateEx w=3840 h=2160

+++++++ NvBufferCreateEx 0 w=3840 h=2160
+++++++ NvBufferCreateEx 1 w=3840 h=2160
+++++++ NvBufferCreateEx 2 w=3840 h=2160
+++++++ NvBufferCreateEx 3 w=3840 h=2160
+++++++ NvBufferCreateEx 4 w=3840 h=2160
+++++++ NvBufferCreateEx 5 w=3840 h=2160
+++++++ NvBufferCreateEx 6 w=3840 h=2160
+++++++ NvBufferCreateEx 7 w=3840 h=2160
+++++++ NvBufferCreateEx 8 w=3840 h=2160
+++++++ NvBufferCreateEx 9 w=3840 h=2160
+++++++ NvBufferCreateEx 10 w=3840 h=2160
+++++++ NvBufferCreateEx 11 w=3840 h=2160
+++++++ NvBufferCreateEx 12 w=3840 h=2160
+++++++ NvBufferCreateEx 13 w=3840 h=2160

Please check pixel_format. It has two planes; does it look to be NvBufferColorFormat_NV12?

yes, it’s NV12

Looks like the buffers are in 4K NV12. You may use NvBufferCreateEx() to create 4K RGBA pitchlinear buffer, and do format conversion through NvBufferTransform().
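
As a cross-check, the plane sizes in the earlier dump match NV12 layout arithmetic: plane 0 is full-resolution Y at one byte per pixel, and plane 1 is interleaved UV at half resolution in both directions, two bytes per UV sample, which is why its byte pitch equals the Y pitch even though width[1] is halved. A small sketch of that geometry (hypothetical helper, ignoring any extra hardware pitch alignment):

```cpp
struct Nv12PlaneInfo { int width; int height; int min_pitch; };

// Derive NV12 plane geometry from the frame size.  Plane 0 holds
// full-resolution Y (1 byte per pixel); plane 1 holds interleaved UV
// at half resolution, 2 bytes per UV sample, so its minimum byte
// pitch matches the Y pitch even though its width[] value is halved.
Nv12PlaneInfo nv12_plane(int frame_w, int frame_h, int plane)
{
    if (plane == 0)
        return { frame_w, frame_h, frame_w };
    return { frame_w / 2, frame_h / 2, (frame_w / 2) * 2 };
}
```

For 3840x2160 this gives plane 1 as 1920x1080 with a 3840-byte pitch, exactly the values NvBufferGetParams() reported.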

Hi Dane:

I disabled the rendering, and the second plane is gone, but I still get a strange result.

This is the info I dump; I dump it after dump_dmabuf(), on ctx.dst_dma_fd.
The pixel format is 18 (NvBufferColorFormat_ARGB32), which is what I set, and the plane number and size match what I set. But when I save the buffer directly, I get four small, identical copies of the screenshot in the image.
Do you know what the problem is?

Query and set capture successful
pixel_format =18
num_planes =1
width[0] =3840 height[0]=2160 pitch[0]=15360
pixel_format =18
num_planes =1
width[0] =3840 height[0]=2160 pitch[0]=15360

Please modify this line in dump_dmabuf():

                stream->write((char *)psrc_data + i * parm.pitch[plane],
                                /* MODIFY HERE */parm.width[plane]*4);

You should get 3840x2160x4 bytes for single RGBA frame.

Hi Dane:

I got the RGBA frame, but this brings us back to our original goal: I still need to use cudaMemcpy2D() to convert the frame from char to uchar3, then use cudaRGB8ToRGBA32() to convert RGB8 to RGBA32, before CUDA gets the correct format for computation.

Is there any method to convert the format directly from the HW codec output to the CUDA RGBA32 input?

thank you

The hardware converter does not support 24-bit RGB or BGR, so your solution of using the GPU is optimal.
If your source is YUV420 or YUV422, the hardware converter can be utilized for converting to RGBA.

Hi Dane:

I ran the multithreading decode example for 30 minutes and got this error. It looks like an IOCTL failure that then causes an EOS return. Have you seen this error before?

I run 14-channel decode in non-blocking mode.

reference in DPB was never decoded
[ERROR] Output Plane:Error while Qing buffer: Device or resource busy
Error Qing buffer at output plane
Decoder got eos, exiting poll thread
Decoder is in error
NvRmChannelSubmit: NvError_IoctlFailed with error code 22
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 38, SyncPointValue = 0)

thank you

We have a sample demonstrating multiple video decoding:


Please check if you can reproduce the issue with the sample.