Possible Multimedia API regression when decoding an interlaced source

Hi,

This is similar to the previous thread, Broken picture after l4t decode+transform interlaced source. Sample code for reproduction: GitHub - maxlapshin/l4t2-demo. The sample runs decoder → nvbuffertransform → encoder and simulates live input by sleeping 40 ms before reading the next NALU.

Progressive sources work well in all cases; only interlaced sources are broken.

Boards: Jetson TX2, Xavier NX

# cat /etc/nv_tegra_release 
# R32 (release), REVISION: 5.1, GCID: 27362550, BOARD: t186ref, EABI: aarch64, DATE: Wed May 19 18:16:00 UTC 2021
ii  nvidia-l4t-3d-core                     32.5.1-20210519111140                            arm64        NVIDIA GL EGL Package
ii  nvidia-l4t-apt-source                  32.5.1-20210219084708                            arm64        NVIDIA L4T apt source list debian package
ii  nvidia-l4t-bootloader                  32.5.1-20210614115015                            arm64        NVIDIA Bootloader Package
ii  nvidia-l4t-camera                      32.5.1-20210519111140                            arm64        NVIDIA Camera Package
ii  nvidia-l4t-configs                     32.5.1-20210219084708                            arm64        NVIDIA configs debian package
ii  nvidia-l4t-core                        32.5.1-20210519111140                            arm64        NVIDIA Core Package
ii  nvidia-l4t-cuda                        32.5.1-20210519111140                            arm64        NVIDIA CUDA Package
ii  nvidia-l4t-firmware                    32.5.1-20210519111140                            arm64        NVIDIA Firmware Package
ii  nvidia-l4t-graphics-demos              32.5.1-20210519111140                            arm64        NVIDIA graphics demo applications
ii  nvidia-l4t-gstreamer                   32.5.1-20210519111140                            arm64        NVIDIA GST Application files
ii  nvidia-l4t-init                        32.5.1-20210519111140                            arm64        NVIDIA Init debian package
ii  nvidia-l4t-initrd                      32.5.1-20210614115015                            arm64        NVIDIA initrd debian package
ii  nvidia-l4t-jetson-io                   32.5.1-20210219084708                            arm64        NVIDIA Jetson.IO debian package
ii  nvidia-l4t-jetson-multimedia-api       32.5.1-20210519111140                            arm64        NVIDIA Jetson Multimedia API is a collection of lower-level APIs that support flexible application development.
ii  nvidia-l4t-kernel                      4.9.201-tegra-32.5.1-20210505093723              arm64        NVIDIA Kernel Package
ii  nvidia-l4t-kernel-dtbs                 4.9.201-tegra-32.5.1-20210505093723              arm64        NVIDIA Kernel DTB Package
ii  nvidia-l4t-kernel-headers              4.9.201-tegra-32.5.1-20210505093723              arm64        NVIDIA Linux Tegra Kernel Headers Package
ii  nvidia-l4t-libvulkan                   32.5.1-20210519111140                            arm64        NVIDIA Vulkan Loader Package
ii  nvidia-l4t-multimedia                  32.5.1-20210519111140                            arm64        NVIDIA Multimedia Package
ii  nvidia-l4t-multimedia-utils            32.5.1-20210519111140                            arm64        NVIDIA Multimedia Package
ii  nvidia-l4t-oem-config                  32.5.1-20210219084708                            arm64        NVIDIA OEM-Config Package
ii  nvidia-l4t-tools                       32.5.1-20210614115015                            arm64        NVIDIA Public Test Tools Package
ii  nvidia-l4t-wayland                     32.5.1-20210519111140                            arm64        NVIDIA Wayland Package
ii  nvidia-l4t-weston                      32.5.1-20210519111140                            arm64        NVIDIA Weston Package
ii  nvidia-l4t-x11                         32.5.1-20210519111140                            arm64        NVIDIA X11 Package
ii  nvidia-l4t-xusb-firmware               32.5.1-20210614115015                            arm64        NVIDIA USB Firmware Package

Hi,
We suggested adjusting some properties in
Broken picture after l4t decode+transform interlaced source - #23 by DaneLLL
Does that work for you? With that setting we don't observe any issue. We would like to know your status.

Also, the previous topic was about gstreamer, while this topic is about jetson_multimedia_api. Have you switched from gstreamer to jetson_multimedia_api? It would be great if you could provide more information about your use case, such as a TV box or something else.

gstreamer works well, thank you.

Yes, the use case is transcoding live streams with the Jetson MMAPI, plus post-processing raw data after the V4L2 decoder with NvBufferComposite or CUDA code.

Hi khizbulin,

Please try the sample below; it works well.

/usr/src/jetson_multimedia_api/samples/16_multivideo_transcode

Hi,

Your 16_multivideo_transcode sample doesn't use NvBufferTransform or NvBufferComposite.

Hi,
Since NvBufferTransform() and NvBufferComposite() do not touch timestamps, these function calls should not cause any misordering.

We have checked and confirmed that the video decoder sends out frames in the correct order. When decoding sample.h264 through 00_video_decode, playback is good and the timestamps are in order with this test patch:

diff --git a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
index 8bb14a9..c7baa23 100644
--- a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
+++ b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
@@ -1085,6 +1085,12 @@ dec_capture_loop_fcn(void *arg)
               cout << "[" << v4l2_buf.index << "]" "dec capture plane dqB timestamp [" <<
                   v4l2_buf.timestamp.tv_sec << "s" << v4l2_buf.timestamp.tv_usec << "us]" << endl;
             }
+           static uint64_t prev_ts = 0;
+           uint64_t ts;
+           ts = v4l2_buf.timestamp.tv_sec * 1000000 + v4l2_buf.timestamp.tv_usec;
+           if (prev_ts >= ts) {
+           cout << "ts = " << ts << ", prev_ts = " << prev_ts << endl;
+           } prev_ts = ts;

             if (!ctx->disable_rendering && ctx->stats)
             {

So it looks more like the buffers are not sent to the encoder in the correct order. Could you please check this? Please print out the timestamp of each buffer and check whether the order is wrong under certain conditions.
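The check in the patch above can also be kept as a small reusable helper and called wherever a buffer is dequeued or queued. This is only a minimal sketch: to_usec and TimestampOrderChecker are hypothetical names, not part of the Multimedia API samples, and it assumes the timestamp arrives as a struct timeval, as in v4l2_buf.timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/time.h>   // struct timeval, the type of v4l2_buffer.timestamp

// Hypothetical helper: convert a timeval to microseconds in signed 64-bit
// arithmetic, so that negative deltas can be reported directly.
inline int64_t to_usec(const timeval &tv)
{
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return static_cast<int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}

// Hypothetical helper: remembers the previous timestamp and reports whether
// each new one is strictly greater than it.
class TimestampOrderChecker
{
public:
    // Returns true if the timestamp is in order. The first timestamp seen
    // is always accepted because there is nothing to compare it with.
    bool check(const timeval &tv)
    {
        const int64_t ts = to_usec(tv);
        const bool in_order = !has_prev_ || ts > prev_;
        if (!in_order)
            ++out_of_order_;
        delta_ = has_prev_ ? ts - prev_ : 0;
        prev_ = ts;
        has_prev_ = true;
        return in_order;
    }

    // Difference between the last two timestamps, in microseconds.
    int64_t last_delta() const { return delta_; }

    // Number of timestamps that were not greater than their predecessor.
    unsigned out_of_order() const { return out_of_order_; }

private:
    int64_t prev_ = 0;
    int64_t delta_ = 0;
    unsigned out_of_order_ = 0;
    bool has_prev_ = false;
};
```

Using one checker per plane (for example one at the decoder capture plane and one at the encoder output plane) would show at which stage the order first goes wrong.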

Hi,

I checked the timestamps in 00_video_decode with a patch similar to yours:

--- jetson_multimedia_api/samples/00_video_decode/video_decode_main.cpp 2021-07-26 22:37:32.000000000 +0300
+++ jetson_multimedia_api1/samples/00_video_decode/video_decode_main.cpp        2021-08-23 13:02:28.161042492 +0300
@@ -1086,6 +1086,16 @@
                   v4l2_buf.timestamp.tv_sec << "s" << v4l2_buf.timestamp.tv_usec << "us]" << endl;
             }
 
+            static int64_t prev_ts = 0;
+            int64_t ts;
+            ts = v4l2_buf.timestamp.tv_sec * 1000000 + v4l2_buf.timestamp.tv_usec;
+            if (prev_ts >= ts) {
+                cout << "broken ts = " << ts << ", prev_ts = " << prev_ts << ", time_delta = " << ts - prev_ts << endl;
+            } else {
+                cout << "ok ts = " << ts << ", prev_ts = " << prev_ts << ", time_delta = " << ts - prev_ts << endl;
+            }
+            prev_ts = ts;
+
             if (!ctx->disable_rendering && ctx->stats)
             {
                 /* EglRenderer requires the fd of the 0th plane to render the buffer. */

./video_decode H264 --disable-rendering --input-nalu --copy-timestamp 0 25 ../22_trans_cuda/sample.h264 produces broken timestamps:

ok ts = 40000, prev_ts = 0, time_delta = 40000                
ok ts = 680000, prev_ts = 40000, time_delta = 640000          
broken ts = 600000, prev_ts = 680000, time_delta = -80000  
ok ts = 760000, prev_ts = 600000, time_delta = 160000         
broken ts = 520000, prev_ts = 760000, time_delta = -240000  
ok ts = 1000000, prev_ts = 520000, time_delta = 480000        
broken ts = 920000, prev_ts = 1000000, time_delta = -80000    
ok ts = 1080000, prev_ts = 920000, time_delta = 160000       
broken ts = 840000, prev_ts = 1080000, time_delta = -240000 
ok ts = 1240000, prev_ts = 840000, time_delta = 400000        
ok ts = 1320000, prev_ts = 1240000, time_delta = 80000       
ok ts = 1400000, prev_ts = 1320000, time_delta = 80000       
ok ts = 1480000, prev_ts = 1400000, time_delta = 80000        
ok ts = 1560000, prev_ts = 1480000, time_delta = 80000        
broken ts = 1160000, prev_ts = 1560000, time_delta = -400000 
ok ts = 1800000, prev_ts = 1160000, time_delta = 640000       
ok ts = 1880000, prev_ts = 1800000, time_delta = 80000        
broken ts = 1720000, prev_ts = 1880000, time_delta = -160000  
ok ts = 1960000, prev_ts = 1720000, time_delta = 240000      
ok ts = 2040000, prev_ts = 1960000, time_delta = 80000        
broken ts = 1640000, prev_ts = 2040000, time_delta = -400000  

--input-nalu - does not work correctly
--input-chunk - works correctly

But I don’t think the problem is with timestamps. 00_video_decode also produces broken timestamps for a progressive source. That is a bug in read_decoder_input_nalu.

I updated GitHub - maxlapshin/l4t2-demo with a CUDA function that transfers raw image data from the dma_buf of the decoder capture plane to the dma_buf of the encoder output plane. With the CUDA transfer you can see the problem better:

./transcoder H264 sample.h264 H264 /mnt/nfs/ts/out_cuda.h264 -ew 1920 -eh 1080 --input-nalu --cuda - works well
./transcoder H264 sample.h264 H264 /mnt/nfs/ts/out_cuda.h264 -ew 1920 -eh 1080 --input-nalu --cuda --live - with simulated live input (40 ms sleep) the picture is broken

I think it is bad when behavior depends on input latency.
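For reference, simulated live input of this kind can be sketched as below. This is not the code from l4t2-demo: feed_paced is a hypothetical helper, and feed_one stands in for "read the next NAL unit and queue it on the decoder output plane". It also uses an accumulating deadline with sleep_until rather than a fixed sleep before every read, so the average rate stays at one unit per interval.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls feed_one() once per interval until it returns
// false, and returns how many units were fed.
inline unsigned feed_paced(const std::function<bool()> &feed_one,
                           std::chrono::milliseconds interval)
{
    assert(interval.count() >= 0);
    unsigned fed = 0;
    auto next = std::chrono::steady_clock::now();
    while (feed_one()) {
        ++fed;
        // Advance the deadline by one interval and wait for it. If
        // feed_one() itself took some time, the wait is shortened
        // accordingly instead of accumulating drift.
        next += interval;
        std::this_thread::sleep_until(next);
    }
    return fed;
}
```

With an interval of 40 ms this approximates a 25 fps live source; an interval of 0 ms degenerates to reading the file as fast as possible, which is the case reported as working.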

Hi,
From your description, the issue looks more like the decoder not outputting frames in the correct order under certain conditions. Could you make a patch to 00_video_decode so that we can reproduce it and check with other teams? Please help check where we can put usleep(40000); in 00_video_decode so that the misordering shows up in playback.

The logic in --copy-timestamp is for IDR-P-P-P-… encoded stream, but sample.h264 is encoded like:

POC must = frame# or field# for SNRs to be correct
--------------------------------------------------------------------------
  Frame          POC  Pic#   QP    SnrY     SnrU     SnrV   Y:U:V Time(ms)
--------------------------------------------------------------------------
00011( I )       22    20    25                             4:2:0      89
00008( B )       16    21    33                             4:2:0      70
00006( b )       12    22    35                             4:2:0      64
00007( b )       14    22    35                             4:2:0      62
00009( b )       18    22    34                             4:2:0      59
00010( b )       20    22    34                             4:2:0      55
00015( P )       30    22    30                             4:2:0      69
00013( B )       26    23    33                             4:2:0      54

So this kind of stream needs different logic. But even though the logic is wrong, it should not trigger misordering.
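Why timestamps stamped in input order look out of order on output for such a stream can be illustrated with a simplified model. This is a sketch under assumptions, not the sample's actual logic: it assumes the rows in the table above are listed in decode order, that the input side attaches a timestamp that grows by a fixed interval per picture, and that the decoder outputs pictures in ascending POC. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Frame {
    int poc;          // picture order count, i.e. the display-order key
    int64_t ts_usec;  // timestamp attached on the input side
};

// Stamp pictures in decode order with a fixed interval, which is only
// correct when decode order equals display order (IDR-P-P-P-...).
inline std::vector<Frame> stamp_in_decode_order(const std::vector<int> &pocs,
                                                int64_t interval_usec)
{
    assert(interval_usec > 0);
    std::vector<Frame> frames;
    int64_t ts = 0;
    for (int poc : pocs) {
        frames.push_back(Frame{poc, ts});
        ts += interval_usec;
    }
    return frames;
}

// Reorder to display order (ascending POC), as happens on decoder output.
inline void to_display_order(std::vector<Frame> &frames)
{
    std::stable_sort(frames.begin(), frames.end(),
                     [](const Frame &a, const Frame &b) { return a.poc < b.poc; });
}

// Count positions where the timestamp does not increase.
inline unsigned count_backward_steps(const std::vector<Frame> &frames)
{
    unsigned n = 0;
    for (std::size_t i = 1; i < frames.size(); ++i)
        if (frames[i].ts_usec <= frames[i - 1].ts_usec)
            ++n;
    return n;
}
```

Feeding the POC column from the table (22, 16, 12, 14, 18, 20, 30, 26) with a 40000 us interval gives timestamps that increase in decode order but step backwards several times once the pictures are sorted by POC, even though the pictures themselves are in the right display order.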

Hi,

By the way, I hadn’t tested 00_video_decode with the 40 ms live sleep before.

Yes, with this patch the problem reproduces in 00_video_decode:

--- ../../../jetson_multimedia_api/samples/00_video_decode/video_decode_main.cpp        2021-07-26 22:37:32.000000000 +0300
+++ ./video_decode_main.cpp     2021-08-24 12:59:14.271718743 +0300
@@ -135,7 +135,7 @@
     buffer->planes[0].bytesused = 4;
     stream_ptr += 4;
 
-    if (ctx->copy_timestamp)
+    //if (ctx->copy_timestamp)
     {
       if (ctx->decoder_pixfmt == V4L2_PIX_FMT_H264) {
         if ((IS_H264_NAL_CODED_SLICE(stream_ptr)) ||
@@ -1706,6 +1706,9 @@
         }
         v4l2_buf.m.planes[0].bytesused = buffer->planes[0].bytesused;
 
+       if (ctx.flag_copyts)
+         usleep(40000);
+
         if (ctx.input_nalu && ctx.copy_timestamp && ctx.flag_copyts)
         {
           /* Update the timestamp. */
./video_decode H264 --disable-rendering -o out_nv12.raw --input-nalu sample.h264
ffmpeg -y -f rawvideo -pixel_format nv12 -s 1920x1080 -i out_nv12.raw -c:v libx264 -preset fast -crf 28 -g 300 out.ts

The problem also reproduces with a 40 ms usleep before the dqbuffer call:

--- ../../../jetson_multimedia_api/samples/00_video_decode/video_decode_main.cpp        2021-07-26 22:37:32.000000000 +0300
+++ ./video_decode_main.cpp     2021-08-24 13:25:09.852827831 +0300
@@ -135,7 +135,7 @@
     buffer->planes[0].bytesused = 4;
     stream_ptr += 4;
 
-    if (ctx->copy_timestamp)
+    //if (ctx->copy_timestamp)
     {
       if (ctx->decoder_pixfmt == V4L2_PIX_FMT_H264) {
         if ((IS_H264_NAL_CODED_SLICE(stream_ptr)) ||
@@ -1639,6 +1639,8 @@
 
         v4l2_buf.m.planes = planes;
 
+       if (ctx.flag_copyts)
+         usleep(40000);
         /* dequeue a buffer for output plane. */
         if(allow_DQ)
         {

Hi,
We can reproduce the issue with your patch. Will check with teams and update.

Hi DaneLLL,

Any update here?

Hi pro.trener,
The issue is still under investigation. Will update once there is further progress.

Hi,

I found another bug, in the deinterlace thread. Sometimes the decoder segfaults, and it looks like an obvious bug:

(gdb) bt
#0  0x0000007ebfc874c8 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_video.so
#1  0x0000007ebfca4050 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_video.so
#2  0x0000007f50fc0628 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
#3  0x0000007fb7dda088 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#4  0x0000007fb7ac1ffc in ?? () from /lib/aarch64-linux-gnu/libc.so.6

frame 0

=> 0x0000007ebfc874c8:  ldr     w2, [x1]
   0x0000007ebfc874cc:  cmn     w2, #0x1
   0x0000007ebfc874d0:  b.eq    0x7ebfc874f0  // b.none

frame 1

   0x0000007ebfca4044:  mov     x1, #0x2c                       // #44
   0x0000007ebfca4048:  mov     x0, x19
   0x0000007ebfca404c:  bl      0x7ebfc874c8

The x1 register always contains the constant 0x2c, and a read from address 0x2c always results in an error.

This is L4T SDK 32.2.3:

# R32 (release), REVISION: 2.3, GCID: 17644089, BOARD: t186ref, EABI: aarch64, DATE: Tue Nov 5 21:48:17 UTC 2019

Also, using objdump and grep I found another similar place:

segfault place

    64c8:       b9400022        ldr     w2, [x1]
    64cc:       3100045f        cmn     w2, #0x1
    64d0:       54000100        b.eq    64f0 <NvMMLiteOpen@@Base+0x1e28>  // b.none

my segfault

   23044:       d2800581        mov     x1, #0x2c                       // #44
   23048:       aa1303e0        mov     x0, x19
   2304c:       97ff8d1f        bl      64c8 <NvMMLiteOpen@@Base+0x1e00>

another possible segfault

   22eec:       d2800581        mov     x1, #0x2c                       // #44
   22ef0:       aa1303e0        mov     x0, x19
   22ef4:       97ff8d75        bl      64c8 <NvMMLiteOpen@@Base+0x1e00>
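The raw instruction words in the objdump lines above can be cross-checked by hand. The sketch below decodes two A64 encodings, MOVZ (64-bit) and BL, to confirm that x1 is loaded with the constant 0x2c immediately before a call whose target is 0x64c8. The helper names are hypothetical, and the sketch says nothing about what the callee actually expects in x1.

```cpp
#include <cassert>
#include <cstdint>

// True if insn is a 64-bit MOVZ: bits 31..23 are 1 10 100101.
inline bool is_movz64(uint32_t insn)
{
    return (insn & 0xff800000u) == 0xd2800000u;
}

// Destination register number of a MOVZ (bits 4..0).
inline unsigned movz_rd(uint32_t insn)
{
    return insn & 0x1fu;
}

// Value loaded by a 64-bit MOVZ: imm16 (bits 20..5) shifted left by
// hw * 16, where hw is bits 22..21.
inline uint64_t movz64_value(uint32_t insn)
{
    assert(is_movz64(insn));
    const unsigned shift = ((insn >> 21) & 0x3u) * 16;
    return static_cast<uint64_t>((insn >> 5) & 0xffffu) << shift;
}

// True if insn is BL: bits 31..26 are 100101.
inline bool is_bl(uint32_t insn)
{
    return (insn & 0xfc000000u) == 0x94000000u;
}

// Branch target of a BL located at address pc:
// pc + sign_extend(imm26) * 4.
inline uint64_t bl_target(uint64_t pc, uint32_t insn)
{
    assert(is_bl(insn));
    int64_t imm = insn & 0x03ffffffu;
    if (imm & 0x02000000)       // bit 25 set: negative offset
        imm -= 0x04000000;
    return pc + static_cast<uint64_t>(imm * 4);
}
```

For example, movz64_value(0xd2800581) yields 0x2c with movz_rd equal to 1, and bl_target(0x2304c, 0x97ff8d1f) as well as bl_target(0x22ef4, 0x97ff8d75) both yield 0x64c8, which matches the disassembly.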

And I found similar buggy code in 32.6.1:
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref, EABI: aarch64, DATE: Mon Jul 26 19:36:31 UTC 2021

segfaulted func

    65c8:       b9400022        ldr     w2, [x1]
    65cc:       3100045f        cmn     w2, #0x1
    65d0:       54000100        b.eq    65f0 <NvMMLiteOpen@@Base+0x1ee8>  // b.none
    65d4:       a9bf7bf3        stp     x19, x30, [sp, #-16]!
    65d8:       aa0103f3        mov     x19, x1
    65dc:       97ffffe9        bl      6580 <NvMMLiteOpen@@Base+0x1e78>
    65e0:       aa1303e1        mov     x1, x19
    65e4:       12800002        mov     w2, #0xffffffff                 // #-1
    65e8:       a8c17bf3        ldp     x19, x30, [sp], #16
    65ec:       17fff7bd        b       44e0 <NvRmFenceWait@plt>

places with bug

--
   23968:	97ff81e6 	bl	4100 <NvOsSemaphoreSignal@plt>
   2396c:	17fffe4d 	b	232a0 <NvMSEncSetGlobal2DHandle@@Base+0x2ff0>
   23970:	d2800581 	mov	x1, #0x2c                  	// #44
   23974:	aa1303e0 	mov	x0, x19
   23978:	97ff8b14 	bl	65c8 <NvMMLiteOpen@@Base+0x1ec0>
--
   23ac0:	97ff8200 	bl	42c0 <NvOsDebugPrintf@plt>
   23ac4:	17ffff5c 	b	23834 <NvMSEncSetGlobal2DHandle@@Base+0x3584>
   23ac8:	d2800581 	mov	x1, #0x2c                  	// #44
   23acc:	aa1303e0 	mov	x0, x19
   23ad0:	97ff8abe 	bl	65c8 <NvMMLiteOpen@@Base+0x1ec0>

Hi @khizbulin ,
For clarity, please create a new topic and share the patch/steps with us so that we can try to reproduce it on r32.6.1 and check.

Hi,

About the segfault in the deinterlace thread: I could not reproduce it on the latest L4T SDK.

Also, the segfault problem was minor; closing it does not solve the main problem of this thread.
We are still waiting for a solution to the main problem: a broken picture when the decoder works with an interlaced source.

thanks for your support!

Hi,

The main problem, a broken picture with an interlaced source, is still not fixed. Thanks

Hi,
On r32.6.1, please execute the steps to run VIC at max clock and try again:
Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5? - #3 by DaneLLL

The VIC engine is used for deinterlacing. We have tried running it at max clock, and the issue is not observed. Please give it a try.