Decoder flush for interlaced source does not work

Hi,

I checked several interlaced H.264 sources, and with every one of them samples/16_multivideo_transcode prints errors at the end of the run. It looks like flushing the decoder does not work.

/usr/src/jetson_multimedia_api/samples/16_multivideo_transcode# ./multivideo_transcode num_files 1 interlaced.h264 H264 i0.h264 H264
Opening in BLOCKING MODE 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
NVMEDIA: NvMediaMixerInit: 119: frameWidth = 1920, frameHeight = 1088 
NVMEDIA: DeinterlaceThread: 782: DeinterlaceThread is created 
Starting decoder capture loop thread
Video Resolution: 1920x1080Decoder colorspace ITU-R BT.709 with standard range luma (16-235)
NvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
875967048
842091854
H264: Profile = 66, Level = 51 
Query and set capture successful
Input file read complete
[ERROR] (NvV4l2ElementPlane.cpp:256) <enc0> Output Plane:Error while Qing buffer: Device or resource busy
ERROR while DQing buffer at output plane
[ERROR] (NvV4l2ElementPlane.cpp:178) <enc0> Capture Plane:Error while DQing buffer: Broken pipe
[ERROR] (NvV4l2ElementPlane.cpp:178) <enc0> Capture Plane:Error while DQing buffer: Broken pipe
Error while queueing buffer at decoder capture plane
Exiting decoder capture loop thread
[ERROR] (NvV4l2ElementPlane.cpp:256) <enc0> Capture Plane:Error while Qing buffer: Device or resource busy
Error while Qing buffer at capture plane
Encoder is in error
NVMEDIA: DeinterlaceThread: 860: Closing Deinterlace Thread 
Instance 0 Failed.
App run failed

Tested on Jetson Xavier NX and Jetson TX2 8GB with NVIDIA Jetson SDK 32.6.1:
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref, EABI: aarch64, DATE: Mon Jul 26 19:36:31 UTC 2021

I also checked GStreamer with the same interlaced files: it loses the last few frames too, but it prints no error messages, so the flush error is hidden.

Interestingly, the decoding sample samples/00_video_decode runs without errors.
And yes, I checked some progressive H.264 sources, and they work fine, without errors.
Decoder flush is also broken in NVIDIA Jetson SDK 32.2.3 with samples/16_multivideo_transcode.

I would like to write a live transcoder that switches smoothly when the source changes, and without a working flush this is impossible.

Hi,
Please share the h264 stream so that we can run 16_multivideo_transcode sample to reproduce the error.

Hi,

Hi DaneLLL,

Were you able to reproduce this issue? Do you think your developers could fix it?

Thanks.

Hi,
We can observe the issue on TX2/Jetpack 4.6. The issue is under investigation.

Hi,
Please apply this patch and try again:

@@ -1508,6 +1508,14 @@ dec_capture_loop_fcn(void *arg)
         }
     }
 
+    ctx->stop_refill=1;
+    pthread_join(ctx->buffer_refill, NULL);
+
+    /* dqueue a buffer for sending EoS */
+    if (enc->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, 10) < 0)
+    {
+        cerr << "ERROR while DQing buffer at output plane" << endl;
+    }
     buffer = enc->output_plane.getNthBuffer(v4l2_buf.index);
     v4l2_buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
 
@@ -1531,10 +1539,6 @@ dec_capture_loop_fcn(void *arg)
             << endl;
     }
 
-    ctx->stop_refill=1;
-
-    pthread_join(ctx->buffer_refill, NULL);
-
     cout << "Exiting decoder capture loop thread" << endl;
     return NULL;
 }

Hi,

I checked your patch. The error is gone, but the transcoder still loses the last frames. Checking with ffprobe:

After the transcoder:

# ffprobe -show_frames -print_format csv i0.h264 | wc -l
Input #0, h264, from '/mnt/nfs/ts/sd0.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (Constrained Baseline), yuv420p(progressive), 720x576, 25 fps, 25 tbr, 1200k tbn, 50 tbc
5405

Source:

# ffprobe -show_frames -print_format csv /mnt/nfs/ts/sd.h264 | wc -l
Input #0, h264, from '/mnt/nfs/ts/sd.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (Main), yuv420p(tv, bt470bg, top first), 720x576 [SAR 16:11 DAR 20:11], 25 fps, 25 tbr, 1200k tbn, 50 tbc
5411

The last 5411 - 5405 = 6 frames are lost.

So I think the error messages from your sample are not the main problem. It looks like the decoder flush does not work properly for interlaced sources.
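
As a cross-check that does not depend on ffprobe, the coded pictures in an Annex B elementary stream can be counted directly. The following is only a rough sketch: `count_slices` and `SliceCounts` are hypothetical names, it looks only at slice NAL units of type 1 and 5, and it assumes a well-formed stream without data partitioning or redundant pictures. For a field-coded interlaced stream each field is a separate coded picture, so the picture count may be up to twice the frame count that ffprobe reports.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts for an H.264 Annex B byte stream.
struct SliceCounts {
    std::size_t slices = 0;    // slice NAL units (nal_unit_type 1 or 5)
    std::size_t pictures = 0;  // slices that begin a new coded picture
};

// Scan for 00 00 01 start codes and classify the NAL unit after each one.
// first_mb_in_slice is the first field of a slice header and is Exp-Golomb
// coded, so a leading 1 bit means first_mb_in_slice == 0, i.e. the slice
// is the first one of a coded picture (a frame or a single field).
SliceCounts count_slices(const std::vector<std::uint8_t>& data) {
    SliceCounts counts;
    for (std::size_t i = 0; i + 3 < data.size(); ++i) {
        if (data[i] != 0 || data[i + 1] != 0 || data[i + 2] != 1) {
            continue;  // not a start code
        }
        const std::uint8_t nal_type = data[i + 3] & 0x1F;
        if (nal_type == 1 || nal_type == 5) {
            ++counts.slices;
            if (i + 4 < data.size() && (data[i + 4] & 0x80) != 0) {
                ++counts.pictures;
            }
        }
        i += 2;  // skip the rest of the start code; ++i lands on the NAL header
    }
    return counts;
}
```

Reading sd.h264 and the transcoded file into byte vectors and comparing the two `pictures` values gives a second opinion on how many pictures went missing at the end.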

Thanks.

Also, multivideo_transcode deadlocks on some runs, seemingly at random.

To reproduce, run: while true; do ./multivideo_transcode num_files 1 in.h264 H264 out.h264 H264; done

I dumped the gdb backtraces: gdb.txt (221.3 KB)

Hi,

This looks to be another issue, not one triggered by the patch. We will check it.
[UPDATE] The frame-loss issue looks specific to sd.h264. With sample.h264 from
GitHub - maxlapshin/l4t2-demo
we don’t observe it. Could you try that stream and confirm?

And for the deadlock, what is the failure rate in your test?
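
One way to put a number on the failure rate is to run the sample repeatedly under a watchdog and count the runs that never finish. This is a minimal sketch for Linux, not part of the samples: `run_with_timeout` and `count_hangs` are hypothetical helpers, and the timeout is a value you have to pick yourself.

```cpp
#include <chrono>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

enum class RunResult { Completed, TimedOut, SpawnFailed };

// Run one command; kill it and report TimedOut if it outlives `timeout`.
RunResult run_with_timeout(const std::vector<std::string>& args,
                           std::chrono::milliseconds timeout) {
    if (args.empty()) return RunResult::SpawnFailed;
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    const pid_t pid = fork();
    if (pid < 0) return RunResult::SpawnFailed;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached when exec itself failed
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        int status = 0;
        const pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid) {
            // Exit code 127 is treated as "could not start the command".
            const bool exec_failed =
                WIFEXITED(status) && WEXITSTATUS(status) == 127;
            return exec_failed ? RunResult::SpawnFailed : RunResult::Completed;
        }
        if (done < 0) return RunResult::SpawnFailed;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);  // reap the killed child
            return RunResult::TimedOut;
        }
        usleep(20 * 1000);  // poll every 20 ms
    }
}

// Count how many of `runs` invocations hit the timeout (i.e. hung).
int count_hangs(const std::vector<std::string>& args, int runs,
                std::chrono::milliseconds timeout) {
    int hangs = 0;
    for (int i = 0; i < runs; ++i) {
        if (run_with_timeout(args, timeout) == RunResult::TimedOut) ++hangs;
    }
    return hangs;
}
```

For example, `count_hangs({"./multivideo_transcode", "num_files", "1", "in.h264", "H264", "out.h264", "H264"}, 100, std::chrono::seconds(120))` returns the number of hung runs out of 100, assuming a normal run finishes well within two minutes.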

Hi,

The sd.h264 sample reliably reproduces the loss of the last frames. For me this is a critical bug: I expect to receive every frame that is sent to the decoder, otherwise there will be gaps in the broadcast.

I added some debug output to the 00_video_decode sample:

--- a/video_decode_main.cpp 2021-09-17 07:13:42.000000000 +0300
+++ b/video_decode_main.cpp        2021-11-08 12:35:03.191819473 +0300
@@ -2168,6 +2168,8 @@
         error = 1;
     }
 
+    cout << "dec capture_plane dq:" << ctx.dec->capture_plane.getTotalDequeuedBuffers() << endl;
+
     /* The decoder destructor does all the cleanup i.e set streamoff on output and
        capture planes, unmap buffers, tell decoder to deallocate buffer (reqbufs
        ioctl with count = 0), and finally call v4l2_close on the fd. */

It also loses the last frames; only 5405 of the 5411 source frames are dequeued:

# ./video_decode H264 --disable-rendering --input-nalu sd.h264
Set governor to performance before enabling profiler
Creating decoder in blocking mode
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Setting frame input mode to 0
Starting decoder capture loop thread
NVMEDIA: NvMediaMixerInit: 119: frameWidth = 720, frameHeight = 576
NVMEDIA: DeinterlaceThread: 782: DeinterlaceThread is created
Video Resolution: 720x576
Decoder colorspace ITU-R BT.601 with standard range luma (16-235)
Query and set capture successful
Could not read nal unit from file. EOF or file corrupted
Input file read complete
Exiting decoder capture loop thread
dec capture_plane dq:5405
NVMEDIA: DeinterlaceThread: 860: Closing Deinterlace Thread
App run was successful

ffmpeg, by contrast, transcodes all 5411 frames:
ffmpeg -hide_banner -y -i sd.h264 -c:v libx264 -preset ultrafast -crf 30 sd_out.h264

# ffprobe -show_frames -print_format csv sd_out.h264 | wc -l
Input #0, h264, from '/mnt/nfs/ts/sd_out.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (Constrained Baseline), yuv420p(progressive), 720x576 [SAR 16:11 DAR 20:11], 25 fps, 25 tbr, 1200k tbn, 50 tbc
5411

With your patch, 16_multivideo_transcode deadlocked after three runs:

while true; do ./multivideo_transcode num_files 1 sd.h264 H264 sd_out3.h264 H264 ; done

One of the threads is blocked at the patch location:

(gdb) thread 11
[Switching to thread 11 (Thread 0x7f991011d0 (LWP 27482))]
#0  0x0000007fae5922a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f780019cc) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88      in ../sysdeps/unix/sysv/linux/futex-internal.h
(gdb) bt
#0  0x0000007fae5922a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f780019cc) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x0000007fae5922a4 in __pthread_cond_wait_common (abstime=0x0, mutex=0x7f78001970, cond=0x7f780019a0) at pthread_cond_wait.c:502
#2  0x0000007fae5922a4 in __pthread_cond_wait (cond=0x7f780019a0, mutex=0x7f78001970) at pthread_cond_wait.c:655
#3  0x0000007fadbd3fdc in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
#4  0x0000007fa984cb54 in TegraV4L2_Poll_OPlane () at /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#5  0x0000007fada480f8 in plugin_ioctl () at /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so
#6  0x0000007fae4745c0 in v4l2_ioctl () at /usr/lib/aarch64-linux-gnu/libv4l2.so.0
#7  0x0000005579a0774c in NvV4l2ElementPlane::dqBuffer(v4l2_buffer&, NvBuffer**, NvBuffer**, unsigned int) (this=0x7fa4001408, v4l2_buf=..., buffer=0x7f99100548, shared_buffer=0x0, num_retries=10) at NvV4l2ElementPlane.cpp:126
#8  0x00000055799d1d70 in dec_capture_loop_fcn(void*) (arg=0x7faa0ac6d8) at multivideo_transcode_main.cpp:1516
#9  0x0000007fae58c088 in start_thread (arg=0x7faa0ac47f) at pthread_create.c:463
#10 0x0000007fae003ffc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) frame 8
#8  0x00000055799d1d70 in dec_capture_loop_fcn (arg=0x7faa0ac6d8) at multivideo_transcode_main.cpp:1516
1516        if (enc->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, 10) < 0)
(gdb) list
1511        ctx->stop_refill=1;
1512
1513        pthread_join(ctx->buffer_refill, NULL);
1514
1515        /* dqueue a buffer for sending EoS */
1516        if (enc->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, 10) < 0)
1517        {
1518            cerr << "ERROR while DQing buffer at output plane" << endl;
1519        }
1520

Hi,
For deadlock please try this patch:

@@ -834,11 +834,12 @@ buffer_refil(void *arg)
     context_t *ctx = (context_t *) arg;
     NvVideoDecoder *dec = ctx->dec;
     NvVideoEncoder *enc = ctx->enc;
+    bool sent_eos = false;
 
     /* Set thread name for decoder Capture Plane thread. */
     pthread_setname_np(ctx->buffer_refill, "BufferRefill");
 
-    while(!ctx->stop_refill)
+    while(!ctx->stop_refill || !sent_eos)
     {
         struct v4l2_buffer v4l2_buf;
         struct v4l2_plane planes[MAX_PLANES];
@@ -857,19 +858,46 @@ buffer_refil(void *arg)
             break;
         }
 
-        if (ctx->dec_capture_plane_mem_type == V4L2_MEMORY_DMABUF)
-        {
-            buffer->planes[0].fd = ctx->dmabuff_fd[v4l2_buf.index];
-            v4l2_buf.m.planes[0].m.fd = ctx->dmabuff_fd[v4l2_buf.index];
-        }
+        if (!ctx->stop_refill) {
+            if (ctx->dec_capture_plane_mem_type == V4L2_MEMORY_DMABUF)
+            {
+                buffer->planes[0].fd = ctx->dmabuff_fd[v4l2_buf.index];
+                v4l2_buf.m.planes[0].m.fd = ctx->dmabuff_fd[v4l2_buf.index];
+            }
 
-        if (dec->capture_plane.qBuffer(v4l2_buf, NULL) < 0)
-        {
-            abort(ctx);
-            cerr <<
-                "Error while queueing buffer at decoder capture plane"
-                << endl;
-            break;
+            if (dec->capture_plane.qBuffer(v4l2_buf, NULL) < 0)
+            {
+                abort(ctx);
+                cerr <<
+                    "Error while queueing buffer at decoder capture plane"
+                    << endl;
+                break;
+            }
+        } else {
+            /* Send size 0 buffer to encoder as EoS */
+            buffer = enc->output_plane.getNthBuffer(v4l2_buf.index);
+            v4l2_buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
+
+            if (ctx->enc_output_memory_type == V4L2_MEMORY_DMABUF)
+            {
+                for (uint32_t i = 0 ; i < buffer->n_planes ; i++)
+                {
+                    buffer->planes[i].fd = ctx->dmabuff_fd[v4l2_buf.index];
+                    v4l2_buf.m.planes[i].m.fd = buffer->planes[i].fd;
+                    buffer->planes[i].bytesused = 0;
+                    v4l2_buf.m.planes[i].bytesused = 0;
+                }
+            }
+
+            /* Enqueue a size 0 buffer in encoder */
+            if (ctx->enc->output_plane.qBuffer(v4l2_buf, NULL) < 0)
+            {
+                abort(ctx);
+                cerr <<
+                    "Error while queueing buffer at encoder output plane"
+                    << endl;
+            }
+            sent_eos = true;
         }
     }
 
@@ -1508,29 +1536,6 @@ dec_capture_loop_fcn(void *arg)
         }
     }
 
-    buffer = enc->output_plane.getNthBuffer(v4l2_buf.index);
-    v4l2_buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
-
-    if (ctx->enc_output_memory_type == V4L2_MEMORY_DMABUF)
-    {
-        for (uint32_t i = 0 ; i < buffer->n_planes ; i++)
-        {
-            buffer->planes[i].fd = ctx->dmabuff_fd[v4l2_buf.index];
-            v4l2_buf.m.planes[i].m.fd = buffer->planes[i].fd;
-            buffer->planes[i].bytesused = 0;
-            v4l2_buf.m.planes[i].bytesused = 0;
-        }
-    }
-
-    /* Enqueue a size 0 buffer in encoder */
-    if (ctx->enc->output_plane.qBuffer(v4l2_buf, NULL) < 0)
-    {
-        abort(ctx);
-        cerr <<
-            "Error while queueing buffer at decoder capture plane"
-            << endl;
-    }
-
     ctx->stop_refill=1;
 
     pthread_join(ctx->buffer_refill, NULL);

Hi,

With this patch, the last 6 frames of the source are still lost.

Hi,
Please try the patch and let us know if you still observe the deadlock. We would like to keep this topic for the error prints:

[ERROR] (NvV4l2ElementPlane.cpp:256) <enc0> Output Plane:Error while Qing buffer: Device or resource busy
ERROR while DQing buffer at output plane
[ERROR] (NvV4l2ElementPlane.cpp:178) <enc0> Capture Plane:Error while DQing buffer: Broken pipe
[ERROR] (NvV4l2ElementPlane.cpp:178) <enc0> Capture Plane:Error while DQing buffer: Broken pipe
Error while queueing buffer at decoder capture plane
Exiting decoder capture loop thread
[ERROR] (NvV4l2ElementPlane.cpp:256) <enc0> Capture Plane:Error while Qing buffer: Device or resource busy
Error while Qing buffer at capture plane
Encoder is in error

This would need our teams’ help. Please create a new topic and share other interlaced streams. As of now we only observe the issue when transcoding sd.h264. Transcoding sample.h264 from GitHub - maxlapshin/l4t2-demo looks OK. It would be great if you could share other streams.

OK, I created a new topic: Jetson h264 decoder loses last few frames
