Jetson h264 decoder flush deadlock

Hi,

sample_outdoor_car_1080p_10fps.h264 is too long for testing, so to reproduce the issue I also created a shorter clip of it and a transcoded version of that clip:

ffmpeg -y -i sample_outdoor_car_1080p_10fps.h264 -t 00:25 -c copy sample_outdoor_car_1080p_10fps_25.h264
./multivideo_transcode num_files 1 ../../data/Video/sample_outdoor_car_1080p_10fps_25.h264 H264 ../../data/Video/sample_outdoor_car_1080p_10fps_25_transcoded.h264

I am running multiple video_decode processes, and after some time PID 19440 freezes:

# ps auxww | grep video_de
root      8621  0.3  0.6 8637220 52016 pts/5   Sl+  13:50   0:00 ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps_25_transcoded.h264
root     10494  101  0.6 8631100 48064 pts/3   Rl+  13:53   0:02 ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps.h264
root     10514  0.0  0.5 8631100 46872 pts/4   Sl+  13:53   0:00 ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps_25.h264
root     10524  0.0  0.0   6892   628 pts/7    S+   13:53   0:00 grep --color=auto video_de
root     19440  0.1  0.6 8631100 48036 pts/2   tl+  13:30   0:02 ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps.h264
# gdb -p 19440
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Attaching to process 19440
[New LWP 19442]
[New LWP 19443]
[New LWP 19444]
[New LWP 19445]
[New LWP 19455]
[New LWP 19456]
[New LWP 19457]
[New LWP 19458]
[New LWP 19464]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000007f94f83310 in __GI___pthread_timedjoin_ex (threadid=547676893648, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
89      pthread_join_common.c: No such file or directory.
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f94fe7430 (LWP 19440) "DecOutPlane" 0x0000007f94f83310 in __GI___pthread_timedjoin_ex (threadid=547676893648, thread_return=0x0, abstime=0x0, block=<optimized out>)
    at pthread_join_common.c:89
  2    Thread 0x7f940a31d0 (LWP 19442) "drm_vbl" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b088a288)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7f938a21d0 (LWP 19443) "drm_pflip" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b088a258)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  4    Thread 0x7f930a11d0 (LWP 19444) "drm_vbl" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b088a338)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  5    Thread 0x7f928a01d0 (LWP 19445) "drm_pflip" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b088a308)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  6    Thread 0x7f8747c1d0 (LWP 19455) "NVMDecBufProcT" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b08b526c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  7    Thread 0x7f86c7b1d0 (LWP 19456) "NVMDecDisplayT" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b08afec8)                      
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88                                                                                                                        
  8    Thread 0x7f8647a1d0 (LWP 19457) "NVMDecFrmStatsT" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b08aff3c)       
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88                                          
  9    Thread 0x7f849641d0 (LWP 19458) "V4L2_DecThread" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b095f3cc)        
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88                          
  10   Thread 0x7f841631d0 (LWP 19464) "DecCapPlane" 0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b095f358)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
(gdb) thread 10                                                              
[Switching to thread 10 (Thread 0x7f841631d0 (LWP 19464))]        
#0  0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b095f358) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88      ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) list                                     
83      in ../sysdeps/unix/sysv/linux/futex-internal.h
(gdb) bt                                   
#0  0x0000007f94f882a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55b095f358) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x0000007f94f882a4 in __pthread_cond_wait_common (abstime=0x0, mutex=0x55b095f300, cond=0x55b095f330) at pthread_cond_wait.c:502
#2  0x0000007f94f882a4 in __pthread_cond_wait (cond=0x55b095f330, mutex=0x55b095f300) at pthread_cond_wait.c:655
#3  0x0000007f945c5fdc in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
#4  0x0000007f90a3fbf0 in TegraV4L2_Poll_CPlane () at /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#5  0x0000007f9443a2e4 in plugin_ioctl () at /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so
#6  0x0000007f94e69d68 in v4l2_ioctl (fd=14, request=3227014673) at libv4l2.c:1152
#7  0x000000557b13ccc0 in NvV4l2ElementPlane::dqBuffer(v4l2_buffer&, NvBuffer**, NvBuffer**, unsigned int) (this=0x55b089a488, v4l2_buf=..., buffer=0x7f841625b8, shared_buffer=0x0, num_retries=0)
    at NvV4l2ElementPlane.cpp:126
#8  0x000000557b105720 in dec_capture_loop_fcn(void*) (arg=0x7ffba68218) at video_decode_main.cpp:1055
#9  0x0000007f94f82088 in start_thread (arg=0x7ffba67fff) at pthread_create.c:463
#10 0x0000007f949f5ffc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) frame 8  
#8  0x000000557b105720 in dec_capture_loop_fcn (arg=0x7ffba68218) at video_decode_main.cpp:1055
1055                if (dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 0))
(gdb) list                                   
1050                memset(&v4l2_buf, 0, sizeof(v4l2_buf));                
1051                memset(planes, 0, sizeof(planes));                                                                                                      
1052                v4l2_buf.m.planes = planes;          
1053              
1054                /* Dequeue a filled buffer. */
1055                if (dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 0))                                                                                                 
1056                {          
1057                    if (errno == EAGAIN)                                                                                                               
1058                    {                            
1059                        usleep(1000);                                                                                                                    
(gdb) p dec->capture_plane.num_queued_buffers        
$1 = 10                                                                                                                                                    
(gdb) p ctx.got_e                                    
There is no member named got_e.                                                                                                                              
(gdb) p ctx.got_eos                                  
$2 = true                                                                                                                                                         
(gdb)                                  

Could you please tell me whether you were able to reproduce this behavior?

Also, with three decoders and short clips of sample_outdoor_car_1080p_10fps_25_transcoded.h264, the freeze reproduces reliably:

ffmpeg -y -i ../../data/Video/sample_outdoor_car_1080p_10fps.h264 -t 00:05 -c copy ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264
ffmpeg -y -i ../../data/Video/sample_outdoor_car_1080p_10fps.h264 -t 00:10 -c copy ../../data/Video/sample_outdoor_car_1080p_10fps_10.h264
while true; do ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264 ; done &
while true; do ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264 ; done &
while true; do ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps_10.h264 ; done &

Hi,
Thanks for the information. We will follow the steps to reproduce the issue.

Hi,
Please apply the patch and try again:

diff --git a/multimedia_api/ll_samples/samples/00_video_decode/video_decode.h b/multimedia_api/ll_samples/samples/00_video_decode/video_decode.h
index 4c81278..9d6ccfd 100644
--- a/multimedia_api/ll_samples/samples/00_video_decode/video_decode.h
+++ b/multimedia_api/ll_samples/samples/00_video_decode/video_decode.h
@@ -101,6 +101,7 @@ typedef struct
 
     pthread_t dec_capture_loop; // Decoder capture thread, created if running in blocking mode.
     bool got_error;
+    bool op_sent_eos; // Sent EoS to output plane
     bool got_eos;
     bool vp9_file_header_flag;
     bool vp8_file_header_flag;
diff --git a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
index 8bb14a9..47b050c 100644
--- a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
+++ b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
@@ -1056,7 +1056,14 @@ dec_capture_loop_fcn(void *arg)
             {
                 if (errno == EAGAIN)
                 {
-                    usleep(1000);
+                    if (ctx->op_sent_eos)
+                    {
+                        usleep(16666);
+                    }
+                    else
+                    {
+                        usleep(1000);
+                    }
                 }
                 else
                 {
@@ -2055,6 +2062,7 @@ decode_proc(context_t& ctx, int argc, char *argv[])
         eos = decoder_proc_blocking(ctx, eos, current_file, current_loop, nalu_parse_buffer);
     else
         eos = decoder_proc_nonblocking(ctx, eos, current_file, current_loop, nalu_parse_buffer);
+    ctx.op_sent_eos = eos;
     /* After sending EOS, all the buffers from output plane should be dequeued.
        and after that capture plane loop should be signalled to stop. */
     if (ctx.blocking_mode)

It looks to be a race condition in handling EoS. We don’t see the issue after applying the patch. Please give it a try.
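
The backtrace above shows the capture thread parked in pthread_cond_wait with abstime=0x0, i.e. with no timeout, so a wakeup that is missed is never retried. As a generic illustration only (this is not the libtegrav4l2.so code, whose source is not shown in this thread; `EosGate` and its members are hypothetical names), a waiter that checks a predicate under the lock and also bounds the wait cannot hang on a notification that fired before it went to sleep:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the state shared by the thread that feeds the
// decoder and the thread that waits on the capture plane.
struct EosGate {
    std::mutex m;
    std::condition_variable cv;
    bool eos = false;  // set once end of stream has been signalled

    // Producer side: record EOS under the lock, then wake any waiter.
    void signal_eos() {
        {
            std::lock_guard<std::mutex> lock(m);
            eos = true;
        }
        cv.notify_all();
    }

    // Consumer side: returns true if EOS was observed, false on timeout.
    // The predicate is evaluated before sleeping, so a signal that was sent
    // earlier is still seen, and the timeout bounds the wait in any case.
    bool wait_for_eos(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        return cv.wait_for(lock, timeout, [this] { return eos; });
    }
};
```

With this shape, calling `signal_eos()` before `wait_for_eos()` still makes the wait return true immediately, and a wait with no signal returns false after the timeout instead of blocking forever.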

Hi,

Will your patch always work? I expected you to see the race condition and accept it as a bug in the V4L2 implementation. I am writing a live transcoder with multiple input and output video streams, and a patch based on usleep doesn’t look like a solution that will help in my case.

Also, I started 5 ./video_decode processes, and after ~6 hours one of them froze. It looks like a race condition in libtegrav4l2.so, so could you fix libtegrav4l2.so?

(gdb) thread 10
[Switching to thread 10 (Thread 0x7f5dce81d0 (LWP 9907))]
#0  0x0000007f7ed6f2a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55714f035c) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
88      ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
#0  0x0000007f7ed6f2a4 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55714f035c) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  0x0000007f7ed6f2a4 in __pthread_cond_wait_common (abstime=0x0, mutex=0x55714f0300, cond=0x55714f0330) at pthread_cond_wait.c:502
#2  0x0000007f7ed6f2a4 in __pthread_cond_wait (cond=0x55714f0330, mutex=0x55714f0300) at pthread_cond_wait.c:655
#3  0x0000007f7e3acfdc in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
#4  0x0000007f7a826bf0 in TegraV4L2_Poll_CPlane () at /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#5  0x0000007f7e2212e4 in plugin_ioctl () at /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so
#6  0x0000007f7ec50d68 in v4l2_ioctl (fd=14, request=3227014673) at libv4l2.c:1152
#7  0x000000555cda8ce8 in NvV4l2ElementPlane::dqBuffer(v4l2_buffer&, NvBuffer**, NvBuffer**, unsigned int) (this=0x557142b488, v4l2_buf=..., buffer=0x7f5dce75b8, shared_buffer=0x0, num_retries=0)
    at NvV4l2ElementPlane.cpp:126
#8  0x000000555cd71720 in dec_capture_loop_fcn(void*) (arg=0x7fe32daec8) at video_decode_main.cpp:1055
#9  0x0000007f7ed69088 in start_thread (arg=0x7fe32dacaf) at pthread_create.c:463
#10 0x0000007f7e7dcffc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) frame 8
#8  0x000000555cd71720 in dec_capture_loop_fcn (arg=0x7fe32daec8) at video_decode_main.cpp:1055
1055                if (dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 0))
(gdb) list
1050                memset(&v4l2_buf, 0, sizeof(v4l2_buf));
1051                memset(planes, 0, sizeof(planes));
1052                v4l2_buf.m.planes = planes;
1053
1054                /* Dequeue a filled buffer. */
1055                if (dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 0))
1056                {
1057                    if (errno == EAGAIN)
1058                    {
1059                        if (ctx->op_sent_eos)
(gdb) p ctx.got_eos
$1 = true
(gdb) p ctx.op_sent_eos
$2 = true

Hi,
With the patch, we don’t hit the issue with 3 decoding processes. Do you run more processes? If there is a faster way to reproduce it, please share so that we can set up and work on it more efficiently.

We are also checking libtegrav4l2.so, but this will take some time. In the meantime we would like to suggest a quick fix for the existing release: please try a longer delay such as usleep(30000).

Hi,

Yes, I am starting more processes; last time it was six. The goal of this topic has been achieved for me: you have reproduced the problem and will see what you can do with libtegrav4l2.so and maybe deeper. Quick patches may be fine in general, but not in this case. I studied your code samples and came to the conclusion that the more complex the logic after receiving the decoded buffer, the more likely errors are. To make it easier for you to fix the problem, I use video_decode as the smallest example of code that reproduces it.

Hi,
Please apply this patch and give it a try:

diff --git a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
index 8bb14a9..4bc9243 100644
--- a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
+++ b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
@@ -1024,7 +1024,7 @@ dec_capture_loop_fcn(void *arg)
         query_and_set_capture(ctx);
 
     /* Exit on error or EOS which is signalled in main() */
-    while (!(ctx->got_error || dec->isInError() || ctx->got_eos))
+    while (!(ctx->got_error || dec->isInError()))
     {
         NvBuffer *dec_buffer;
 
@@ -1038,6 +1038,9 @@ dec_capture_loop_fcn(void *arg)
                 case V4L2_EVENT_RESOLUTION_CHANGE:
                     query_and_set_capture(ctx);
                     continue;
+                case V4L2_EVENT_EOS:
+                    cout << "Got EoS event at capture plane" << endl;
+                    goto handle_eos;
             }
         }
 
@@ -1209,6 +1212,7 @@ dec_capture_loop_fcn(void *arg)
             }
         }
     }
+handle_eos:
 #ifndef USE_NVBUF_TRANSFORM_API
     /* Send EOS to converter */
     if (ctx->conv)
@@ -1826,6 +1830,11 @@ decode_proc(context_t& ctx, int argc, char *argv[])
     ret = ctx.dec->subscribeEvent(V4L2_EVENT_RESOLUTION_CHANGE, 0, 0);
     TEST_ERROR(ret < 0, "Could not subscribe to V4L2_EVENT_RESOLUTION_CHANGE",
                cleanup);
+    /* Subscribe to EOS event.
+       Refer ioctl VIDIOC_SUBSCRIBE_EVENT */
+    ret = ctx.dec->subscribeEvent(V4L2_EVENT_EOS, 0, 0);
+    TEST_ERROR(ret < 0, "Could not subscribe to V4L2_EVENT_EOS",cleanup);
+
 
     /* Set format on the output plane.
        Refer ioctl VIDIOC_S_FMT */
@@ -2076,6 +2085,11 @@ decode_proc(context_t& ctx, int argc, char *argv[])
                 abort(&ctx);
                 break;
             }
+            if (v4l2_buf.m.planes[0].bytesused == 0)
+            {
+                cout << "Got EoS at output plane"<< endl;
+                break;
+            }
 
             if ((v4l2_buf.flags & V4L2_BUF_FLAG_ERROR) && ctx.enable_input_metadata)
             {
@@ -2106,6 +2120,20 @@ decode_proc(context_t& ctx, int argc, char *argv[])
     }
 #endif
 
+cleanup:
+    if (ctx.blocking_mode && ctx.dec_capture_loop)
+    {
+        pthread_join(ctx.dec_capture_loop, NULL);
+    }
+    else if (!ctx.blocking_mode)
+    {
+        /* Clear the poll interrupt to get the decoder's poll thread out. */
+        ctx.dec->ClearPollInterrupt();
+        /* If Pollthread is waiting on, signal it to exit the thread. */
+        sem_post(&ctx.pollthread_sema);
+        pthread_join(ctx.dec_pollthread, NULL);
+    }
+
     if (ctx.stats)
     {
         profiler.stop();
@@ -2123,19 +2151,6 @@ decode_proc(context_t& ctx, int argc, char *argv[])
         profiler.printProfilerData(cout);
     }
 
-cleanup:
-    if (ctx.blocking_mode && ctx.dec_capture_loop)
-    {
-        pthread_join(ctx.dec_capture_loop, NULL);
-    }
-    else if (!ctx.blocking_mode)
-    {
-        /* Clear the poll interrupt to get the decoder's poll thread out. */
-        ctx.dec->ClearPollInterrupt();
-        /* If Pollthread is waiting on, signal it to exit the thread. */
-        sem_post(&ctx.pollthread_sema);
-        pthread_join(ctx.dec_pollthread, NULL);
-    }
     if(ctx.capture_plane_mem_type == V4L2_MEMORY_DMABUF)
     {
         for(int index = 0 ; index < ctx.numCapBuffers ; index++)

The patch subscribes to the EoS event and uses that event to terminate the capture loop. We set up 6 decoding processes and don’t hit the deadlock.
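
The idea behind the patch, stopping on an end-of-stream marker that arrives through the same path as the frames rather than on a flag set by another thread, can be sketched in isolation. This is a simplified model and not the sample's code: `CaptureItem` and `drain_until_eos` are hypothetical names, and a std::deque stands in for the capture plane.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for what the capture side hands back:
// either a decoded frame or an end-of-stream marker.
enum class CaptureItem { Frame, Eos };

// Consume items in order and stop only when the EOS marker arrives.
// Returns the number of frames seen before EOS. Every frame queued ahead
// of the marker is counted, because termination is decided by the stream
// itself and not by an external flag that may flip at any moment.
std::size_t drain_until_eos(std::deque<CaptureItem>& queue) {
    std::size_t frames = 0;
    while (!queue.empty()) {
        CaptureItem item = queue.front();
        queue.pop_front();
        if (item == CaptureItem::Eos) {
            break;
        }
        ++frames;
    }
    return frames;
}
```

In the real sample the equivalent decision is made on V4L2_EVENT_EOS, as in the diff above; the model only shows why the exit condition is tied to the data path.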

Hi,

The patch looks interesting and it works, except for one small thing: it looks like the decoder sometimes loses one frame. I added a counter of the total buffers dequeued from the capture plane; most often it prints 53 frames and sometimes 52. Your patch with my print and my video sample: 00_video_decode with event eos patch by hizel · Pull Request #1 · maxlapshin/l4t2-demo · GitHub

I run 5-6 processes, each with:

while true; do ./video_decode H264 --disable-rendering --input-nalu ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264 2> /dev/null | grep 'capture_plane total dequeued buffers' ; done

and the output looks like this:

capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:52
capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:53
capture_plane total dequeued buffers:52
capture_plane total dequeued buffers:53

Hi,
Do you observe the issue with the --stats option?

$ ./video_decode H264 --disable-rendering --stats ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264

Hi,

Yes, the --stats output shows it as well:

Total units processed = 53
Total units processed = 53
Total units processed = 53
Total units processed = 52
Total units processed = 53
Total units processed = 53
Total units processed = 53

This was with 6 streams, each running:

while true; do ./video_decode H264 --disable-rendering --stats ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264 2> /dev/null | egrep 'Total units processed' ; done

Hi,

After some time all video_decode processes died: the Linux OOM killer killed them all. The system is left in a broken state; it is not even possible to use journalctl. At first glance, there is enough free memory:

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7773        1554         903           0        5315        6050
Swap:          3886         424        3462

But that is not the case:

cat /proc/buddyinfo 
Node 0, zone      DMA   3742   4946      2      3      2      1      2      3      0      2    165 
Node 0, zone   Normal  43301   2004      0      0      0      0      0      0      0      0      0

This looks like a kernel memory leak.
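
To make the fragmentation visible, the buddyinfo columns can be turned into numbers: column N is the count of free blocks of 2^N pages. The sketch below is only an illustration (the helper names `parse_buddyinfo_line`, `free_kb` and `blocks_at_or_above` are made up here, and it assumes 4 kB pages, which matches the "*4kB" figures in the kernel log that follows). For the Normal zone line above it yields 189236 kB free in total but zero free blocks of order 2 or larger, which is consistent with the order:2 page allocation failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Free-block counts for one zone; index = allocation order,
// so one block at index N covers (4 kB << N).
struct ZoneFree {
    std::string zone;
    std::vector<std::uint64_t> blocks;
};

// Parse one /proc/buddyinfo line such as
// "Node 0, zone   Normal  43301   2004      0 ...".
ZoneFree parse_buddyinfo_line(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    ZoneFree z;
    in >> word >> word >> word >> z.zone;  // "Node", "0,", "zone", <name>
    std::uint64_t n = 0;
    while (in >> n) {
        z.blocks.push_back(n);
    }
    return z;
}

// Total free memory in the zone in kB, assuming 4 kB pages.
std::uint64_t free_kb(const ZoneFree& z) {
    std::uint64_t total = 0;
    for (std::size_t order = 0; order < z.blocks.size(); ++order) {
        total += z.blocks[order] * (4ULL << order);
    }
    return total;
}

// Number of free blocks large enough for an allocation of the given order.
std::uint64_t blocks_at_or_above(const ZoneFree& z, std::size_t order) {
    std::uint64_t count = 0;
    for (std::size_t o = order; o < z.blocks.size(); ++o) {
        count += z.blocks[o];
    }
    return count;
}
```

A large `free_kb` together with `blocks_at_or_above(zone, 2) == 0` is the signature of memory that is free but too fragmented for multi-page kernel allocations.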

Part of dmesg:

дек 06 11:39:09 core-xavier-nx0 kernel: Killed process 24610 (video_decode) total-vm:24448kB, anon-rss:1772kB, file-rss:5904kB, shmem-rss:0kB
дек 06 11:39:09 core-xavier-nx0 kernel: video_decode: page allocation failure: order:2, mode:0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK)
дек 06 11:39:09 core-xavier-nx0 kernel: CPU: 3 PID: 24610 Comm: video_decode Not tainted 4.9.253-tegra #1
дек 06 11:39:09 core-xavier-nx0 kernel: Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
дек 06 11:39:09 core-xavier-nx0 kernel: Call trace:
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff800808ba40>] dump_backtrace+0x0/0x198
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff800808c004>] show_stack+0x24/0x30
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff8008f6121c>] dump_stack+0xa0/0xc4
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff80081cd82c>] warn_alloc+0x104/0x130
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff80081cdd28>] __alloc_pages_nodemask+0x450/0xcb8
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff80080af7f0>] copy_process.isra.7.part.8+0xf0/0x1578
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff80080b0e18>] _do_fork+0xd8/0x470
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff80080b1324>] SyS_clone+0x4c/0x60
дек 06 11:39:09 core-xavier-nx0 kernel: [<ffffff8008083900>] el0_svc_naked+0x34/0x38
дек 06 11:39:09 core-xavier-nx0 kernel: Mem-Info:
дек 06 11:39:09 core-xavier-nx0 kernel: active_anon:2100 inactive_anon:2402 isolated_anon:0
                                            active_file:4746 inactive_file:4292 isolated_file:0
                                            unevictable:3879 dirty:3 writeback:0 unstable:0
                                            slab_reclaimable:1351584 slab_unreclaimable:326291
                                            mapped:3652 shmem:28 pagetables:752 bounce:0
                                            free:226195 free_pcp:0 free_cma:161230
дек 06 11:39:09 core-xavier-nx0 kernel: Node 0 active_anon:8400kB inactive_anon:9608kB active_file:18984kB inactive_file:17168kB unevictable:15516kB isolated(anon):0kB isolated(file):0kB mapped:14608kB dirty
дек 06 11:39:09 core-xavier-nx0 kernel: DMA free:699060kB min:10204kB low:12752kB high:15300kB active_anon:64kB inactive_anon:84kB active_file:4kB inactive_file:0kB unevictable:0kB writepending:0kB present:1
дек 06 11:39:09 core-xavier-nx0 kernel: lowmem_reserve[]: 0 6000 6000 6000
дек 06 11:39:09 core-xavier-nx0 kernel: Normal free:205720kB min:34848kB low:43560kB high:52272kB active_anon:8336kB inactive_anon:9524kB active_file:18980kB inactive_file:17168kB unevictable:15516kB writepe
дек 06 11:39:09 core-xavier-nx0 kernel: lowmem_reserve[]: 0 0 0 0
дек 06 11:39:09 core-xavier-nx0 kernel: DMA: 3697*4kB (UEHC) 4946*8kB (UEHC) 2*16kB (C) 2*32kB (C) 2*64kB (C) 1*128kB (C) 1*256kB (C) 2*512kB (C) 0*1024kB 2*2048kB (C) 156*4096kB (C) = 699060kB
дек 06 11:39:09 core-xavier-nx0 kernel: Normal: 45595*4kB (UME) 2979*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 206212kB
дек 06 11:39:09 core-xavier-nx0 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
дек 06 11:39:09 core-xavier-nx0 kernel: 13928 total pagecache pages
дек 06 11:39:09 core-xavier-nx0 kernel: 942 pages in swap cache
дек 06 11:39:09 core-xavier-nx0 kernel: Swap cache stats: add 215294968, delete 215684005, find 7092633/36947129
дек 06 11:39:09 core-xavier-nx0 kernel: Free swap  = 3551112kB
дек 06 11:39:09 core-xavier-nx0 kernel: Total swap = 3979992kB
дек 06 11:39:09 core-xavier-nx0 kernel: 2033152 pages RAM
дек 06 11:39:09 core-xavier-nx0 kernel: 0 pages HighMem/MovableOnly
дек 06 11:39:09 core-xavier-nx0 kernel: 43153 pages reserved
дек 06 11:39:09 core-xavier-nx0 kernel: 188416 pages cma reserved
дек 06 11:39:09 core-xavier-nx0 kernel: Out of memory: Kill process 24601 (video_decode) score 0 or sacrifice child
дек 06 11:39:09 core-xavier-nx0 kernel: Killed process 24601 (video_decode) total-vm:70828kB, anon-rss:2960kB, file-rss:7124kB, shmem-rss:0kB
дек 06 11:39:09 core-xavier-nx0 kernel: Out of memory: Kill process 24613 (video_decode) score 0 or sacrifice child
дек 06 11:39:09 core-xavier-nx0 kernel: Killed process 24613 (video_decode) total-vm:70828kB, anon-rss:3436kB, file-rss:8392kB, shmem-rss:0kB

Hi,
Please try the prebuilt library together with this patch:

diff --git a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
index 8bb14a9..5780c5e 100644
--- a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
+++ b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
@@ -1024,7 +1024,7 @@ dec_capture_loop_fcn(void *arg)
         query_and_set_capture(ctx);
 
     /* Exit on error or EOS which is signalled in main() */
-    while (!(ctx->got_error || dec->isInError() || ctx->got_eos))
+    while (!(ctx->got_error || dec->isInError()))
     {
         NvBuffer *dec_buffer;
 
@@ -1056,6 +1056,11 @@ dec_capture_loop_fcn(void *arg)
             {
                 if (errno == EAGAIN)
                 {
+                    if (v4l2_buf.flags & V4L2_BUF_FLAG_LAST)
+                    {
+                        cout << "Got EoS at capture plane" << endl;
+                        goto handle_eos;
+                    }
                     usleep(1000);
                 }
                 else
@@ -1209,6 +1214,8 @@ dec_capture_loop_fcn(void *arg)
             }
         }
     }
+handle_eos:
+
 #ifndef USE_NVBUF_TRANSFORM_API
     /* Send EOS to converter */
     if (ctx->conv)
@@ -2076,6 +2083,11 @@ decode_proc(context_t& ctx, int argc, char *argv[])
                 abort(&ctx);
                 break;
             }
+            if (v4l2_buf.m.planes[0].bytesused == 0)
+            {
+                cout << "Got EoS at output plane"<< endl;
+                break;
+            }
 
             if ((v4l2_buf.flags & V4L2_BUF_FLAG_ERROR) && ctx.enable_input_metadata)
             {
@@ -2106,6 +2118,20 @@ decode_proc(context_t& ctx, int argc, char *argv[])
     }
 #endif
 
+cleanup:
+    if (ctx.blocking_mode && ctx.dec_capture_loop)
+    {
+        pthread_join(ctx.dec_capture_loop, NULL);
+    }
+    else if (!ctx.blocking_mode)
+    {
+        /* Clear the poll interrupt to get the decoder's poll thread out. */
+        ctx.dec->ClearPollInterrupt();
+        /* If Pollthread is waiting on, signal it to exit the thread. */
+        sem_post(&ctx.pollthread_sema);
+        pthread_join(ctx.dec_pollthread, NULL);
+    }
+
     if (ctx.stats)
     {
         profiler.stop();
@@ -2123,19 +2149,6 @@ decode_proc(context_t& ctx, int argc, char *argv[])
         profiler.printProfilerData(cout);
     }
 
-cleanup:
-    if (ctx.blocking_mode && ctx.dec_capture_loop)
-    {
-        pthread_join(ctx.dec_capture_loop, NULL);
-    }
-    else if (!ctx.blocking_mode)
-    {
-        /* Clear the poll interrupt to get the decoder's poll thread out. */
-        ctx.dec->ClearPollInterrupt();
-        /* If Pollthread is waiting on, signal it to exit the thread. */
-        sem_post(&ctx.pollthread_sema);
-        pthread_join(ctx.dec_pollthread, NULL);
-    }
     if(ctx.capture_plane_mem_type == V4L2_MEMORY_DMABUF)
     {
         for(int index = 0 ; index < ctx.numCapBuffers ; index++)

r32_6_1_TEST_libtegrav4l2.zip (80.2 KB)

Please run it and check for both the deadlock and the failure to decode the last frame. In our test we don’t see the deadlock, and the last-frame issue is improved: it happened once while running 6 processes for 1+ hour. A complete fix requires checking with other teams and may take some time. Please try the current solution.

Hi,

thank you for the patch; this fix looks promising, but I need to do a few more tests to make sure that it is working.

Please consider building the library libtegrav4l2.so for the previous version of l4t SDK - 32.2.3.

Unfortunately, the current version 32.6.1 of the SDK has serious issues with decoding interlaced live sources
( Possible multimedia api regression with decode interlace source - #12 by khizbulin ), and this prevents us from fully testing and using your solution for the decoder flush deadlock.

Hello,

We are having an issue where, after several days or sometimes weeks, a Jetson module performing transcoding operations suddenly freezes.

While looking for a solution, we have encountered and reproduced an issue where after four days of constant restarts of six video_decode processes the kernel throws an OOM error and kills all system processes including systemd.

It looks like a kernel memory leak: the system reports that some memory is available, but in fact there is no usable RAM left because it is fragmented into very small chunks. See /proc/buddyinfo:

# cat /proc/meminfo | grep Mem
MemTotal:        7959996 kB
MemFree:         1134948 kB
MemAvailable:    6168752 kB
NvMapMemFree:          0 kB
NvMapMemUsed:          0 kB
# cat /proc/buddyinfo
Node 0, zone      DMA      9      8   1949    146      2      1      1      2      0      1    171 
Node 0, zone   Normal  65657  16524      2      0      0      0      0      0      0      0      0

After the system enters this OOM state, it is impossible to recover programmatically - the only solution is a hardware reset of the Jetson module.

We realize that restarting video_decode every two seconds over the course of 3-4 days is an artificial task, but it illustrates the problem that we are having with random OOM errors in our live transcoders working 24/7.

Unfortunately, we could not find a quicker way to reproduce the OOM situation that is related to restarting video_decode.

Here are the steps to reproduce the issue:

while true; do ./video_decode H264 --disable-rendering --stats ../../data/Video/sample_outdoor_car_1080p_10fps_05.h264 ; done

Please advise,

Thanks,

Hi,
This looks to be a different issue. Please create a new topic. Do you observe the issue on r32.2.3 or r32.6.1? We would suggest using the latest release.

Hi,

This is the r32.6.1 L4T SDK with the modified libtegrav4l2.so library. I have not tested the original libtegrav4l2.so.

Hi,

This solution works, thank you. The only thing that bothers us a little is a slow increase in resource consumption over time.

Are you going to keep this flush behavior? It is different from what you demonstrated in the code examples; however, it is very similar to the one described in the SDK documentation.

We really appreciate your help; you are offering excellent support.

Thank you and all of your team!

Hi,
The fix will be included in the next release.

If you have observed another issue, please start a new topic with steps so that we can replicate it and check.
