Frame data extraction: pipeline halts/hangs after 7 minutes of running

• Hardware Platform - Jetson
• DeepStream Version - 5.0
• JetPack Version - 4.5
• TensorRT Version - 7.1.3
• Issue Type - question/bug

Hi,

I'm working on a DeepStream app that uses a very simple pipeline, since I need inference only. I need the metadata and frames to be post-processed on separate threads outside the DS pipeline, so I decided to use a pad probe to extract the metadata and frames and store them in external variables. It worked perfectly until I added the extraction of the frame buffer. Now I've run into some weird behaviour: the app works perfectly for about 7 min +/- 30 sec, then halts/hangs without saying a word. My Jetson is in MAXN power mode and has a proper fan. No overheating or memory leaks.

The pipeline is:

```
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=2 ! 'image/jpeg, width=3264, height=2448, framerate=15/1, format=MJPG' \
! jpegdec \
! videoconvert \
! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' \
! m.sink_0 nvstreammux name=m batch-size=1 width=3264 height=2448 \
! nvinfer config-file-path=../config/pgie_config.txt batch-size=1 unique-id=1 \
! fakesink
```

The probe is attached to fakesink's sink pad.

The pad probe code is here:

```c
static GstPadProbeReturn meta_probe_callback (GstPad * pad, GstPadProbeInfo * info,
                            gpointer u_data)
{
    GstBuffer *buf = (GstBuffer *) info->data;
    NvDsObjectMeta *obj_meta = NULL;
    NvDsMetaList *l_frame = NULL;
    NvDsMetaList *l_obj = NULL;
    NvBufSurface *surface = NULL;
    GstMapInfo in_map_info;
    long long tt;

    memset (&in_map_info, 0, sizeof (in_map_info));

    if (gst_buffer_map (buf, &in_map_info, GST_MAP_READWRITE)) {
        surface = (NvBufSurface *) in_map_info.data;

        NvBufSurfaceMap (surface, -1, -1, NVBUF_MAP_READ_WRITE);
        NvBufSurfaceSyncForCpu (surface, -1, -1);

        NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

        for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
             l_frame = l_frame->next) {
            NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);

            gint frame_width = (gint) surface->surfaceList[frame_meta->batch_id].width;
            gint frame_height = (gint) surface->surfaceList[frame_meta->batch_id].height;
            void *frame_data = surface->surfaceList[frame_meta->batch_id].mappedAddr.addr[0];
            size_t frame_step = surface->surfaceList[frame_meta->batch_id].pitch;

            FRAME_BUF = cv::Mat (frame_height, frame_width, CV_8UC4, frame_data, frame_step);

            for (l_obj = frame_meta->obj_meta_list; l_obj != NULL;
                 l_obj = l_obj->next) {
                obj_meta = (NvDsObjectMeta *) (l_obj->data);
                NvOSD_RectParams *rect_params = &(obj_meta->rect_params);
                int left = (int) (rect_params->left);
                int top = (int) (rect_params->top);
                int right = left + (int) (rect_params->width);
                int bottom = top + (int) (rect_params->height);
                int class_index = obj_meta->class_id;
                const char *lbl = CLASSES[obj_meta->class_id].c_str ();
                g_print ("%s, ", lbl);
            }
            tt = get_timestamp_msec ();
            g_print ("W: %d, H: %d, ", frame_width, frame_height);
            g_print ("duration: %lld \n", tt - TMP_T);
            TMP_T = tt;
        }
        NvBufSurfaceUnMap (surface, -1, -1);
        /* release the GstBuffer mapping taken above; without this every
         * buffer stays mapped, which can eventually stall the pipeline */
        gst_buffer_unmap (buf, &in_map_info);
    }

    return GST_PAD_PROBE_OK;
}
```
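One more thing worth noting about the probe above: `FRAME_BUF` wraps the mapped surface memory directly (a `cv::Mat` constructed over external data does not copy it), so the pointer dangles as soon as `NvBufSurfaceUnMap` runs. If the frame has to outlive the probe, it should be deep-copied first, row by row, because the surface rows are padded to `pitch`. A minimal sketch of such a pitched copy, in plain C with hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

/* Copy a pitched (row-stride padded) frame plane into a tightly packed
 * buffer. 'pitch' is the stride in bytes of each source row, which may
 * be larger than the useful row width in bytes. Caller frees the result. */
static unsigned char *copy_pitched_plane(const unsigned char *src,
                                         size_t width_bytes,
                                         size_t height,
                                         size_t pitch)
{
    unsigned char *dst = malloc(width_bytes * height);
    if (dst == NULL)
        return NULL;
    for (size_t row = 0; row < height; ++row)
        memcpy(dst + row * width_bytes, src + row * pitch, width_bytes);
    return dst;
}
```

With OpenCV the same effect can be had with `FRAME_BUF = cv::Mat(...).clone();`, which owns its own copy of the pixels.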

No other threads are running outside the pipeline. CLASSES, FRAME_BUF etc. are global variables.
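Since the plan is for threads outside the pipeline to consume `FRAME_BUF`, those globals will need synchronization once the worker threads exist; otherwise the probe can overwrite a frame mid-read. A minimal sketch of a mutex-protected shared slot, in plain C with pthreads (the names and the fixed-size buffer are placeholders, not from the app):

```c
#include <pthread.h>
#include <string.h>

/* A shared slot the probe writes and a worker thread reads. The mutex
 * ensures the reader never observes a half-written frame. */
typedef struct {
    pthread_mutex_t lock;
    unsigned char   data[16];   /* stand-in for a copied frame buffer */
    int             has_frame;
} FrameSlot;

static void slot_write(FrameSlot *s, const unsigned char *frame, size_t n)
{
    pthread_mutex_lock(&s->lock);
    memcpy(s->data, frame, n);
    s->has_frame = 1;
    pthread_mutex_unlock(&s->lock);
}

/* Returns 1 and copies the frame out if one is available, else 0. */
static int slot_read(FrameSlot *s, unsigned char *out, size_t n)
{
    int ok;
    pthread_mutex_lock(&s->lock);
    ok = s->has_frame;
    if (ok)
        memcpy(out, s->data, n);
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

The probe would call `slot_write` with a deep-copied frame; the post-processing thread polls or waits and calls `slot_read`.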

It's interesting to note that after it halts/hangs, if I send SIGINT the log is the following, and it looks like the pipeline is torn down for some reason.

```
^C*** Interrupted ***
0:07:26.097549195 23425   0x5592210000 DEBUG              GST_EVENT gstevent.c:306:gst_event_new_custom: creating new event 0x5592640f10 eos 28174
0:07:26.097603362 23425   0x5592210000 DEBUG       GST_ELEMENT_PADS gstelement.c:1856:gst_element_send_event: send eos event on element main-pipeline
0:07:26.097646800 23425   0x5592210000 DEBUG                    bin gstbin.c:3135:gst_bin_send_event:<main-pipeline> Sending eos event to src children
0:07:26.097718207 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<main-pipeline> child fake_sink is not src
0:07:26.097752010 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<main-pipeline> child primary-nvinference-engine is not src
0:07:26.097807323 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<main-pipeline> child streammux is not src
0:07:26.097836386 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<main-pipeline> child camera_bin is src
0:07:26.097907325 23425   0x5592210000 DEBUG       GST_ELEMENT_PADS gstelement.c:1856:gst_element_send_event: send eos event on element camera_bin
0:07:26.097953836 23425   0x5592210000 DEBUG                    bin gstbin.c:3135:gst_bin_send_event:<camera_bin> Sending eos event to src children
0:07:26.098017014 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<camera_bin> child nvvidconv_cap_filter is not src
0:07:26.098052535 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<camera_bin> child nvvideoconv is not src
0:07:26.098081806 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<camera_bin> child videoconv is not src
0:07:26.098167849 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<camera_bin> child jpeg_dec is not src
0:07:26.098202537 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<camera_bin> child src_cap_filter is not src
0:07:26.098457020 23425   0x5592210000 DEBUG             GST_STATES gstbin.c:2032:bin_element_is_src:<camera_bin> child src_elem is src
0:07:26.098482281 23425   0x5592210000 DEBUG       GST_ELEMENT_PADS gstelement.c:1856:gst_element_send_event: send eos event on element src_elem
0:07:26.098527594 23425   0x5592210000 DEBUG                basesrc gstbasesrc.c:1786:gst_base_src_send_event:<src_elem> handling event 0x5592640f10 eos event: 0x5592640f10, time 99:99:99.999999999, seq-num 1113, (NULL)
0:07:26.098551188 23425   0x5592210000 DEBUG                basesrc gstbasesrc.c:3679:gst_base_src_set_flushing:<src_elem> flushing 1
0:07:26.098568063 23425   0x5592210000 LOG               bufferpool gstbufferpool.c:1387:gst_buffer_pool_set_flushing:<src_elem:pool:src> flushing 1
0:07:26.098582543 23425   0x5592210000 DEBUG         v4l2bufferpool gstv4l2bufferpool.c:960:gst_v4l2_buffer_pool_flush_start:<src_elem:pool:src> start flushing
0:07:26.098610981 23425   0x5592210000 LOG                 GST_POLL gstpoll.c:1621:gst_poll_set_flushing: 0x7f34006cf0: flushing: 1
0:07:26.098630877 23425   0x5592210000 LOG                     v4l2 gstv4l2object.c:4117:gst_v4l2_object_unlock:<src_elem:src> start flushing
0:07:26.098645512 23425   0x5592210000 LOG               bufferpool gstbufferpool.c:1387:gst_buffer_pool_set_flushing:<src_elem:pool:src> flushing 1
```

I would very much appreciate any help or hints.

Further investigation of the issue above.

  1. I modified the pad probe by removing the surface data part. The pipeline ran overnight (10 h) without any problems.

  2. I made two probes:

  • one for object metadata (class_id and rect) only, attached to fakesink's sink pad
  • one for accessing the surface data only, the same way as above, attached to pgie's src pad
    The pipeline died after exactly the same 7 minutes.

I wonder what kind of software problem could show up with a 7-minute delay without writing anything to any log?

Could it be a bad idea to try to access the surface data from within a pad probe?

Unfortunately, I don’t know what this 7-min problem was about, but obviously my solution was incorrect.
Eventually, I found the correct method of accessing and decoding the frame data, explained here
and here.

Now everything works fine.

Great work. The issue should be from here: the NV12 image size is width * height * 3/2.
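For reference, that size follows from the NV12 layout: a full-resolution Y (luma) plane of width × height bytes, followed by an interleaved UV plane at half vertical resolution, i.e. width × height / 2 bytes, so a `cv::Mat` wrapping the buffer should have height × 3/2 rows of `CV_8UC1`, not `CV_8UC4` as in the original probe. A minimal sketch of the arithmetic (assuming even dimensions, as with the 3264×2448 stream above):

```c
#include <stddef.h>

/* Total bytes of a tightly packed NV12 frame: a full-resolution Y plane
 * plus a half-height interleaved UV plane (2 bytes per 2x2 pixel block). */
static size_t nv12_frame_size(size_t width, size_t height)
{
    size_t y_plane  = width * height;      /* 1 byte per pixel            */
    size_t uv_plane = width * height / 2;  /* U and V, 4:2:0 subsampled   */
    return y_plane + uv_plane;             /* = width * height * 3 / 2    */
}
```

Reading only width × height × 4 bytes (as `CV_8UC4` implies) or width × height bytes from such a buffer misinterprets the planes; the UV data begins at offset width × height.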