Heap corruption using Smart Record

DeepStream 7.1, Driver Version: 525.147.05

Hi,

We’ve been having stability issues with the Smart Record feature.
We improved stability by removing one Smart Record instance per stream (initially we were recording both the inputs and the demuxed outputs), but we still encounter heap corruption such as:
#0 g_type_check_instance (type_instance=type_instance@entry=0x7f32175fabd0) at …/gobject/gtype.c:4270
#1 0x00007f376fee66d1 in g_signal_emit_by_name (instance=0x7f32175fabd0, detailed_signal=0x7f376a3f3306 "sr-done")
at …/gobject/gsignal.c:3650
#2 0x00007f376a3d0f52 in () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_deepstream_bins.so
#3 0x00007f376c92a167 in () at ///opt/nvidia/deepstream/deepstream-7.0/lib/libnvdsgst_smartrecord.so
#4 0x00007f376fdf7ac1 in g_thread_proxy (data=0x7f35c4257360) at …/glib/gthread.c:831
#5 0x00007f376fbc7ac3 in () at /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f376fc58a04 in clone () at /lib/x86_64-linux-gnu/libc.so.6

The crashes occur at random times and at various points in the code, but always during a read or write on dynamically allocated memory. The stack trace above is only an example: apparently an earlier write to an incorrect location overwrote the GstElement. Other times it crashes somewhere else.

We are convinced the corruption does not come from our code: we do very little on the heap, and we ran an ablation test to make sure everything was correct on our side.

Of course, we know the feature works in your demo app, so we think the problem is probably on your side but triggered by some kind of misuse on ours.

Our hunch is that we trigger overlapping Smart Record sessions and that this is not supported. It should not lead to memory corruption, but we’re OK with making sure it doesn’t happen if you can confirm that it is indeed unsupported and that avoiding it should resolve the issue at hand.
Some of our cameras can indeed occasionally catch two events over the span of a Smart Record recording (around 12 seconds in our case).
FYI, we trigger Smart Record through a GLib signal:
g_signal_emit_by_name (
    sr_elements[source_id],
    "start-sr",
    &session_id,
    START_TIME,
    SMART_REC_DURATION,
    (gpointer) &sr_user_data[source_id],
    &status
);
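To put a number on "overlapping": if we read the `NvDsSRStart` comments in gst-nvdssr.h correctly, `START_TIME` is how many seconds of cache before the trigger go into the file, and `SMART_REC_DURATION` is how long recording continues after the trigger. Under that assumption, a session stays open for roughly `SMART_REC_DURATION` seconds after `start-sr`, and any second trigger inside that window overlaps it. A minimal debounce sketch; `sr_trigger_overlaps` is a hypothetical helper, and times are assumed to be seconds on a monotonic clock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical debounce helper (not part of DeepStream).
 * Returns true if a trigger at now_s would fall inside the session that was
 * started at last_start_s. Assumption: the session stays open for duration_s
 * seconds after its trigger; the START_TIME part is pre-trigger cache and
 * does not extend it. has_last is false before the first recording. */
static bool
sr_trigger_overlaps (bool has_last, double last_start_s, double now_s,
    double duration_s)
{
  if (!has_last)
    return false;               /* nothing recorded yet, no overlap possible */
  return (now_s - last_start_s) < duration_s;
}
```

In the probe, `now_s` could come from `g_get_monotonic_time () / 1e6` (GLib returns microseconds), and a trigger for which the helper returns true would be dropped. Since the actual end of a session is only known when `sr-done` fires, a small safety margin on `duration_s` would probably be prudent.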

If confirmed, we can then either drop the second detection or interrupt the first recording through “stop-sr”, I suppose?
Neither option is very satisfying, but we can certainly live with that if needed. Is there an official recommendation here?
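For the "drop the second detection" option, a minimal sketch: keep one busy flag per source, claim it atomically before emitting `start-sr`, and release it from the `sr-done` handler (or right away if `start-sr` did not report success). The atomic check-and-set matters because the trigger runs on a streaming thread while `sr-done` is emitted from a smart record thread (frames #3 and #4 of the backtrace). `SrGuard` and the `sr_guard_*` functions are hypothetical names, not DeepStream API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-source guard: busy[i] is true while source i records. */
typedef struct {
  atomic_bool *busy;
  unsigned num_sources;
} SrGuard;

static bool
sr_guard_init (SrGuard *g, unsigned num_sources)
{
  g->busy = malloc (num_sources * sizeof *g->busy);
  if (!g->busy)
    return false;
  for (unsigned i = 0; i < num_sources; i++)
    atomic_init (&g->busy[i], false);
  g->num_sources = num_sources;
  return true;
}

/* Returns true if the caller now owns the source and may emit "start-sr".
 * compare-exchange flips false -> true in one step, so two concurrent
 * detections cannot both succeed. */
static bool
sr_guard_try_begin (SrGuard *g, unsigned source_id)
{
  assert (source_id < g->num_sources);
  bool expected = false;
  return atomic_compare_exchange_strong (&g->busy[source_id], &expected, true);
}

/* Call from the "sr-done" handler, or when "start-sr" failed. */
static void
sr_guard_end (SrGuard *g, unsigned source_id)
{
  assert (source_id < g->num_sources);
  atomic_store (&g->busy[source_id], false);
}

static void
sr_guard_destroy (SrGuard *g)
{
  free (g->busy);
  g->busy = NULL;
  g->num_sources = 0;
}
```

The `start-sr` emission above would then be wrapped in `if (sr_guard_try_begin (&guard, source_id))`, releasing the flag if `status` is not `NVDSSR_STATUS_OK`, and the `sr-done` callback would call `sr_guard_end`. The source index has to be recoverable in that callback, for example from the `&sr_user_data[index]` pointer passed at connect time. This is only one way to serialize sessions and is not necessarily what the official workaround does.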

Thanks for your help

Yes. Overlapping smart record sessions are not currently supported.

But if you can provide an easy way to reproduce the problem, we can help debug it.

Sure, here’s a program that demonstrates the issue.
It’s a simple pipeline: N nvurisrcbin elements (which I use with RTSP URLs), each followed by a queue, connected to an nvstreammux followed by a fakesink.
I added a probe on the muxer’s src pad that triggers a Smart Record every 5 frames on every source by emitting start-sr.

Let it run for a few minutes and you should get a crash due to heap corruption.

Unfortunately, I could not reproduce it with ASan.

#include <gst/gst.h>
#include <glib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <uuid/uuid.h>

#include "gst-nvdssr.h"
#include "gstnvdsmeta.h"


#define GST_CAPS_FEATURES_NVMM "memory:NVMM"

#define MUXER_BATCH_TIMEOUT_USEC 12500000
#define MUXER_OUTPUT_WIDTH 736
#define MUXER_OUTPUT_HEIGHT 608

#define SMART_REC_CONTAINER 0
#define CACHE_SIZE_SEC 10
#define SMART_REC_DEFAULT_DURATION 10
#define SMART_REC_DIRPATH "/dvr"
#define START_TIME 6
#define SMART_REC_DURATION 10


// User data for the smart record callbacks
typedef struct {
    uuid_t uuid;
} SmartRecordCallbackUserData;

static GstElement *pipeline = NULL;
static GMainLoop *loop = NULL;

static GstElement **sr_elements;
static SmartRecordCallbackUserData *sr_user_data;

// ----- Smart Record callback -----

static void
smart_record_done (GstElement *element, NvDsSRRecordingInfo *info, void *data, void *user_data)
{
  char uuid_str[37];
  SmartRecordCallbackUserData *u_data = (SmartRecordCallbackUserData *) user_data;
  uuid_unparse (u_data->uuid, uuid_str);

  printf ("Done recording: %s\n", uuid_str);
}

// ----- Source decodebin creation -----

static void
cb_newpad (GstElement * decodebin, GstPad * decoder_src_pad, gpointer data)
{
  GstPad *sink_pad = gst_element_get_static_pad (data, "sink");
  GstCaps *caps = gst_pad_get_current_caps (decoder_src_pad);
  if (!caps) {
    caps = gst_pad_query_caps (decoder_src_pad, NULL);
  }
  const GstStructure *str = gst_caps_get_structure (caps, 0);
  const gchar *name = gst_structure_get_name (str);
  GstCapsFeatures *features = gst_caps_get_features (caps, 0);

  if (!strncmp (name, "video", 5)) {
    if (gst_caps_features_contains (features, GST_CAPS_FEATURES_NVMM)) {
      if (gst_pad_link (decoder_src_pad, sink_pad) != GST_PAD_LINK_OK) {
        g_printerr ("Failed to link decoder src pad to sink pad\n");
      }
    } else {
      g_printerr ("Error: Decodebin did not pick nvidia decoder plugin.\n");
    }
  }

  gst_object_unref (sink_pad);
}


static GstElement *
create_source_bin (guint index, gchar * uri)
{
  GstElement *bin = NULL, *uri_decode_bin = NULL, *queue = NULL;
  gchar bin_name[16] = { }, camera_name[12] = { };

  g_snprintf (bin_name, 15, "source-bin-%02d", index);
  g_snprintf (camera_name, 11, "source-%02d", index);

  bin = gst_bin_new (bin_name);

  uri_decode_bin = gst_element_factory_make ("nvurisrcbin", "uri-decode-bin");
  queue = gst_element_factory_make ("queue", "queue");
  sr_elements[index] = uri_decode_bin;

  g_object_set (G_OBJECT (uri_decode_bin), "source-id", index, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "uri", uri, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "cudadec-memtype", 0, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-record", 1, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-rec-container", SMART_REC_CONTAINER, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-rec-dir-path", SMART_REC_DIRPATH, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-rec-file-prefix", camera_name, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-rec-cache", CACHE_SIZE_SEC, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-rec-mode", 1, NULL);
  g_object_set (G_OBJECT (uri_decode_bin), "smart-rec-default-duration", SMART_REC_DEFAULT_DURATION, NULL);

  g_signal_connect (G_OBJECT (uri_decode_bin), "pad-added",
      G_CALLBACK (cb_newpad), queue);
  g_signal_connect (G_OBJECT (uri_decode_bin), "sr-done",
      G_CALLBACK (smart_record_done),  (gpointer) &sr_user_data[index]);

  gst_bin_add_many (GST_BIN (bin), uri_decode_bin, queue, NULL);

  GstPad *src_pad = gst_element_get_static_pad (queue, "src");
  GstPad *bin_ghost_pad = gst_ghost_pad_new ("src", src_pad);
  gst_element_add_pad (bin, bin_ghost_pad);
  gst_object_unref (src_pad);

  return bin;
}

// ----- Probe triggering smart record -----

static GstPadProbeReturn
src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
    GstBuffer *buf = (GstBuffer *) info->data;
    NvDsMetaList * l_frame = NULL;

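    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
    if (!batch_meta) {
      /* No batch meta attached to this buffer: nothing to inspect */
      return GST_PAD_PROBE_OK;
    }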

    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);

        gint source_id = frame_meta->source_id;

        if (frame_meta->frame_num % 5 == 0) {
          uuid_generate (sr_user_data[source_id].uuid);

          g_print ("starting record for %d\n", source_id);

          NvDsSRStatus status;
          NvDsSRSessionId session_id;
          g_signal_emit_by_name (
              sr_elements[source_id],
              "start-sr",
              &session_id,
              START_TIME,
              SMART_REC_DURATION,
              (gpointer) &sr_user_data[source_id],
              &status
          );
        }
    }

    return GST_PAD_PROBE_OK;
}

// ----- Pipeline main loop -----

int
main (int argc, char *argv[])
{
  GstElement *streammux = NULL, *sink = NULL;
  GstPad *src_pad = NULL;

  guint i = 0, num_sources = 0;

  if (argc < 2) {
    g_printerr ("Usage: %s <uri1> [uri2] ... [uriN] \n", argv[0]);
    return -1;
  }

  gst_init (&argc, &argv);
  loop = g_main_loop_new (NULL, FALSE);

  pipeline = gst_pipeline_new ("pipeline");
  streammux = gst_element_factory_make ("nvstreammux", "stream-muxer");
  sink = gst_element_factory_make ("fakesink", "sink");
  gst_bin_add_many (GST_BIN (pipeline), streammux, sink, NULL);

  num_sources = argc - 1;

  /* Allocate memory for the context tables */
  sr_elements = g_malloc0 (sizeof (GstElement *) * num_sources);
  sr_user_data = g_malloc0 (sizeof (SmartRecordCallbackUserData) * num_sources);

  for (i = 0; i < num_sources; i++) {
    GstPad *sinkpad, *srcpad;
    gchar pad_name[16] = { };

    GstElement *source_bin = create_source_bin (i, argv[i + 1]);
    gst_bin_add (GST_BIN (pipeline), source_bin);

    g_snprintf (pad_name, 15, "sink_%u", i);
    sinkpad = gst_element_request_pad_simple (streammux, pad_name);
    srcpad = gst_element_get_static_pad (source_bin, "src");
    if (gst_pad_link (srcpad, sinkpad) != GST_PAD_LINK_OK) {
      g_printerr ("Failed to link source bin to stream muxer. Exiting.\n");
      return -1;
    }

    gst_object_unref (srcpad);
    gst_object_unref (sinkpad);
  }
  
  g_object_set (G_OBJECT (streammux), "batch-size", num_sources, NULL);
  g_object_set (G_OBJECT (streammux), "width", MUXER_OUTPUT_WIDTH, "height",
      MUXER_OUTPUT_HEIGHT, "batched-push-timeout", MUXER_BATCH_TIMEOUT_USEC, NULL);

  if (!gst_element_link_many (streammux, sink, NULL)) {
    g_printerr ("Elements could not be linked. Exiting.\n");
    return -1;
  }

  src_pad = gst_element_get_static_pad (streammux, "src");
  gst_pad_add_probe (src_pad, GST_PAD_PROBE_TYPE_BUFFER,
      src_pad_buffer_probe, streammux, NULL);
  gst_object_unref (src_pad);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  g_print ("Running...\n");
  g_main_loop_run (loop);

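  g_print ("Returned, stopping playback\n");
  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (GST_OBJECT (pipeline));
  g_main_loop_unref (loop);

  /* Free the context tables only after the pipeline is stopped, so that no
   * probe or "sr-done" callback can still access them */
  g_free (sr_elements);
  g_free (sr_user_data);
  return 0;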
}

This scenario is not supported in the current smart-record implementation. Could you please refer to #310363 and try implementing the approach described there?

Done. It is indeed a workaround, and it fixes the issue.

That said, you might still want to report the issue to engineering, because an unsupported usage should not lead to heap corruption like this. More often than not, it leaves the client side wasting time trying to understand what went wrong :)

Best regards and thanks for the solution!
