"Double Free or Corruption" Crash with nvdspreprocess and SGIE nvinfer

Environment Details

  • Hardware Platform: dGPU

  • DeepStream Version: 7.1 (also occurs in 6.4, 7.0)

  • TensorRT Version: 10.3.0.26

  • NVIDIA GPU Driver Version: 570.169

  • Issue Type: Bug

I am encountering an occasional crash with a “double free or corruption (out)” error when using the nvdspreprocess plugin in combination with an nvinfer plugin configured in SGIE (Secondary GPU Inference Engine) mode. This issue has occurred in DeepStream versions 6.4, 7.0, and 7.1.

My pipeline uses a custom method to control when the secondary inference runs. After the primary nvinfer (UID: 1), a probe inserts a “fake” detection (with class ID 254) into the frame meta. The nvdspreprocess plugin (UID: 13) and the SGIE nvinfer (UID: 14) are configured to operate only on these fake detections.

This approach allows me to selectively perform a time-consuming inference by controlling when and for which source the fake detection is added. While this technique works perfectly with other SGIEs, it fails when the nvdspreprocess plugin is involved.
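For reference, the fake-detection insertion is done with a pad probe on the PGIE src pad, roughly like the sketch below. This is simplified and untested here: `should_run_sgie` and `FAKE_CLASS_ID` are placeholders for my application logic, and error handling is trimmed. It needs the DeepStream SDK headers (`gstnvdsmeta.h`) to compile.

```c
#define FAKE_CLASS_ID 254  /* illustrative; matches operate-on-class-ids */

static GstPadProbeReturn
insert_fake_detection_probe (GstPad *pad, GstPadProbeInfo *info,
                             gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame;
       l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;

    /* Application-specific decision: should this source trigger the
     * SGIE on this batch? (hypothetical helper) */
    if (!should_run_sgie (frame_meta->source_id))
      continue;

    /* Acquire an object meta from the pool and fill it in as a
     * full-frame "detection" so nvdspreprocess picks it up. */
    NvDsObjectMeta *obj_meta = nvds_acquire_obj_meta_from_pool (batch_meta);
    obj_meta->class_id = FAKE_CLASS_ID;
    obj_meta->unique_component_id = 1;  /* matches operate-on-gie-id=1 */
    obj_meta->confidence = 1.0;
    obj_meta->rect_params.left = 0;
    obj_meta->rect_params.top = 0;
    obj_meta->rect_params.width = frame_meta->source_frame_width;
    obj_meta->rect_params.height = frame_meta->source_frame_height;
    nvds_add_obj_meta_to_frame (frame_meta, obj_meta, NULL);
  }
  return GST_PAD_PROBE_OK;
}
```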

The crash is directly related to the presence of these fake detections:

  • The application crashes within minutes once it processes batches that do not contain any fake detections.

  • If I modify the logic to ensure at least one source per batch always has a fake detection, the application runs without crashing.

  • The crash only ever occurs on batches where no fake detections were inserted.

Here are the relevant configuration sections for the nvdspreprocess and SGIE nvinfer plugins:

NVDSPREPROCESS CONFIG

[property]
enable=1
unique-id=13
gpu-id=0

# SGIE
operate-on-gie-id=1

# PGIE if 1
process-on-frame=0
target-unique-ids=14

# NCHW
network-input-order=0
network-input-shape=1;6;736;1280
maintain-aspect-ratio=1
symmetric-padding=0
processing-width=1280
processing-height=736
scaling-buf-pool-size=4
tensor-buf-pool-size=4

# RGB
network-color-format=0

# FP32
tensor-data-type=0
tensor-name=prep_images

# NVBUF_MEM_DEFAULT
scaling-pool-memory-type=0

# NvBufSurfTransformCompute_Default
scaling-pool-compute-hw=0

# NvBufSurfTransformInter_Bilinear
scaling-filter=1
custom-lib-path=libnvdspreprocess_custom_oor_cd.so
custom-tensor-preparation-function=CustomTensorPreparation

[user-configs]
gpu-id=0
src-limit=16
maintain-aspect-ratio=1
symmetric-padding=0
scaling-filter=1

[group-0]
src-ids=0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15
process-on-roi=0
operate-on-class-ids=254
process-on-all-objects=1
custom-input-transformation-function=CustomAsyncTransformation

SGIE NVINFER CONFIG

[property]
enable-dla=0
use-dla-core=0
gie-unique-id=14
process-mode=2
network-type=0
model-color-format=0
maintain-aspect-ratio=1
labelfile-path=danger.label
onnx-file=tinycd_v2_xl_real_plus_synth_512_1280_17_dynamic_batch_with_preproc_FP32_postprocess_RGB_only_change_FP32.onnx
network-mode=2
custom-lib-path=libnvdspreprocess_custom_oor_cd.so
input-tensor-from-meta=1
output-tensor-meta=0
force-implicit-batch-dim=0
operate-on-gie-id=1
cluster-mode=4
num-detected-classes=2
parse-bbox-func-name=NvDsInferParseCustomOOR
operate-on-class-ids=254

STACK TRACE

double free or corruption (out)
Reading from backwardcpp.log file:
Stack trace (most recent call last) in thread 121502:
#24   Object "", at 0xffffffffffffffff, in 
#23   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec362e284f, in __xmknodat
#22   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec36250ac2, in pthread_condattr_setpshared
#21   Source "/opt/tritonserver/librdkafka/hiredis/mosquitto-2.0.15/glib/build/../glib/gthread.c", line 831, in g_thread_proxy [0x7aec49f41ac0]
        828:       thread->name = NULL;
        829:     }
        830: 
      > 831:   thread->retval = thread->thread.func (thread->thread.data);
        832: 
        833:   return NULL;
        834: }
#20   Source "/opt/nvidia/deepstream/deepstream-7.1/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp", line 2414, in gst_nvinfer_output_loop(void*) [0x7aebac18b9b9]
       2411:       nvds_set_output_system_timestamp(batch->inbuf, GST_ELEMENT_NAME(nvinfer));
       2412: 
       2413:       GstFlowReturn flow_ret =
      >2414:           gst_pad_push (GST_BASE_TRANSFORM_SRC_PAD (nvinfer),
       2415:           batch->inbuf);
       2416:       if (nvinfer->last_flow_ret != flow_ret) {
       2417:         switch (flow_ret) {
#19   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c9e22d, in gst_pad_push
#18   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c9de08, in gst_pad_get_allowed_caps
#17   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c9a86c, in gst_pad_query
#16   Object "/usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0", at 0x7aebf01aa4ef, in gst_base_parse_finish_frame
#15   Object "/usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0", at 0x7aebf01db257, in gst_push_src_get_type
#14   Object "/usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0", at 0x7aebf01ad1f2, in gst_base_sink_get_last_sample
#13   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c91cc4, in gst_mini_object_unref
#12   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c5f217, in gst_buffer_pool_release_buffer
#11   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c5f13f, in gst_buffer_pool_release_buffer
#10   Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c5e87a, in gst_buffer_list_take
#9    Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c5919d, in gst_buffer_unmap
#8    Object "/usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0", at 0x7aec37c92a62, in gst_mini_object_init
#7    Object "/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_multistream.so", at 0x7aebc812d657, in gst_get_current_running_time(_GstElement*, _GstNvStreamMux*)
#6    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec36261452, in free
#5    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec3625ee6f, in __default_morecore
#4    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec3625ccfb, in timer_settime
#3    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec36245675, in __fsetlocking
#2    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec361e47f2, in abort
#1    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec361fe475, in raise
#0    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7aec362529fc, in pthread_kill

The nvdspreprocess + SGIE nvinfer configuration is inspired by the deepstream-pose-classification sample application. Is there anything incorrect in my configuration that could be causing this memory corruption?

I have found several other forum posts describing similar “double free” issues / crashes with the nvdspreprocess plugin.

Looking at your config, I am assuming you are using a custom TensorPreparation function. The standard nvinfer uses an internal preprocess step too, so using an unchanged version before the SGIE should behave exactly the same as without nvdspreprocess.

Setting the batch size at the end of your custom tensor preparation function is important: even when a batch size of 0 is given to CustomTensorPreparation, the function must still set tensorParams correctly so that nvinfer knows the size of the raw tensor.

Furthermore, you have to use the given acquirer to get a buffer address and assign it to the buf argument of the CustomTensorPreparation function. The data in this buffer must align with the values you set in the tensorParams object.

NvDsPreProcessStatus
CustomTensorPreparation(CustomCtx *ctx, NvDsPreProcessBatch *batch, NvDsPreProcessCustomBuf *&buf,
                        CustomTensorParams &tensorParam, NvDsPreProcessAcquirer *acquirer)
{
  NvDsPreProcessStatus status = NVDSPREPROCESS_TENSOR_NOT_READY;

  /** acquire a buffer from tensor pool, this is a must!!*/
  buf = acquirer->acquire();

  /** Prepare Tensor */
  status = ctx->tensor_impl->prepare_tensor(batch, tensorParam, buf->memory_ptr);
  if (status != NVDSPREPROCESS_SUCCESS) {
    printf ("Custom Lib: Tensor Preparation failed\n");
    acquirer->release(buf);
    return status;
  }

  /** synchronize cuda stream */
  status = ctx->tensor_impl->syncStream();
  if (status != NVDSPREPROCESS_SUCCESS) {
    printf ("Custom Lib: Cuda Stream Synchronization failed\n");
    acquirer->release(buf);
    return status;
  }

  /** this is also a must, otherwise nvinfer won't know the real total size
   *  of the raw tensor created in prepare_tensor */
  tensorParam.params.network_input_shape[0] = (int)batch->units.size();

  return status;
}

This is code from /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspreprocess/nvdspreprocess_lib/nvdspreprocess_lib.cpp. As you can see, the buffer is always acquired and only released if an error occurs.

This is needed because, in gstnvdspreprocess.cpp, the function attach_user_meta_at_batch_level attaches nvdspreprocess->tensor_buf to the user_meta of the current batch_meta (user_meta->user_meta_data->tensor_meta->raw_tensor_buffer, line 1383 in gstnvdspreprocess.cpp), and this is the actual tensor used by the SGIE nvinfer. As far as I can tell, the nvdspreprocess code never checks whether this tensor_buf points to a valid address. nvinfer later checks whether the current batch contains any objects it has to process; if not, it drops the data that was designated for that nvinfer instance. Releasing a buffer that does not exist would, I guess, cause this error.

These are all just assumptions based on my basic understanding of nvdspreprocess.
If you provide the detailed code of your preprocessing function, helping you will be much easier.

Hi johannesrhvw, thank you very much for your answer. I wanted to continue the conversation in the thread you linked, but it was already closed. Can I ask what you meant by your last answer there:

As I already suspected, the configuration files were not correct.
I used the nvdspreprocess example as a base config and adapted it to my model inputs. It works now; I think the problem was related to mismatching hardware selection in the configs.

I suspected that my configuration files were not correct as well. Was there anything specific you did to resolve the double free error?

Back to your response to me: yes, I am using a CustomTensorPreparation function. I also set the batch size at the end, and I use the acquirer. The problem is not in the tensor preparation itself: even if no tensor preprocessing is done (batch->units.size() == 0), the crash still occurs. That is the weird part. If at least one object activates the preprocessing function in every batch, then everything works as expected. It seems to me like there is a race condition in the DeepStream library in the scenario where nvdspreprocess + nvinfer in SGIE mode is used on empty batches.

If I add this ugly hack into gstnvdspreprocess.cpp and recompile, everything works as expected:

@@ -2162,13 +2162,44 @@ gst_nvdspreprocess_submit_input_buffer (GstBaseTransform * btrans,
   }
   in_surf = (NvBufSurface *) in_map_info.data;
 
-  nvds_set_input_system_timestamp (inbuf, GST_ELEMENT_NAME (nvdspreprocess));
 
   /** Preprocess on Frames */
   if (nvdspreprocess->process_on_frame) {
+    nvds_set_input_system_timestamp (inbuf, GST_ELEMENT_NAME (nvdspreprocess));
     flow_ret = gst_nvdspreprocess_on_frame (nvdspreprocess, inbuf, in_surf);
   } else {
-    flow_ret = gst_nvdspreprocess_on_objects (nvdspreprocess, inbuf, in_surf);
+    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (inbuf);
+    NvDsMetaList *l_frame = NULL;
+    bool should_preprocess = false;
+    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL;
+      l_frame = l_frame->next) {
+
+      NvDsFrameMeta *frame_meta = NULL;
+      frame_meta = (NvDsFrameMeta *) (l_frame->data);
+
+      for (NvDsMetaList * l_obj = frame_meta->obj_meta_list; l_obj != NULL;
+        l_obj = l_obj->next) {
+
+          NvDsObjectMeta *object_meta = (NvDsObjectMeta *) (l_obj->data);
+          if(object_meta->class_id == 254){
+              should_preprocess = true;
+              break;
+          }
+      }
+
+      if(should_preprocess)
+          break;
+
+    }
+
+    if(should_preprocess){
+        nvds_set_input_system_timestamp (inbuf, GST_ELEMENT_NAME (nvdspreprocess));
+        flow_ret = gst_nvdspreprocess_on_objects (nvdspreprocess, inbuf, in_surf);
+    } else {
+        gst_buffer_unmap (inbuf, &in_map_info);
+        flow_ret = gst_pad_push(GST_BASE_TRANSFORM_SRC_PAD (nvdspreprocess), inbuf);
+        return flow_ret;
+    }
   }
 
   if (flow_ret != GST_FLOW_OK)

There is no double-free crash when the gst_nvdspreprocess_on_objects function is skipped for batches that contain no object to operate on (in my case, class ID 254). In that case I make the nvdspreprocess plugin behave as if it were in passthrough mode (enable=0). This is not a general fix.

How did you implement the object meta insert?

Are the “CustomAsyncTransformation” and “CustomTensorPreparation” customized?

It seems the GstBuffer push crashed. Can you give us a simplified sample to reproduce the crash?