Memory leak in OSD

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): 6.1
• TensorRT Version: 10.3.0.31
• Issue Type( questions, new requirements, bugs): bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

When running the following pipeline, RSS memory usage rose sharply approximately 15 hours after the pipeline was started.
The increase was 6996 KB.
On investigation, we found that the issue does not occur when the OSD-related elements are removed from the pipeline.
I therefore believe the cause lies with the OSD.

I would like to know the cause of this sharp increase in memory usage and what measures should be taken to fix it.

I run these commands inside a Docker container.

The video streamed via RTSP is sample_1080p_h264.mp4.

I used the following ps command to obtain the RSS memory usage.

while true; do
        # Prefix each sample with a timestamp, then append the matching ps line(s).
        printf '%s ' "$(date '+%Y-%m-%d_%H:%M:%S')" >> psinfo.log
        ps aux | grep gst-launch-1.0 | grep -v grep >> psinfo.log
        sleep 5
done
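
To locate the step in a log of this shape, the RSS column can be compared between consecutive samples. The following is only a sketch and was not part of the original measurement: the function names `parse_rss_kb` and `report_rss_steps` are hypothetical, and it assumes each sample line is the timestamp followed by the usual `ps aux` columns (USER PID %CPU %MEM VSZ RSS ...), so that RSS is the seventh whitespace-separated field.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Extract the RSS column (KB) from one psinfo.log line of the assumed form
 * "<timestamp> USER PID %CPU %MEM VSZ RSS ...". Returns -1 if it cannot. */
static long parse_rss_kb(const char *line)
{
    long rss = -1;
    /* Skip the timestamp and five ps columns, then read RSS. */
    if (sscanf(line, "%*s %*s %*s %*s %*s %*s %ld", &rss) != 1)
        return -1;
    return rss;
}

/* Scan a log and print every sample whose RSS grew by at least
 * threshold_kb over the previous sample. Returns the number of steps. */
static int report_rss_steps(FILE *in, long threshold_kb, FILE *out)
{
    assert(in != NULL && out != NULL);
    char *line = NULL;
    size_t cap = 0;
    long prev = -1;
    int steps = 0;

    /* getline() grows the buffer as needed; ps lines that carry the full
     * gst-launch-1.0 command are far longer than a typical fixed buffer. */
    while (getline(&line, &cap, in) != -1) {
        long rss = parse_rss_kb(line);
        if (rss < 0)
            continue;               /* not a sample line */
        if (prev >= 0 && rss - prev >= threshold_kb) {
            /* The timestamp format used in the log is 19 characters long. */
            fprintf(out, "%.19s  %ld KB -> %ld KB (+%ld KB)\n",
                    line, prev, rss, rss - prev);
            steps++;
        }
        prev = rss;
    }
    free(line);
    return steps;
}
```

Called with an open `FILE *` for psinfo.log and a threshold of, say, 1024 KB, this would list only the samples where RSS jumped, which makes it easier to see whether the growth is a single step or a gradual drift.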
  • With OSD

    The commands (pipeline) are as follows:

    export USE_NEW_NVSTREAMMUX=yes
    gst-launch-1.0 \
        rtspsrc location="rtsp://192.168.130.26:554/test.mpeg4" protocols=tcp latency=100 drop-on-latency=1 ! rtph264depay ! h264parse ! tee ! queue ! h264parse ! "video/x-h264, stream-format=byte-stream, alignment=au, parsed=true" ! nvv4l2decoder drop-frame-interval=0 num-extra-surfaces=1 enable-max-performance=1 mjpeg=1 ! queue ! tee ! nvvideoconvert copy-hw=VIC ! "video/x-raw(memory:NVMM)" ! mux.sink_0 \
        rtspsrc location="rtsp://192.168.130.26:554/test.mpeg4" protocols=tcp latency=100 drop-on-latency=1 ! rtph264depay ! h264parse ! tee ! queue ! h264parse ! "video/x-h264, stream-format=byte-stream, alignment=au, parsed=true" ! nvv4l2decoder drop-frame-interval=0 num-extra-surfaces=1 enable-max-performance=1 mjpeg=1 ! queue ! tee ! nvvideoconvert copy-hw=VIC ! "video/x-raw(memory:NVMM)" ! mux.sink_1 \
        nvstreammux name=mux batch-size=2 ! \
        queue ! nvvideoconvert ! nvinfer unique-id=1 config-file-path="/opt/nvidia/deepstream/deepstream-7.1/samples/configs/deepstream-app/config_infer_primary.txt" model-engine-file="/opt/nvidia/deepstream/deepstream-7.1/samples/models/Primary_Detector/resnet18_trafficcamnet_pruned.onnx_b2_gpu0_int8.engine" batch-size=2 ! \
        tee ! nvstreamdemux name=demux \
        demux.src_0 ! queue ! nvvideoconvert ! queue ! nvdsosd x-clock-offset=800 y-clock-offset=820 ! queue ! tee ! queue ! fakesink max-lateness=4294967295 async=0 enable-last-sample=0 \
        demux.src_1 ! queue ! nvvideoconvert ! queue ! nvdsosd x-clock-offset=800 y-clock-offset=820 ! queue ! tee ! queue ! fakesink max-lateness=4294967295 async=0 enable-last-sample=0
    

    Memory usage log
    psinfo.log (21.0 MB)

  • Without OSD

    The commands (pipeline) are as follows:

    export USE_NEW_NVSTREAMMUX=yes
    gst-launch-1.0 \
        rtspsrc location="rtsp://192.168.130.26:554/test.mpeg4" protocols=tcp latency=100 drop-on-latency=1 ! rtph264depay ! h264parse ! tee ! queue ! h264parse ! "video/x-h264, stream-format=byte-stream, alignment=au, parsed=true" ! nvv4l2decoder drop-frame-interval=0 num-extra-surfaces=1 enable-max-performance=1 mjpeg=1 ! queue ! tee ! nvvideoconvert copy-hw=VIC ! "video/x-raw(memory:NVMM)" ! mux.sink_0 \
        rtspsrc location="rtsp://192.168.130.26:554/test.mpeg4" protocols=tcp latency=100 drop-on-latency=1 ! rtph264depay ! h264parse ! tee ! queue ! h264parse ! "video/x-h264, stream-format=byte-stream, alignment=au, parsed=true" ! nvv4l2decoder drop-frame-interval=0 num-extra-surfaces=1 enable-max-performance=1 mjpeg=1 ! queue ! tee ! nvvideoconvert copy-hw=VIC ! "video/x-raw(memory:NVMM)" ! mux.sink_1 \
        nvstreammux name=mux batch-size=2 ! \
        queue ! nvvideoconvert ! nvinfer unique-id=1 config-file-path="/opt/nvidia/deepstream/deepstream-7.1/samples/configs/deepstream-app/config_infer_primary.txt" model-engine-file="/opt/nvidia/deepstream/deepstream-7.1/samples/models/Primary_Detector/resnet18_trafficcamnet_pruned.onnx_b2_gpu0_int8.engine" batch-size=2 ! \
        tee ! nvstreamdemux name=demux \
        demux.src_0 ! queue ! tee ! queue ! fakesink max-lateness=4294967295 async=0 enable-last-sample=0 \
        demux.src_1 ! queue ! tee ! queue ! fakesink max-lateness=4294967295 async=0 enable-last-sample=0
    

    Memory usage log
    psinfo.log (17.5 MB)

First, let’s confirm: is rtsp://192.168.130.26:554/test.mpeg4 a looping MJPEG stream? Also, in both the with-OSD and without-OSD runs, does memory usage differ with the number of objects?

The pipeline you provided has several redundant tee and h264parse elements, but I don't think they are the problem. The 6996 KB memory increase over 15 hours is more likely due to glibc heap fragmentation or a slow drift in the GLib slice cache.

After analyzing OSD-related memory using eBPF, no memory leaks were found. You can try the following tips to see whether they bring any improvement.

  1. Warm-up. Right after the pipeline reaches PLAYING, add a segment with multiple objects to your stream. This puts the pipeline under high memory load from the start, thereby avoiding a later step increase in RSS.
  2. Periodic malloc_trim(0). From a timer (e.g. once per hour), call malloc_trim(0) to return the free pages at the top of [heap] to the kernel via madvise(MADV_DONTNEED). That keeps free pages from accumulating at the top of the heap in the first place.
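
Before wiring tip 2 to a timer, it may help to check whether malloc_trim(0) releases anything measurable in the process at all. The following is a minimal sketch under stated assumptions: it relies on glibc (`malloc_trim` is a glibc extension) and on Linux `/proc/self/statm`, and the helper names `read_rss_kb`, `fragment_heap`, and `trim_and_measure_kb` are hypothetical, introduced only for illustration. It does not include the timer itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <malloc.h>   /* malloc_trim(): glibc extension */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Resident set size of the current process in KB, or -1 on error. */
static long read_rss_kb(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    long total_pages = 0, resident_pages = 0;
    int n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* Create free space inside the heap: allocate many small blocks, then free
 * all but the last one. The caller owns the returned block and frees it. */
static void *fragment_heap(size_t count, size_t block_size)
{
    assert(count > 0 && block_size > 0);
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(block_size);
        if (blocks[i] == NULL) {
            while (i > 0)
                free(blocks[--i]);
            free(blocks);
            return NULL;
        }
        memset(blocks[i], 1, block_size);  /* touch pages so they count in RSS */
    }
    void *pin = blocks[count - 1];
    for (size_t i = 0; i + 1 < count; i++)
        free(blocks[i]);
    free(blocks);
    return pin;
}

/* Call malloc_trim(0) and return how many KB of RSS it released
 * (0 if RSS could not be read). */
static long trim_and_measure_kb(void)
{
    long before = read_rss_kb();
    malloc_trim(0);
    long after = read_rss_kb();
    if (before < 0 || after < 0)
        return 0;
    return before - after;
}
```

In a GLib-based application, `trim_and_measure_kb()` could be called from a `g_timeout_add()` callback that returns `G_SOURCE_CONTINUE`, logging the released amount each time. If the value stays near zero while RSS still steps up, the growth is probably not sitting in free heap pages that malloc_trim can reach.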

I apologise for the delay in replying, and thank you for the analysis.

I tried both methods you suggested (1 and 2), but the same problem occurred.
Memory usage again increased sharply, this time by 5248 KB.
psinfo.log (24.5 MB)

I have modified the source code for the nvinfer element as follows.
gstnvinfer.cpp.txt (96.2 KB)

--- gstnvinfer.cpp      2024-10-04 09:26:08.000000000 +0900
+++ gst-nvinfer/gst-nvinfer/gstnvinfer.cpp      2026-05-12 14:52:55.859028000 +0900
@@ -21,6 +21,7 @@
 #include <list>
 #include <thread>
 #include <vector>
+#include <malloc.h>

 #include "gst-nvevent.h"
 #include "gst-nvdscustomevent.h"
@@ -144,6 +145,32 @@
 /* Create enum type for the process mode property. */
 #define GST_TYPE_NVDSINFER_PROCESS_MODE (gst_nvinfer_process_mode_get_type ())

+static gboolean
+page_free_func (gpointer data) {
+  printf ("call page_free_func\n");
+  malloc_trim (0);
+  return TRUE;
+}
+
+static GstStateChangeReturn
+gst_nvinfer_change_state (GstElement *element, GstStateChange transition) {
+  gpointer tmp_mem = NULL;
+  switch (transition) {
+  case GST_STATE_CHANGE_PAUSED_TO_PLAYING:
+    printf ("nvinfer PLAYING\n");
+    // method 1, high memory load
+    gsize byte = 1000000000; // 1GB
+    tmp_mem = g_malloc (byte);
+    memset (tmp_mem, 0, byte);
+    g_free (tmp_mem);
+
+    // method 2, call malloc_trim(0)
+    guint interval = 3600000; // 1hour
+    g_timeout_add (interval, page_free_func, NULL);
+  }
+  return GST_ELEMENT_CLASS (parent_class)->change_state (element, transition);
+}
+
 static GType
 gst_nvinfer_process_mode_get_type (void)
 {
@@ -212,6 +239,8 @@
   gstbasetransform_class->generate_output =
       GST_DEBUG_FUNCPTR (gst_nvinfer_generate_output);

+  gstelement_class->change_state = gst_nvinfer_change_state;
+
   /* Install properties. Values set through these properties override the ones in
    * the config file. */
   g_object_class_install_property (gobject_class, PROP_UNIQUE_ID,