RAM Usage is Rising Continuously

Setup 1

• Hardware Platform (GPU): NVIDIA RTX 3500 Ada
• DeepStream Version: 7.1
• TensorRT Version: 10.5.0.18
• NVIDIA GPU Driver Version: 553.46
• Issue Type: Bug

Setup 2

• Hardware Platform (GPU): NVIDIA RTX 6000 Ada
• DeepStream Version: 7.0
• TensorRT Version: 8.6.1.6
• NVIDIA GPU Driver Version: 535.230.02
• Issue Type: Bug

Hello,

When running the simple Python pipeline provided below on both systems noted above, we observe a constant rise in RAM usage. For this simple pipeline with a short 48-second input video, the rise is limited to around 2-4 MB. In our much more complex production pipeline, however, with 10+ input sources and three different models being inferred, the RAM usage slowly climbs to the full 64 GB of installed RAM over a few days and crashes the application. We observe this behaviour for both file and RTSP input sources, and the rise is faster for models with more output values.

We would kindly like to ask what causes this rising RAM usage and how to prevent or limit it.
We hope that a solution for this simple pipeline can be transferred to our production pipeline.

Thank you very much for your help!

Simple pipeline to reproduce rising RAM usage.

import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import GLib, Gst


# taken from deepstream-test3
def cb_newpad(decodebin, decoder_src_pad,data):
    print("In cb_newpad\n")
    caps=decoder_src_pad.get_current_caps()
    if not caps:
        caps = decoder_src_pad.query_caps()
    gststruct=caps.get_structure(0)
    gstname=gststruct.get_name()
    source_bin=data
    features=caps.get_features(0)
    print("gstname=",gstname)
    if(gstname.find("video")!=-1):
        print("features=",features)
        if features.contains("memory:NVMM"):
            bin_ghost_pad=source_bin.get_static_pad("src")
            if not bin_ghost_pad.set_target(decoder_src_pad):
                sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")
        else:
            sys.stderr.write(" Error: Decodebin did not pick nvidia decoder plugin.\n")


# taken from deepstream-test3
def decodebin_child_added(child_proxy,Object,name,user_data):
    print("Decodebin child added:", name, "\n")
    if(name.find("decodebin") != -1):
        Object.connect("child-added",decodebin_child_added,user_data)
    if "source" in name:
        source_element = child_proxy.get_by_name("source")
        if source_element.find_property("drop-on-latency") != None:
            Object.set_property("drop-on-latency", True)


# taken from deepstream-test3
def create_source_bin(index,uri):
    print("Creating source bin")
    bin_name="source-bin-%02d" %index
    print(bin_name)
    nbin=Gst.Bin.new(bin_name)
    if not nbin:
        sys.stderr.write(" Unable to create source bin \n")
    uri_decode_bin=Gst.ElementFactory.make("uridecodebin", "uri-decode-bin")
    if not uri_decode_bin:
        sys.stderr.write(" Unable to create uri decode bin \n")
    uri_decode_bin.set_property("uri",uri)
    uri_decode_bin.connect("pad-added",cb_newpad,nbin)
    uri_decode_bin.connect("child-added",decodebin_child_added,nbin)
    Gst.Bin.add(nbin,uri_decode_bin)
    bin_pad=nbin.add_pad(Gst.GhostPad.new_no_target("src",Gst.PadDirection.SRC))
    if not bin_pad:
        sys.stderr.write(" Failed to add ghost pad in source bin \n")
        return None
    return nbin


if __name__ == "__main__":
    Gst.init(None)
    pipeline = Gst.Pipeline()

    source_0_bin = create_source_bin(0, "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4")
    source_1_bin = create_source_bin(1, "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4")

    streammux = Gst.ElementFactory.make("nvstreammux", "streammux")
    streammux.set_property("batch-size", 2)

    pgie = Gst.ElementFactory.make("nvinfer", "pgie")
    pgie.set_property("config-file-path", "/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt")

    fakesink = Gst.ElementFactory.make("fakesink", "fakesink")
    fakesink.set_property("sync", 1)

    pipeline.add(source_0_bin)
    pipeline.add(source_1_bin)
    pipeline.add(streammux)
    pipeline.add(pgie)
    pipeline.add(fakesink)

    source_0_bin_src_pad = source_0_bin.get_static_pad("src")
    source_1_bin_src_pad = source_1_bin.get_static_pad("src")
    streammux_sink_pad_0 = streammux.request_pad_simple("sink_0")
    streammux_sink_pad_1 = streammux.request_pad_simple("sink_1")
    source_0_bin_src_pad.link(streammux_sink_pad_0)
    source_1_bin_src_pad.link(streammux_sink_pad_1)

    streammux.link(pgie)
    pgie.link(fakesink)

    loop = GLib.MainLoop()
    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    except Exception as e:
        print(e)

    pipeline.set_state(Gst.State.NULL)

RAM usage of the simple pipeline above, measured using psrecord. When using longer files, the rise continues.

# Elapsed time   CPU (%)     Real (MB)   Virtual (MB)
       0.000        0.000      471.254    56038.227
       5.005       22.200      471.719    56102.227
      10.011       29.600      472.074    56102.227
      15.016       29.600      472.289    56102.227
      20.021       28.200      472.504    56102.227
      25.025       27.800      472.996    56102.227
      30.030       27.400      473.219    56102.227
      35.035       27.400      473.441    56102.227
      40.040       23.400      473.691    56102.227

--> rise of 2.437 MB in RAM usage over 40 seconds.
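
For reference, a minimal sketch of how the resident memory could also be logged from inside the script (an illustration only; it assumes the third-party psutil package, which psrecord itself builds on):

import os
import time
import psutil  # third-party: pip install psutil

_proc = psutil.Process(os.getpid())
_start = time.monotonic()

def log_rss(tag=""):
    # resident set size (RSS) of this process in MB, comparable to psrecord's "Real (MB)" column
    rss_mb = _proc.memory_info().rss / (1024 * 1024)
    print(f"{time.monotonic() - _start:10.3f}s  {tag}  RSS: {rss_mb:.3f} MB")

Calling log_rss() periodically, for example from a GLib.timeout_add_seconds callback, gives a trace comparable to the psrecord output above.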

This sample code should have nothing to do with your problem. In addition, this code cannot run correctly.
After applying the following patch, you can observe that the memory does not continue to grow.

+  uri_decode_bin=Gst.ElementFactory.make("nvurisrcbin", "uri-decode-bin")
  if not uri_decode_bin:
      sys.stderr.write(" Unable to create uri decode bin \n")
  uri_decode_bin.set_property("uri",uri)
+  uri_decode_bin.set_property("file-loop", True)

    streammux = Gst.ElementFactory.make("nvstreammux", "streammux")
    streammux.set_property("batch-size", 2)
+    streammux.set_property("width", 1920)
+    streammux.set_property("height", 1080)

    pgie = Gst.ElementFactory.make("nvinfer", "pgie")
+    pgie.set_property("batch-size", 2)
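
For readability, a minimal sketch of the relevant sections with that patch applied (assembled from the fragments above for illustration; the rest of the original script is assumed unchanged):

# in create_source_bin(): use nvurisrcbin and loop the input file
uri_decode_bin = Gst.ElementFactory.make("nvurisrcbin", "uri-decode-bin")
if not uri_decode_bin:
    sys.stderr.write(" Unable to create uri decode bin \n")
uri_decode_bin.set_property("uri", uri)
uri_decode_bin.set_property("file-loop", True)

# in __main__: the default (legacy) nvstreammux needs an output resolution,
# and nvinfer's batch size is set to match the muxer's batch size
streammux = Gst.ElementFactory.make("nvstreammux", "streammux")
streammux.set_property("batch-size", 2)
streammux.set_property("width", 1920)
streammux.set_property("height", 1080)

pgie = Gst.ElementFactory.make("nvinfer", "pgie")
pgie.set_property("config-file-path", "/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt")
pgie.set_property("batch-size", 2)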

Please use valgrind to detect the memory growth of your program:

PYTHONMALLOC=malloc valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all \
 --suppressions=/usr/lib/valgrind/python3.supp  python3 xxx.py
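
Since valgrind only prints its leak summary when the process exits, it helps to let the reproduction pipeline stop on its own. A minimal sketch (an illustration, not part of the original script) that quits the GLib main loop after a fixed duration, reusing the loop and pipeline objects from the script above and registered before loop.run():

def stop_after_timeout():
    # quit the main loop so the script reaches pipeline.set_state(Gst.State.NULL) and exits
    loop.quit()
    return False  # do not reschedule this timeout

GLib.timeout_add_seconds(120, stop_after_timeout)  # stop after ~2 minutes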

Can you describe your pipeline in detail, and what elements are used?

If you are using nvmsgconv in your pipeline, please refer to this FAQ first.

Thank you for responding @junshengy!

a)

In addition, this code cannot run correctly.

Sorry, I forgot to mention that we use the new nvstreammux. If you execute export USE_NEW_NVSTREAMMUX=yes, then the provided pipeline should run. However, I created a new simple pipeline that resembles some components we also use in our production pipeline (a probe function at the streammux to add custom user metadata and an appsink at the end to enable access to inference results). This pipeline uses the new nvstreammux as well.
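
As a side note, the same environment variable can also be set from Python before GStreamer is initialized (a sketch for illustration; exporting it in the shell as above is equivalent):

import os
# set before Gst.init() so the variable is already in place when the nvstreammux plugin is loaded
os.environ["USE_NEW_NVSTREAMMUX"] = "yes"
Gst.init(None)

The new simple pipeline: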

import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import GLib, Gst

import datetime
import pyds


# taken from deepstream-test3
def cb_newpad(decodebin, decoder_src_pad,data):
    print("In cb_newpad\n")
    caps=decoder_src_pad.get_current_caps()
    if not caps:
        caps = decoder_src_pad.query_caps()
    gststruct=caps.get_structure(0)
    gstname=gststruct.get_name()
    source_bin=data
    features=caps.get_features(0)
    print("gstname=",gstname)
    if(gstname.find("video")!=-1):
        print("features=",features)
        if features.contains("memory:NVMM"):
            bin_ghost_pad=source_bin.get_static_pad("src")
            if not bin_ghost_pad.set_target(decoder_src_pad):
                sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")
        else:
            sys.stderr.write(" Error: Decodebin did not pick nvidia decoder plugin.\n")


# taken from deepstream-test3
def decodebin_child_added(child_proxy,Object,name,user_data):
    print("Decodebin child added:", name, "\n")
    if(name.find("decodebin") != -1):
        Object.connect("child-added",decodebin_child_added,user_data)
    if "source" in name:
        source_element = child_proxy.get_by_name("source")
        if source_element.find_property("drop-on-latency") != None:
            Object.set_property("drop-on-latency", True)


# modified from deepstream-test3
def create_source_bin(index,uri):
    print("Creating source bin")
    bin_name="source-bin-%02d" %index
    print(bin_name)
    nbin=Gst.Bin.new(bin_name)
    if not nbin:
        sys.stderr.write(" Unable to create source bin \n")
    uri_decode_bin=Gst.ElementFactory.make("nvurisrcbin", "uri-decode-bin")
    if not uri_decode_bin:
        sys.stderr.write(" Unable to create uri decode bin \n")
    uri_decode_bin.set_property("uri",uri)
    uri_decode_bin.set_property("file-loop",True)
    uri_decode_bin.connect("pad-added",cb_newpad,nbin)
    uri_decode_bin.connect("child-added",decodebin_child_added,nbin)
    Gst.Bin.add(nbin,uri_decode_bin)
    bin_pad=nbin.add_pad(Gst.GhostPad.new_no_target("src",Gst.PadDirection.SRC))
    if not bin_pad:
        sys.stderr.write(" Failed to add ghost pad in source bin \n")
        return None
    return nbin

# listen for end-of-stream message
def bus_call(bus, message, loop):
    msg_type = message.type
    if msg_type == Gst.MessageType.EOS:
        loop.quit()
    return True


# PROBE FUNCTION TO ADD TIMESTAMP TO CUSTOM USER META
def streammux_probe(pad: Gst.Pad, info: Gst.PadProbeInfo):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    pyds.nvds_acquire_meta_lock(batch_meta)
    frame_meta_list = batch_meta.frame_meta_list

    while frame_meta_list:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
        except StopIteration:
            continue

        user_meta = pyds.nvds_acquire_user_meta_from_pool(batch_meta)
        if user_meta:
            timestamp = str(datetime.datetime.utcnow().timestamp())
            data = pyds.alloc_custom_struct(user_meta)

            data.structId = 0
            data.message = timestamp
            data.message = pyds.get_string(data.message)

            user_meta.user_meta_data = data
            user_meta.base_meta.meta_type = pyds.NvDsMetaType.NVDS_USER_META

            pyds.nvds_add_user_meta_to_frame(frame_meta, user_meta)

        try:
            frame_meta_list = frame_meta_list.next
        except StopIteration:
            break

    pyds.nvds_release_meta_lock(batch_meta)
    return Gst.PadProbeReturn.OK

# APPSINK HANDLER TO READ TIMESTAMP AND DETECTION COUNT
def on_new_sample(sink):
    sample = sink.emit("pull-sample")
    gst_buffer = sample.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    pyds.nvds_acquire_meta_lock(batch_meta)
    frame_meta_list = batch_meta.frame_meta_list

    while frame_meta_list is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
        except StopIteration:
            break

        object_meta_list = frame_meta.obj_meta_list
        detection_count = 0
        while object_meta_list is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(object_meta_list.data)
            except StopIteration:
                break

            detection_count += 1

            try:
                object_meta_list = object_meta_list.next
            except StopIteration:
                break

        frame_user_meta_list = frame_meta.frame_user_meta_list
        timestamp = 0
        while frame_user_meta_list is not None:
            try:
                user_meta = pyds.NvDsUserMeta.cast(frame_user_meta_list.data)
                if user_meta and user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDS_USER_META:
                    custom_data = pyds.CustomDataStruct.cast(user_meta.user_meta_data)
                    timestamp = float(pyds.get_string(custom_data.message))
            except StopIteration:
                break

            try:
                frame_user_meta_list = frame_user_meta_list.next
            except StopIteration:
                break

        try:
            frame_meta_list = frame_meta_list.next
        except StopIteration:
            break

        # print("Frame:", frame_meta.frame_num, "Timestamp:", timestamp, "Detection Count:", detection_count)

    pyds.nvds_release_meta_lock(batch_meta)
    return Gst.FlowReturn.OK


if __name__ == "__main__":
    Gst.init(None)
    pipeline = Gst.Pipeline()

    source_0_bin = create_source_bin(0, "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4")
    source_1_bin = create_source_bin(1, "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4")

    streammux = Gst.ElementFactory.make("nvstreammux", "streammux")
    streammux.set_property("batch-size", 2)

    streammux_src_pad = streammux.get_static_pad("src")
    streammux_src_pad.add_probe(Gst.PadProbeType.BUFFER, streammux_probe)

    pgie = Gst.ElementFactory.make("nvinfer", "pgie")
    pgie.set_property("config-file-path", "/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt")

    appsink = Gst.ElementFactory.make("appsink", "appsink")
    appsink.set_property("sync", 1)
    appsink.set_property("emit-signals", True)
    appsink.connect("new-sample", on_new_sample)

    pipeline.add(source_0_bin)
    pipeline.add(source_1_bin)
    pipeline.add(streammux)
    pipeline.add(pgie)
    pipeline.add(appsink)

    source_0_bin_src_pad = source_0_bin.get_static_pad("src")
    source_1_bin_src_pad = source_1_bin.get_static_pad("src")
    streammux_sink_pad_0 = streammux.request_pad_simple("sink_0")
    streammux_sink_pad_1 = streammux.request_pad_simple("sink_1")
    source_0_bin_src_pad.link(streammux_sink_pad_0)
    source_1_bin_src_pad.link(streammux_sink_pad_1)

    streammux.link(pgie)
    pgie.link(appsink)

    loop = GLib.MainLoop()
    
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    pipeline.set_state(Gst.State.PLAYING)

    try:
        loop.run()
    except Exception as e:
        print(e)

    pipeline.set_state(Gst.State.NULL)

With this new simple pipeline, the RAM usage rises much more slowly for the built-in DeepStream model config_infer_primary.txt. However, when we use a custom model, the RAM usage rises quickly (measured using psrecord and htop).

Setup   Model                       RAM usage rise
1       config_infer_primary.txt    ca. 5 MB (over 8 minutes)
1       YOLOv9 Pose Estimation      ca. 106.4 MB (over 8 minutes)
2       config_infer_primary.txt    ca. 3.4 MB (over 15 minutes)
2       YOLOv9 Pose Estimation      ca. 203.6 MB (over 15 minutes)

The custom parser function of this model is located at: DeepStream-Yolo-Pose/nvdsinfer_custom_impl_Yolo_pose at master · marcoslucianops/DeepStream-Yolo-Pose

The configuration file of this model is:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=yolov9c_pose.onnx
model-engine-file=yolov9c_pose.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=2
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
infer-dims=3;640;640
network-type=3
cluster-mode=4
maintain-aspect-ratio=1
symmetric-padding=1
#workspace-size=2000
parse-bbox-instance-mask-func-name=NvDsInferParseYoloPose
custom-lib-path=nvdsinfer_custom_impl_Yolo_pose/libnvdsinfer_custom_impl_Yolo_pose.so
output-instance-mask=1

[class-attrs-all]
pre-cluster-threshold=0.4
topk=300

If you would like to reproduce the issue using the custom model, here are the files:
reproduce.zip (83.2 MB)

The parser files are precompiled for DeepStream 7.0, but can easily be compiled for DeepStream 7.1 via export CUDA_VER=12.6 and make.

Do you know why the RAM usage is rising in general using this new simple pipeline?
Do you know why the RAM usage rises much faster using the custom model?

b)

Please use valgrind to detect the memory growth of your program

This is the summary of running valgrind on the new simple pipeline using the YOLOv9 Pose Estimation model (Setup 2, file-loop = False). Do you need another output?

...
==4600== 78,928,476 bytes in 2,409 blocks are still reachable in loss record 43,553 of 43,553
==4600==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==4600==    by 0x7D14134: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02)
==4600==    by 0x7D2F5D1: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02)
==4600==    by 0x7D16A0A: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02)
==4600==    by 0x7D171DC: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02)
==4600==    by 0x7D7DE73: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.535.230.02)
==4600==    by 0x7839922: ??? (in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140)
==4600==    by 0x783A06F: ??? (in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140)
==4600==    by 0x783A0DD: ??? (in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140)
==4600==    by 0x783CD46: ??? (in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140)
==4600==    by 0x7814F5A: ??? (in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140)
==4600==    by 0x7870B2A: cudaLaunchKernel (in /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12.2.140)
==4600== 
==4600== LEAK SUMMARY:
==4600==    definitely lost: 10,020,544 bytes in 45,194 blocks
==4600==    indirectly lost: 45,402 bytes in 377 blocks
==4600==      possibly lost: 93,653,510 bytes in 15,636 blocks
==4600==    still reachable: 290,372,174 bytes in 123,183 blocks
==4600==                       of which reachable via heuristic:
==4600==                         stdstring          : 6,389 bytes in 116 blocks
==4600==         suppressed: 0 bytes in 0 blocks
==4600== 
==4600== Use --track-origins=yes to see where uninitialised values come from
==4600== For lists of detected and suppressed errors, rerun with: -s
==4600== ERROR SUMMARY: 6284 errors from 6003 contexts (suppressed: 0 from 0)

Note that we cannot run valgrind on our production program since it becomes too slow under valgrind: RTSP streams and other functionality do not start properly, so the program won't run.

c)

Can you describe your pipeline in detail, and what elements are used?

Here you find a graph of our production pipeline:
production_pipeline.zip (1.2 MB)

Can you identify any elements that might cause a consistently rising RAM usage?
Can you identify any elements important for proper functionality that are missing in this pipeline?

d)

If you are using nvmsgconv in your pipeline

We don’t use nvmsgconv in our pipeline.

Thank you for your time!

Let's narrow down the problem. Please run valgrind with a pipeline that reproduces your problem. The rest is not important.

This log is not complete, but it indicates that there is a memory leak of about 10 MB. The "definitely lost" entries will indicate the specific code location of the memory leak.

Thank you for responding @junshengy!

I ran

PYTHONMALLOC=malloc valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --suppressions=/usr/lib/valgrind/python3.supp --log-file="valgrind_log.log" python3 pipeline.py

on this pipeline:

import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import GLib, Gst

import datetime
import pyds


# taken from deepstream-test3
def cb_newpad(decodebin, decoder_src_pad,data):
    print("In cb_newpad\n")
    caps=decoder_src_pad.get_current_caps()
    if not caps:
        caps = decoder_src_pad.query_caps()
    gststruct=caps.get_structure(0)
    gstname=gststruct.get_name()
    source_bin=data
    features=caps.get_features(0)
    print("gstname=",gstname)
    if(gstname.find("video")!=-1):
        print("features=",features)
        if features.contains("memory:NVMM"):
            bin_ghost_pad=source_bin.get_static_pad("src")
            if not bin_ghost_pad.set_target(decoder_src_pad):
                sys.stderr.write("Failed to link decoder src pad to source bin ghost pad\n")
        else:
            sys.stderr.write(" Error: Decodebin did not pick nvidia decoder plugin.\n")


# taken from deepstream-test3
def decodebin_child_added(child_proxy,Object,name,user_data):
    print("Decodebin child added:", name, "\n")
    if(name.find("decodebin") != -1):
        Object.connect("child-added",decodebin_child_added,user_data)
    if "source" in name:
        source_element = child_proxy.get_by_name("source")
        if source_element.find_property("drop-on-latency") != None:
            Object.set_property("drop-on-latency", True)


# modified from deepstream-test3
def create_source_bin(index,uri):
    print("Creating source bin")
    bin_name="source-bin-%02d" %index
    print(bin_name)
    nbin=Gst.Bin.new(bin_name)
    if not nbin:
        sys.stderr.write(" Unable to create source bin \n")
    uri_decode_bin=Gst.ElementFactory.make("nvurisrcbin", "uri-decode-bin")
    if not uri_decode_bin:
        sys.stderr.write(" Unable to create uri decode bin \n")
    uri_decode_bin.set_property("uri",uri)
    uri_decode_bin.set_property("file-loop",False)
    uri_decode_bin.connect("pad-added",cb_newpad,nbin)
    uri_decode_bin.connect("child-added",decodebin_child_added,nbin)
    Gst.Bin.add(nbin,uri_decode_bin)
    bin_pad=nbin.add_pad(Gst.GhostPad.new_no_target("src",Gst.PadDirection.SRC))
    if not bin_pad:
        sys.stderr.write(" Failed to add ghost pad in source bin \n")
        return None
    return nbin

# listen for end-of-stream message
def bus_call(bus, message, loop):
    msg_type = message.type
    if msg_type == Gst.MessageType.EOS:
        loop.quit()
    return True


# PROBE FUNCTION TO ADD TIMESTAMP TO CUSTOM USER META
def streammux_probe(pad: Gst.Pad, info: Gst.PadProbeInfo):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    pyds.nvds_acquire_meta_lock(batch_meta)
    frame_meta_list = batch_meta.frame_meta_list

    while frame_meta_list:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
        except StopIteration:
            continue

        user_meta = pyds.nvds_acquire_user_meta_from_pool(batch_meta)
        if user_meta:
            timestamp = str(datetime.datetime.utcnow().timestamp())
            data = pyds.alloc_custom_struct(user_meta)

            data.structId = 0
            data.message = timestamp
            data.message = pyds.get_string(data.message)

            user_meta.user_meta_data = data
            user_meta.base_meta.meta_type = pyds.NvDsMetaType.NVDS_USER_META

            pyds.nvds_add_user_meta_to_frame(frame_meta, user_meta)

        try:
            frame_meta_list = frame_meta_list.next
        except StopIteration:
            break

    pyds.nvds_release_meta_lock(batch_meta)
    return Gst.PadProbeReturn.OK

# APPSINK HANDLER TO READ TIMESTAMP AND DETECTION COUNT
def on_new_sample(sink):
    sample = sink.emit("pull-sample")
    gst_buffer = sample.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    pyds.nvds_acquire_meta_lock(batch_meta)
    frame_meta_list = batch_meta.frame_meta_list

    while frame_meta_list is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(frame_meta_list.data)
        except StopIteration:
            break

        object_meta_list = frame_meta.obj_meta_list
        detection_count = 0
        while object_meta_list is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(object_meta_list.data)
            except StopIteration:
                break

            detection_count += 1

            try:
                object_meta_list = object_meta_list.next
            except StopIteration:
                break

        frame_user_meta_list = frame_meta.frame_user_meta_list
        timestamp = 0
        while frame_user_meta_list is not None:
            try:
                user_meta = pyds.NvDsUserMeta.cast(frame_user_meta_list.data)
                if user_meta and user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDS_USER_META:
                    custom_data = pyds.CustomDataStruct.cast(user_meta.user_meta_data)
                    timestamp = float(pyds.get_string(custom_data.message))
            except StopIteration:
                break

            try:
                frame_user_meta_list = frame_user_meta_list.next
            except StopIteration:
                break

        try:
            frame_meta_list = frame_meta_list.next
        except StopIteration:
            break

        # print("Frame:", frame_meta.frame_num, "Timestamp:", timestamp, "Detection Count:", detection_count)

    pyds.nvds_release_meta_lock(batch_meta)
    return Gst.FlowReturn.OK


if __name__ == "__main__":
    Gst.init(None)
    pipeline = Gst.Pipeline()

    source_0_bin = create_source_bin(0, "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4")
    source_1_bin = create_source_bin(1, "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4")

    streammux = Gst.ElementFactory.make("nvstreammux", "streammux")
    streammux.set_property("batch-size", 2)

    streammux_src_pad = streammux.get_static_pad("src")
    streammux_src_pad.add_probe(Gst.PadProbeType.BUFFER, streammux_probe)

    pgie = Gst.ElementFactory.make("nvinfer", "pgie")
    pgie.set_property("config-file-path", "config.txt")

    appsink = Gst.ElementFactory.make("appsink", "appsink")
    appsink.set_property("sync", 1)
    appsink.set_property("emit-signals", True)
    appsink.connect("new-sample", on_new_sample)

    pipeline.add(source_0_bin)
    pipeline.add(source_1_bin)
    pipeline.add(streammux)
    pipeline.add(pgie)
    pipeline.add(appsink)

    source_0_bin_src_pad = source_0_bin.get_static_pad("src")
    source_1_bin_src_pad = source_1_bin.get_static_pad("src")
    streammux_sink_pad_0 = streammux.request_pad_simple("sink_0")
    streammux_sink_pad_1 = streammux.request_pad_simple("sink_1")
    source_0_bin_src_pad.link(streammux_sink_pad_0)
    source_1_bin_src_pad.link(streammux_sink_pad_1)

    streammux.link(pgie)
    pgie.link(appsink)

    loop = GLib.MainLoop()
    
    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message", bus_call, loop)

    pipeline.set_state(Gst.State.PLAYING)

    try:
        loop.run()
    except Exception as e:
        print(e)

    pipeline.set_state(Gst.State.NULL)

with the model (YOLOv9 Pose Estimation), configuration, and parser functions contained in the reproduce.zip file from above, on Setup 2.

This is the complete valgrind output (ca. 10 MB of memory leaked):
valgrind_log.log (47.0 MB)

Thank you for your time!

Try the following two patches: one for the postprocessing parser and the other for nvdsinfer.

nvdsinfer_custom_impl_Yolo_pose/nvdsparsepose_Yolo.cpp

// Pass by reference, not by value
static std::vector<NvDsInferInstanceMaskInfo>
 nonMaximumSuppression(std::vector<NvDsInferInstanceMaskInfo>& binfo)
 {
   auto overlap1D = [](float x1min, float x1max, float x2min, float x2max) -> float {
     if (x1min > x2min) {
       std::swap(x1min, x2min);
       std::swap(x1max, x2max);
     }
     return x1max < x2min ? 0 : std::min(x1max, x2max) - x2min;
   };
 
   auto computeIoU = [&overlap1D](NvDsInferInstanceMaskInfo& bbox1, NvDsInferInstanceMaskInfo& bbox2) -> float {
     float overlapX = overlap1D(bbox1.left, bbox1.left + bbox1.width, bbox2.left, bbox2.left + bbox2.width);
     float overlapY = overlap1D(bbox1.top, bbox1.top + bbox1.height, bbox2.top, bbox2.top + bbox2.height);
     float area1 = (bbox1.width) * (bbox1.height);
     float area2 = (bbox2.width) * (bbox2.height);
     float overlap2D = overlapX * overlapY;
     float u = area1 + area2 - overlap2D;
     return u == 0 ? 0 : overlap2D / u;
   };
 
   std::stable_sort(binfo.begin(), binfo.end(), [](const NvDsInferInstanceMaskInfo& b1, const NvDsInferInstanceMaskInfo& b2) {
     return b1.detectionConfidence > b2.detectionConfidence;
   });
 
   std::vector<NvDsInferInstanceMaskInfo> out;
   for (auto i : binfo) {
     bool keep = true;
     for (auto j : out) {
       if (keep) {
         float overlap = computeIoU(i, j);
         keep = overlap <= NMS_THRESH;
       }
       else {
         break;
       }
     }
     if (keep) {
       out.push_back(i);
     } else {
       // free the mask of objects dropped by NMS; otherwise this memory is leaked
        if (i.mask) {
          delete[] i.mask;
          i.mask = nullptr;
        }
     }
   }
   return out;
 }

/opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp

diff --git a/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp b/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp
index 7df1804..f2d594e 100644
--- a/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp
+++ b/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp
@@ -573,16 +573,24 @@ InstanceSegmentPostprocessor::fillUnclusteredOutput(NvDsInferDetectionOutput& ou
  * Filter out objects which have been specificed to be removed from the metadata
  * prior to clustering operation
  */
-void InstanceSegmentPostprocessor::preClusteringThreshold(
-                           NvDsInferParseDetectionParams const &detectionParams,
-                           std::vector<NvDsInferInstanceMaskInfo> &objectList)
+ void InstanceSegmentPostprocessor::preClusteringThreshold(
+    NvDsInferParseDetectionParams const &detectionParams,
+    std::vector<NvDsInferInstanceMaskInfo> &objectList)
 {
-    objectList.erase(std::remove_if(objectList.begin(), objectList.end(),
-               [detectionParams](const NvDsInferInstanceMaskInfo& obj)
-               { return (obj.classId >= detectionParams.numClassesConfigured) ||
-                        (obj.detectionConfidence <
-                        detectionParams.perClassPreclusterThreshold[obj.classId])
-                        ? true : false;}),objectList.end());
+    auto removeMask = [&detectionParams](const NvDsInferInstanceMaskInfo& obj) -> bool {
+        /* Remove objects which are not in the configured classes or have
+        * confidence less than the configured threshold */
+        if ((obj.classId >= detectionParams.numClassesConfigured) ||
+            (obj.detectionConfidence < detectionParams.perClassPreclusterThreshold[obj.classId])) {
+            if (obj.mask != nullptr) {
+                delete []obj.mask;
+            }
+            return true;
+        } else {
+            return false;
+        }
+    };
+    objectList.erase(std::remove_if(objectList.begin(), objectList.end(), removeMask), objectList.end());
 }

First tests on the pipeline from above as well as in our production environment show that the RAM usage is stable after applying your two patches.
Still, we'll conduct some further tests to ensure the issue is definitively solved.

Thank you very much @junshengy for your fast and helpful responses!
This really helps us to continue our development, as this memory leak was taking up a lot of resources.
