Memory leak when running DeepStream Python (using gRPC, triton-server, Docker, Ubuntu)

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): 4090 Ti GPU
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only)
• TensorRT Version: 10.3.0.26-1+cuda12.5
• NVIDIA GPU Driver Version (valid for GPU only): 535.183.01
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

When running Python DeepStream in a Docker container (based on the official NVIDIA Docker image running Ubuntu 22.04), I observe that the container’s memory usage increases after completing one pipeline and creating a new one. The memory is not released even after setting the pipeline’s state to NULL.

I noticed that it is not necessary to run a loop or link the elements to reproduce this issue. The same problem occurs when simply changing the pipeline's state to PLAYING and then setting it back to NULL.

The memory increases the most when I set the config-file-path property for pgie and sgie and change the state to PLAYING. Other elements increase the memory a little, but not by much. If the config file for pgie and sgie is set to an empty string (""), the memory does not grow much. You can use any other model available from NVIDIA with my code below.

Below is my code, which easily reproduces this issue.
Thanks.

Here are some related topics:

deepstream-issues.zip (6.1 KB)

1. Please monitor the memory usage of the process, not of the Docker container, which is meaningless for this issue (a sketch of how the process RSS can be sampled per iteration is shown after the valgrind command below).

2. Regarding the related topics: the first one has been fixed on Jetson, so you should have no problem on dGPU; the second one is a GStreamer problem that needs to be solved by the GStreamer community.

3. If I understand you correctly, you mean the following code will cause the process memory to keep growing?

import sys
sys.path.append('../')
import gi
import time
gi.require_version('Gst', '1.0')
from gi.repository import Gst

def main():
    # Standard GStreamer initialization
    Gst.init(None)

    # Create gstreamer elements
    # Create Pipeline element that will form a connection of other elements
    print("Creating Pipeline \n ")
    pipeline = Gst.Pipeline()

    if not pipeline:
        sys.stderr.write(" Unable to create Pipeline \n")
            
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    if not pgie:
        sys.stderr.write(" Unable to create pgie \n")
    pgie.set_property('config-file-path', "dstest1_pgie_config.txt")

    pipeline.add(pgie)

    for _ in range(50):
        pipeline.set_state(Gst.State.PLAYING)
        time.sleep(3)
        pipeline.set_state(Gst.State.NULL)

if __name__ == '__main__':
    sys.exit(main())

You can use valgrind to detect memory leaks in the following way:

PYTHONMALLOC=malloc valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all \
 --suppressions=/usr/lib/valgrind/python3.supp  python3 xxx.py
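
For reference, here is a minimal sketch (not from this thread) of how the process RSS mentioned in point 1 could be sampled per iteration on Linux. It reads VmRSS from /proc/self/status, so no extra packages are needed; the videotestsrc ! fakesink pipeline is only a placeholder for the nvinfer pipeline shown above:

import re
import time
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

def rss_kb():
    # Resident set size of this process in kB, read from /proc (Linux only).
    with open('/proc/self/status') as f:
        match = re.search(r'^VmRSS:\s+(\d+)\s+kB', f.read(), re.MULTILINE)
    return int(match.group(1)) if match else -1

Gst.init(None)
# Placeholder pipeline; substitute the nvinfer pipeline from the snippet above.
pipeline = Gst.parse_launch('videotestsrc ! fakesink')

for i in range(50):
    pipeline.set_state(Gst.State.PLAYING)
    time.sleep(3)
    pipeline.set_state(Gst.State.NULL)
    # Print the process RSS after every PLAYING -> NULL cycle to see whether it keeps growing.
    print(f'iteration {i}: VmRSS = {rss_kb()} kB')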

Thanks for your reply.
I follow this code: deepstream_python_apps/apps/deepstream-test3/deepstream_test_3.py at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
My simple pipeline contains:

  • N input sources from MP4 files
  • streammux
  • tee, queue
  • 1 pgie and 1 sgie

I don't know whether the memory increases because of pgie and sgie alone, or whether it is also related to the number of input video sources together with streammux, pgie, and sgie. You can add some more elements to check. My code has all of these elements; you can use other pgie and sgie models to reproduce this. A minimal sketch of this layout is shown after the attachment below.
    deepstream-issues.zip (6.0 KB)
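
For illustration, here is a minimal sketch of the pipeline layout described above (this is not the attached code). It assumes uridecodebin sources, a fakesink output, placeholder config file paths, and four sources; the tee/queue branch and the memory:NVMM caps check that deepstream_test_3 performs in its pad-added handler are omitted for brevity:

import sys
import time
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

NUM_SOURCES = 4                       # assumed; the real test used up to 64 sources
URI = 'file:///path/to/sample.mp4'    # placeholder MP4 path

def on_pad_added(decodebin, pad, sinkpad):
    # Link each decoded stream to its pre-requested streammux sink pad.
    # (deepstream_test_3 also checks for memory:NVMM caps here; omitted for brevity.)
    if not pad.is_linked():
        pad.link(sinkpad)

def build_pipeline():
    pipeline = Gst.Pipeline()

    streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
    streammux.set_property("batch-size", NUM_SOURCES)
    streammux.set_property("width", 1920)
    streammux.set_property("height", 1080)
    streammux.set_property("batched-push-timeout", 40000)
    pipeline.add(streammux)

    for i in range(NUM_SOURCES):
        src = Gst.ElementFactory.make("uridecodebin", f"source-{i}")
        src.set_property("uri", URI)
        pipeline.add(src)
        sinkpad = streammux.get_request_pad(f"sink_{i}")
        src.connect("pad-added", on_pad_added, sinkpad)

    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    pgie.set_property("config-file-path", "pgie_config.txt")    # placeholder config
    sgie = Gst.ElementFactory.make("nvinfer", "secondary-inference")
    sgie.set_property("config-file-path", "sgie_config.txt")    # placeholder config
    sink = Gst.ElementFactory.make("fakesink", "sink")

    for elem in (pgie, sgie, sink):
        pipeline.add(elem)
    streammux.link(pgie)
    pgie.link(sgie)
    sgie.link(sink)
    return pipeline

def main():
    Gst.init(None)
    pipeline = build_pipeline()
    pipeline.set_state(Gst.State.PLAYING)
    time.sleep(10)
    pipeline.set_state(Gst.State.NULL)

if __name__ == '__main__':
    sys.exit(main())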

I see that the more input sources I use, the more memory is used after one pipeline.
Here is my valgrind result. I ran the loop 20 times with 64 input sources.
log_20.txt (15.4 MB)

In this file I removed the warning lines ("Failed to query video capabilities: Invalid argument"):
log_20_remove_warning.txt (15.3 MB)

When I run python main.py after the Docker container restarts, the container's memory increases after each pipeline, so I need to restart the container every 6 hours. But when I test python main.py under valgrind, the memory usage is capped and does not increase continuously.

I wonder whether this is related to the following issue and whether it was fixed in DS 7.1; the documented solution is for GStreamer 1.16, not GStreamer 1.20:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_troubleshooting.html#memory-usage-keeps-on-increasing-when-the-source-is-a-long-duration-containerized-files-e-g-mp4-mkv

Thanks.

The more inputs you use, the more memory is used; this is normal.

These two logs show that there is no DeepStream leak.

This patch is related to your input. If you are using an RTSP camera, this modification will have no effect.

If the program cannot run because of Docker memory usage, first determine which process causes the problem. In addition, the sample above cannot reproduce the problem. Can you use the valgrind method above to check your program?

1. This patch is related to your input. If you are using an RTSP camera, this modification will have no effect.

I tried rebuilding GStreamer from source, but it does not seem to solve my problem.

This is the docker-compose file that I am using to test it (the Docker image is public).
docker-compose.ds.new-pipeline.yaml.zip (1.0 KB)
When it runs, test.py is only used to execute the sleep command. Then, I exec into the container, run python3 main.py, and monitor the container’s memory usage using docker stats.

The memory increases with each pipeline. Once the execution completes, the python3 main.py process terminates and exits. At that point, the memory is released and returns to the state it was in before running the main file.

Thanks

Since your yolov8m model is not in this image, I did not run your program successfully.

I tested a similar scenario using valgrind. I think your problem may be caused by the growth of "still reachable" memory, but this seems to be related only to GStreamer. Even if you remove all DeepStream-related plugins, the memory still increases.

==2526== LEAK SUMMARY:
==2526==    definitely lost: 89,088 bytes in 6 blocks
==2526==    indirectly lost: 0 bytes in 0 blocks
==2526==      possibly lost: 200,847 bytes in 2,141 blocks
==2526==    still reachable: 11,854,984 bytes in 116,883 blocks
==2526==                       of which reachable via heuristic:
==2526==                         stdstring          : 6,389 bytes in 116 blocks
==2526==         suppressed: 0 bytes in 0 blocks

After an element is created, it is not reclaimed by the Python virtual machine's GC, which may be the reason for the memory growth. The issue may be related to GStreamer's Python binding, PyGObject. This is maintained by the community; I don't have much experience with it.

I tried calling Gst.deinit() after setting the pipeline state to NULL, which slowed down the growth, but this should not be the root cause.
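
As a rough way to check the GC behaviour described above, the following sketch (an assumption on my side, not something tested in this thread) keeps only weak references to the wrappers, drops the strong Python references, and forces a collection; if a weakref still resolves afterwards, the Python wrapper was not reclaimed. Note that even when a wrapper is collected, the underlying GstElement may stay alive as long as its parent bin holds a reference:

import gc
import time
import weakref
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.Pipeline()
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "dstest1_pgie_config.txt")   # config from the snippet above
pipeline.add(pgie)

pipeline.set_state(Gst.State.PLAYING)
time.sleep(3)
pipeline.set_state(Gst.State.NULL)

# Keep weak references only, drop the strong references, then force a GC cycle.
pgie_ref = weakref.ref(pgie)
pipeline_ref = weakref.ref(pipeline)
del pgie
del pipeline
gc.collect()

print("pgie wrapper collected:", pgie_ref() is None)
print("pipeline wrapper collected:", pipeline_ref() is None)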

==2526== 348,160 bytes in 640 blocks are still reachable in loss record 46,452 of 46,453
==2526==    at 0x60D06CF: g_type_create_instance (gtype.c:1961)
==2526==    by 0x60B4005: g_object_new_internal (gobject.c:2246)
==2526==    by 0x60B5E00: g_object_new_valist (gobject.c:2585)
==2526==    by 0x60B642C: g_object_new (gobject.c:2058)
==2526==    by 0x629BE2D: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.0)
==2526==    by 0x6298492: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.0)
==2526==    by 0x7F4D721: pygi_invoke_c_callable (pygi-invoke.c:684)
==2526==    by 0x7F4DFA7: UnknownInlinedFun (pygi-cache.c:783)
==2526==    by 0x7F4DFA7: _constructor_cache_invoke_real (pygi-cache.c:929)
==2526==    by 0x7F4B825: UnknownInlinedFun (pygi-cache.c:862)
==2526==    by 0x7F4B825: UnknownInlinedFun (pygi-invoke.c:727)
==2526==    by 0x7F4B825: _wrap_g_callable_info_invoke (pygi-invoke.c:764)
==2526==    by 0x7F4BA7C: _callable_info_call (pygi-info.c:548)
==2526==    by 0x289B4A: _PyObject_MakeTpCall (call.c:215)
==2526==    by 0x2834C7: UnknownInlinedFun (abstract.h:112)
==2526==    by 0x2834C7: UnknownInlinedFun (abstract.h:99)
==2526==    by 0x2834C7: UnknownInlinedFun (abstract.h:123)
==2526==    by 0x2834C7: UnknownInlinedFun (ceval.c:5893)
==2526==    by 0x2834C7: _PyEval_EvalFrameDefault (ceval.c:4181)
==2526== 
==2526== 413,440 bytes in 700 blocks are still reachable in loss record 46,453 of 46,453
==2526==    at 0x60D06CF: g_type_create_instance (gtype.c:1961)
==2526==    by 0x60B4005: g_object_new_internal (gobject.c:2246)
==2526==    by 0x60B56C0: g_object_new_with_properties (gobject.c:2406)
==2526==    by 0x5C06CCF: gst_element_factory_create_with_properties (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.2003.0)
==2526==    by 0x5C0715B: gst_element_factory_create_valist (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.2003.0)
==2526==    by 0x5C07721: gst_element_factory_make_valist (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.2003.0)
==2526==    by 0x5C07947: gst_element_factory_make_full (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.2003.0)
==2526==    by 0x629BE2D: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.0)
==2526==    by 0x6298492: ??? (in /usr/lib/x86_64-linux-gnu/libffi.so.8.1.0)
==2526==    by 0x7F4D721: pygi_invoke_c_callable (pygi-invoke.c:684)
==2526==    by 0x7F4B825: UnknownInlinedFun (pygi-cache.c:862)
==2526==    by 0x7F4B825: UnknownInlinedFun (pygi-invoke.c:727)
==2526==    by 0x7F4B825: _wrap_g_callable_info_invoke (pygi-invoke.c:764)
==2526==    by 0x289B4A: _PyObject_MakeTpCall (call.c:215)

If you use native code to build a similar pipeline, will the same issue occur?

I want to determine whether this problem is related to the Python virtual machine.

I tried running the DeepStream sample code in C, and it looked fine; no memory leaks occurred. Only the PyDS code has this problem. Thanks

1. In the test program main.py you provided, the elements do not actually form a pipeline.

They are independent of each other. Try building a complete pipeline, and then add del pipeline after the pipeline ends (see the sketch after this list).

2. Do you plan to use tee and multiple nvinferservers for parallel inference?
This approach doesn't work; refer to this sample
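
As a minimal sketch of point 1 (not the poster's code, and with videotestsrc/fakesink standing in for the real DeepStream elements), each iteration builds a fully linked pipeline, sets it back to NULL, drops the Python reference with del, and forces a garbage-collection cycle:

import gc
import sys
import time
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

def build_pipeline():
    # A fully linked placeholder pipeline; swap in the real source/streammux/pgie/sgie chain.
    pipeline = Gst.Pipeline()
    src = Gst.ElementFactory.make("videotestsrc", "src")
    sink = Gst.ElementFactory.make("fakesink", "sink")
    pipeline.add(src)
    pipeline.add(sink)
    src.link(sink)
    return pipeline

def main():
    Gst.init(None)
    for _ in range(50):
        pipeline = build_pipeline()
        pipeline.set_state(Gst.State.PLAYING)
        time.sleep(3)
        pipeline.set_state(Gst.State.NULL)
        # Drop the only Python reference so PyGObject can release the underlying
        # GstPipeline, then run a collection cycle before the next iteration.
        del pipeline
        gc.collect()

if __name__ == '__main__':
    sys.exit(main())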