Memory leak in DeepStream

• Hardware Platform (Jetson / GPU) : NVIDIA GTX 1050 4GB (Mobile)
• DeepStream Version: 6.0
• TensorRT Version: 8.0 GA (8.0.1)
• NVIDIA GPU Driver Version: 470.63.01
• Issue Type: Bugs

I’m seeing a memory leak in deepstream-app and in the samples (both C/C++ and Python).

Example in deepstream-app

deepstream_app_config.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=30
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
type=1
sync=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0
batch-size=30
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tests]
file-loop=1

config_infer_primary.txt

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov3.cfg
model-file=yolov3.weights
model-engine-file=model_b30_gpu0_fp32.engine
labelfile-path=labels.txt
batch-size=30
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25

Maximum VmRSS value every 10 minutes

Elapsed time (minutes) VmRSS
10 2738008
20 2740120
30 2741176
40 2741176
50 2741176
60 2741176
70 2741440
80 2741440
90 2741968
100 2741968
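
For anyone who wants to collect comparable numbers, here is a minimal sketch of one way to sample VmRSS on Linux. It assumes the values come from `/proc/<pid>/status` (which reports VmRSS in kB); the thread does not say how the table above was gathered, and the helper names `read_vmrss_kb` and `max_vmrss_per_window` are made up for this example.

```python
import os
import time


def read_vmrss_kb(pid):
    """Return VmRSS in kB from /proc/<pid>/status, or None if the field is absent."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # Line format: "VmRSS:     2738008 kB"
                return int(line.split()[1])
    return None


def max_vmrss_per_window(pid, window_s=600, windows=10, poll_s=1.0):
    """Poll VmRSS and return the maximum seen in each consecutive window."""
    maxima = []
    for _ in range(windows):
        peak = 0
        deadline = time.monotonic() + window_s
        while time.monotonic() < deadline:
            value = read_vmrss_kb(pid)
            if value is not None:
                peak = max(peak, value)
            time.sleep(poll_s)
        maxima.append(peak)
    return maxima


if __name__ == "__main__":
    # Demo on the current process; pass the deepstream-app PID in practice.
    print(read_vmrss_kb(os.getpid()))
```

With the defaults (ten 600-second windows), `max_vmrss_per_window(pid)` would produce one maximum per 10 minutes over 100 minutes, matching the layout of the table.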

How can I fix it?

Board: RTX 3080 16GB running 300 videos in FP16 mode

Maximum VmRSS value every 10 minutes

Elapsed time (minutes) VmRSS
10 10943876
20 10979780
30 10988228
40 10989812
50 10999580
60 10999844
70 11001956
80 11003276
90 11005388
100 11006708
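
To put the two tables side by side, the sketch below computes the average growth per minute between the first and last sample of each run. The units are whatever the VmRSS column was reported in (kB if it was read from `/proc/<pid>/status`, which is an assumption). This is only an end-to-end average; it does not separate warm-up from steady-state growth.

```python
def growth_per_minute(samples):
    """Average change in VmRSS per minute between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# (elapsed minutes, VmRSS) taken from the first and last rows of each table.
gtx_1050 = [(10, 2738008), (100, 2741968)]    # 30 sources, FP32
rtx_3080 = [(10, 10943876), (100, 11006708)]  # 300 videos, FP16

print(growth_per_minute(gtx_1050))
print(growth_per_minute(rtx_3080))
```

That works out to 44 units per minute on the GTX 1050 run and roughly 698 units per minute on the RTX 3080 run.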

Valgrind log on GTX 1050 running for 1 hour:

valgrind --tool=memcheck --leak-check=full --num-callers=100 --show-leak-kinds=definite,indirect --track-origins=yes deepstream-app -c deepstream_app_config.txt

==15759== LEAK SUMMARY:
==15759==    definitely lost: 105,720 bytes in 249 blocks
==15759==    indirectly lost: 97,768 bytes in 1,026 blocks
==15759==      possibly lost: 46,982 bytes in 497 blocks
==15759==    still reachable: 178,515,203 bytes in 116,218 blocks
==15759==                       of which reachable via heuristic:
==15759==                         stdstring          : 84,230 bytes in 771 blocks
==15759==                         length64           : 9,144 bytes in 213 blocks
==15759==                         newarray           : 1,712 bytes in 27 blocks
==15759==         suppressed: 0 bytes in 0 blocks
==15759== Reachable blocks (those to which a pointer was found) are not shown.
==15759== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15759== 
==15759== ERROR SUMMARY: 20922 errors from 424 contexts (suppressed: 0 from 0)

valgrind.log (371.2 KB)
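
When comparing several valgrind runs, it can help to pull the LEAK SUMMARY numbers out of each log programmatically. The following is a small sketch that parses lines in the format shown above; `parse_leak_summary` is a hypothetical helper, not part of valgrind or DeepStream.

```python
import re

# Matches summary lines such as:
# "==15759==    definitely lost: 105,720 bytes in 249 blocks"
LEAK_LINE = re.compile(
    r"^==\d+==\s+"
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def parse_leak_summary(log_text):
    """Return {kind: (bytes, blocks)} for each LEAK SUMMARY line found."""
    summary = {}
    for line in log_text.splitlines():
        match = LEAK_LINE.match(line)
        if match:
            kind, nbytes, nblocks = match.groups()
            summary[kind] = (
                int(nbytes.replace(",", "")),
                int(nblocks.replace(",", "")),
            )
    return summary
```

Calling `parse_leak_summary(open("valgrind.log").read())` on two logs lets you compare the "definitely lost" figures directly, for example between a 30-minute and a 60-minute run.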

Any update?

Hey,
Have you found any solution to this? I am facing a similar problem I think. Hardware decoder deallocation?

I haven’t found a solution for this yet. I have many projects with DeepStream pipelines and I have this problem in all of them (Python and C).

Hey @marcoslucianops, it’s nice to see you here too! I recently had a huge memory leak caused by video decoding: the nvinfer model wasn’t fast enough to process all the frames, so the video accumulated in a buffer. I worked around it by using other decoders instead of the source_bin proposed in the official examples, but you can start by checking whether this is your issue. I’d love to solve this with a leaky queue (see "Gstreamer leaky queue stops the pipeline"), but I am having trouble with it.

Hey @marcoslucianops
Yes, I am in a similar situation. I hope we get a reply soon.

@mfoglio I noticed this happens to me especially when using nvjpeg decoders, but it doesn’t happen with h265/h264 decoders. I tried to narrow down the problem by removing the decoders completely and using just a test video source with streammux, nvinfer, tracker and osd; the leak still exists. I am not sure whether the two experiments are related, but the behaviour is the same.

I also noticed that if you run the corresponding gst-launch command, say via a python subprocess, this leak does not happen. So I wonder if I am doing something wrong with respect to de-allocating the GstElements?

I remember reading in the DeepStream 6.0 Release Notes about a potential memory leak related to QoS messages being sent upstream. Please double-check the document to make sure my memory is not tricking me!

Hey, you’re right. I noticed that too.
Btw, there were some unattended-upgrades on my machine yesterday. After that, I am unable to reproduce the memory leak error. It’s quite weird because even the valgrind logs don’t match anymore. I was able to reproduce it several times earlier, but cannot do it after the upgrade.

I have tested the same code on 2 machines.

  • One with an RTX 2080 Ti (unattended upgrades enabled): I can no longer reproduce the leak on this machine. Its driver version is 470.
  • One with an RTX 3090 (no upgrades applied): I can still reproduce the leak on this machine. Its driver version is 510.

Have you faced anything similar?

@mfoglio Nice to see you too. I’ve tested the pipeline adding a queue element with leaky=2 and max-size-buffers=1 properties between source and nvstreammux (source_src_pad → queue_sink_pad and queue_src_pad → streammux_sink_pad) but I’m still facing memory leaks in the pipeline.

My pipeline is uridecodebin (50 sources) → queue (50 sources) → nvstreammux → nvinfer → nvvideoconvert → nvdsosd → fakesink

NOTE: The sink qos property is set to 0 (FALSE).
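
For reference, the topology described above can be written as a gst-launch-1.0 pipeline description. The sketch below only builds that string; it is untested against real hardware, the `build_launch_line` helper and its parameters are made up for illustration, and the 1920x1080 mux resolution is borrowed from the [streammux] config in the first post rather than from this pipeline.

```python
def build_launch_line(uris, infer_config, width=1920, height=1080):
    """Return a gst-launch-1.0 description with one leaky queue per source."""
    parts = [
        # Shared tail: nvstreammux -> nvinfer -> nvvideoconvert -> nvdsosd -> fakesink
        f"nvstreammux name=mux batch-size={len(uris)} width={width} height={height}",
        f"! nvinfer config-file-path={infer_config}",
        "! nvvideoconvert ! nvdsosd ! fakesink qos=0",
    ]
    for index, uri in enumerate(uris):
        # leaky=2 means "downstream": the queue drops old buffers when full.
        parts.append(
            f"uridecodebin uri={uri} "
            f"! queue leaky=2 max-size-buffers=1 ! mux.sink_{index}"
        )
    return " ".join(parts)


if __name__ == "__main__":
    sample = "file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4"
    print(build_launch_line([sample, sample], "config_infer_primary.txt"))
```

The resulting string could be passed to `gst-launch-1.0` (for example through `subprocess.run(["gst-launch-1.0", *shlex.split(line)])`) to compare memory behaviour against the application-built pipeline.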

@marmikshah In my case, I use nvv4l2decoder for mpeg4/h264/h265 decode.

I am not sure the leaky queue works as expected in that scenario because I experienced the same issue. I opened a thread about that here: Gstreamer leaky queue stops the pipeline

@marcoslucianops do you know how we can track the memory usage of each GStreamer component?

The QoS memory leak has been fixed in the DS 6.0 GA version. There will be more memory leak fixes in a future release. Can you please get more leak info with the method in DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums?

Hi @Fiona.Chen , I see it mentioned in the release notes for DS6.0 https://docs.nvidia.com/metropolis/deepstream/DeepStream_6.0_Release_Notes.pdf . What is the difference between version 6.0 and 6.0 GA if any?

There’s more information in the 1st, 2nd and 3rd posts.

I run my pipeline with 1 camera and I have the following output:

==15180== LEAK SUMMARY:
==15180==    definitely lost: 19,058 bytes in 85 blocks
==15180==    indirectly lost: 5,499 bytes in 200 blocks
==15180==      possibly lost: 1,604,596 bytes in 6,843 blocks
==15180==    still reachable: 1,497,043,413 bytes in 940,007 blocks
==15180==                       of which reachable via heuristic:
==15180==                         stdstring          : 1,200,349 bytes in 20,069 blocks
==15180==                         length64           : 2,248 bytes in 49 blocks
==15180==                         newarray           : 2,296 bytes in 40 blocks
==15180==         suppressed: 0 bytes in 0 blocks
==15180== Reachable blocks (those to which a pointer was found) are not shown.
==15180== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15180== 
==15180== For counts of detected and suppressed errors, rerun with: -v
==15180== ERROR SUMMARY: 38579 errors from 2065 contexts (suppressed: 0 from 0)

How should I stop valgrind? I just ran kill on the PID. Judging from the report above, data has been lost, which seems to confirm the memory leak.
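
On stopping the run: a plain `kill <pid>` sends SIGTERM. One option, sketched below, is to send SIGINT first so the application gets a chance to shut down its pipeline before valgrind writes the report, and to fall back to SIGTERM only if it does not exit. This assumes the application handles SIGINT and exits cleanly; `stop_gracefully` is a hypothetical helper, and the demo uses a sleeping Python child in place of the real valgrind command.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, timeout_s=60):
    """Send SIGINT, wait for exit, and escalate to SIGTERM only if needed."""
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The process ignored SIGINT or is taking too long; ask it to terminate.
        proc.terminate()
        return proc.wait()


if __name__ == "__main__":
    # Stand-in child process. In practice this would be something like:
    # ["valgrind", "--leak-check=full", "deepstream-app", "-c", "deepstream_app_config.txt"]
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(stop_gracefully(child, timeout_s=5))
```

If the process is killed while buffers are still in flight, blocks that would have been freed on a clean shutdown can show up in the summary, so the numbers from an abrupt stop are harder to interpret.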

@marcoslucianops Assuming that a queue with leaky=2 drops frames when the pipeline can’t keep up (I’m still not 100% sure; what’s your understanding?), if it does not stop the leak (and it doesn’t), I think we can assume there is a component after the queue that keeps pulling data from the pipeline. What do you think?
I have been fighting with this leak for weeks. The only workaround I found was to decode the video using rtpsrc with drop-on-latency=1, but this led to corrupted/pixelated images on some video streams: see Gstreamer corrupted image from rtsp video stream - Stack Overflow. However, the fact that this stops the memory leak (despite not being a real solution) seems to contradict my first statement, where I said that the leak should be after the queue. That’s why, at this point, I am not really sure a leaky queue drops the frames.

@marcoslucianops This user seems to have a memory leak too: Significant Memory leak when streaming from a clients RTSP source - #4 by karan.shetty

I think I found a way to reproduce the memory leak here: Deepstream RTSP memory leak (with code to reproduce the issue)

@mfoglio The queue drops the frames, but the rtpjitterbuffer still seems to buffer them. I ran long tests and, with drop-frame-interval or videorate limiting the streams to 1 FPS, memory doesn’t seem to increase after about 3-4 hours. I’m doing more tests, but they take a long time.
