Memory leak in test5-app with smart-record

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU: RTX 2060 (the issue also occurs with an RTX 2080 Ti)
• DeepStream Version
5.1
• TensorRT Version
I am using the TensorRT version that ships with the devel Docker image for deepstream-5.1
• NVIDIA GPU Driver Version (valid for GPU only)
460
• Issue Type( questions, new requirements, bugs)
bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

The issue can be reproduced with any config file containing smart-record settings for one or more sources. It only happens with test5-app and does not occur with deepstream-app.
Monitor the memory usage of the app PID or the Docker container for a little while and observe the increase, as shown in the figure below (taken from my experiment).
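For example (just to illustrate the kind of monitoring I mean, not necessarily the only way), watching the process with top -p <PID> or the container with docker stats <container-name> is enough to see the upward trend.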

The following is the config file I used for this experiment.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1

[tiled-display]
enable=1
rows=4
columns=5
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=0
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source1]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=1
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source2]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=2
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0
smart-record=2
smart-rec-dir-path=xxxxxxxxxxxxxxxxxxx
smart-rec-file-prefix=source2
smart-rec-video-cache=60
smart-rec-container=0
smart-rec-default-duration=10
smart-rec-start-time=5
smart-rec-duration=5

[source3]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=3
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source4]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=4
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source5]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=5
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source6]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=6
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source7]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=7
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source8]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=8
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source9]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=9
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source10]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=10
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source11]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=11
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source12]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=12
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source13]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=13
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source14]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=14
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source15]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=15
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source16]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=16
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source17]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=17
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source18]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=18
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[source19]
enable=1
type=4
uri=xxxxxxxxxxxxxxxxx
camera-id=19
rtsp-reconnect-interval-sec=5
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
type=2
sync=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=1
text-size=8
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=1
clock-x-offset=1050
clock-y-offset=20
clock-text-size=18
clock-color=1;0;0;1
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=1
batch-size=20
batched-push-timeout=65000
width=1080
height=720
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
# Required to display the PGIE labels, should be added even when using config-file property
batch-size=20
# Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
nvbuf-memory-type=0
config-file=xxxxxxxxx

[tracker]
enable=1
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_nvdcf.so
ll-config-file=../configs/tracker_config.yml
gpu-id=0
enable-batch-process=1

[nvds-analytics]
enable=1
config-file=../configs/config_nvdsanalytics.txt

Please help me figure this out ASAP, because it’s urgent.

Could you share your “git diff” against the release code of deepstream-test5, so that we can reproduce it exactly and debug it?

This experiment was done using the deepstream-test5 release code without any changes.

OK, we have tested this sample for memory leaks on the x86 platform before; anyway, we will double-check.

Thank you, I am eager to know whether you can reproduce this behavior.
It should be easy to test using the provided config file, passing only the config file argument: ./deepstream-test5-app -c <path-to-config-file>

I logged the consumption using top -b -d 0.5 -p <PID> >> test5.log and parsed the log file with a Python script to produce the attached figure.
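In case it helps, a minimal sketch of the kind of parsing script I mean (not the exact script used here; it assumes top's default batch output format and extracts the RES column for the monitored PID) could look like this:

import sys

def parse_size(field):
    # top prints RES in KiB by default, or with an m/g suffix for larger values
    mult = {'m': 1024, 'g': 1024 * 1024}
    suffix = field[-1].lower()
    if suffix in mult:
        return float(field[:-1]) * mult[suffix]
    return float(field)

res_kib = []
with open(sys.argv[1]) as f:            # e.g. test5.log
    for line in f:
        fields = line.split()
        # process rows start with the numeric PID and have RES as the 6th column
        if len(fields) >= 12 and fields[0].isdigit():
            res_kib.append(parse_size(fields[5]))

# print one sample per top iteration so the series can be plotted externally
for i, res in enumerate(res_kib):
    print(i, res)

Running it as python3 parse_top.py test5.log prints the RES series in KiB, one value per top refresh, which can then be plotted with any tool.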

The reassuring thing is that the same configs with deepstream-app did not cause any memory leak; the memory consumption of the app PID stayed constant. That’s why I suspect an issue with the test5-app modifications on top of deepstream-app.

Thanks for taking the time to test this. This is important, as our app is based on the test5 app and we want to use the smart-record feature.

Any updates?

Sorry for the delay! We will get back to you tomorrow.

Here is what I observed. A few seconds after starting the run, top showed:

top - 09:59:53 up 26 days, 21:28, 17 users, load average: 0.77, 0.72, 0.89
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 1.0 sy, 0.0 ni, 97.5 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13190337+total, 16773184 free, 4100544 used, 11102964+buff/cache
KiB Swap: 55594996 total, 55577844 free, 17152 used. 12657652+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15128 root 20 0 27.875g 1.928g 801228 S 40.0 1.5 0:24.68 deepstream-test

After 40 minutes, top showed:

top - 10:42:07 up 26 days, 22:10, 20 users, load average: 0.65, 0.81, 0.86
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 0.5 sy, 0.0 ni, 97.9 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 13190337+total, 16345212 free, 4134440 used, 11142372+buff/cache
KiB Swap: 55594996 total, 55577844 free, 17152 used. 12654444+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15128 root 20 0 27.875g 1.931g 801288 S 48.0 1.5 19:28.54 deepstream-test

RES increased by around 3 MB in 40 minutes. Is this the leak you mentioned?

Yes. The logs you posted also show an increase of about 35 MB (roughly 0.8%) in the used KiB Mem figure, in addition to the growth in the process RES.

Hello :)
It’s been 3 days; are you able to share any updates yet?

We are checking the issue. I can see the RSS memory increase with the tool below, but I cannot get any “definitely lost” report with valgrind on my side. I am sharing both tools below; maybe you can also try them and share the result. Thanks.

Display the memory usage:
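(For example, a generic way to do this is to sample the process periodically with top -b -d 1 -p <PID>, or to read the VmRSS line from /proc/<PID>/status in a loop; this is only a sketch and not necessarily the exact tool referred to here.)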

Check memory leak with valgrind:
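(A typical invocation, only as a sketch of what such a check might look like and not necessarily the exact command referred to here, would be: valgrind --leak-check=full --show-leak-kinds=all --log-file=valgrind.log ./deepstream-test5-app -c <path-to-config-file>.)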

Attached are the logs for two separate runs of the test5 app.

  1. valgrind logs for a ~15hr run: raw.txt (5.5 MB)
  2. nvmemstat logs for a ~1hr run: log.txt (744.4 KB)

The app just keeps eating up memory; we once had a run that reached more than 50 GB of memory [about 3 or 4 days of runtime].
Thanks for taking a look into this. Please keep us updated. This is a critical issue for us, so any update would be helpful.

Any updates yet?

I tried running until the whole RAM was consumed, and the app crashed with the following message:

ERROR:gstqueue.c:1143:gst_queue_leak_downstream: assertion failed: (leak != NULL)
 Aborted (core dumped)

We’re investigating this issue and will update you once there is more progress. Thanks