DeepStream 3D action recognition app leaks memory on Jetson NX

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) NX
• DeepStream Version 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.4.0
• NVIDIA GPU Driver Version (valid for GPU only) 11.4
• Issue Type( questions, new requirements, bugs) bugs

I am using the deepstream-3d-action-recognition sample app. I get a memory leak as soon as the model starts inference, which leads to out-of-memory and the app freezing; I have to reboot immediately or unplug the board. I tried both the example model created with TAO and my custom model converted from ONNX, and I get the same problem with both.

Hi @samvdh, how did you confirm the memory leak?
Did you just run our demo without any change?
Could you attach the log from when the error happened? Thanks.

Hi, thank you for your response. I changed it a little: I switched from a 4-stream batch to 1 stream. The app starts freezing after the first few seconds. With jtop I saw memory continuously increasing until it reached 7 GB. I tried deepstream_test1 and it worked normally on my Jetson NX. I attach images of the log here, without and with the debug flag. My NX cannot connect to the internet right now, so I can only capture images like this. Sorry for the inconvenience.

So does it work when you run this demo with our 4 demo stream sources?
Did you just change the config file from our 4 stream sources to your own stream source?
Could you attach your stream source?
Could you help to dump the memory log while the demo is running, by referring to the link below:
https://forums.developer.nvidia.com/t/deepstream-sdk-faq/80236/14
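For reference, a minimal sketch of such a sampler (my own sketch, not the actual script behind the FAQ link) that polls the same /proc/<pid>/status fields plus the lsof count every couple of seconds:

```
#!/bin/bash
# Minimal memory sampler (sketch only, not the official script from the FAQ link).
# Usage: ./memlog.sh <pid-of-deepstream-3d-action-recognition>
# Every 2 seconds, print VmSize / VmRSS / RssFile / RssAnon (in kB) from
# /proc/<pid>/status plus the number of open-file entries lsof reports.
PID=$1
while kill -0 "$PID" 2>/dev/null; do
  TS=$(date +%H:%M:%S)
  VMSIZE=$(awk '/^VmSize/ {print $2}' /proc/"$PID"/status)
  VMRSS=$(awk '/^VmRSS/ {print $2}' /proc/"$PID"/status)
  RSSFILE=$(awk '/^RssFile/ {print $2}' /proc/"$PID"/status)
  RSSANON=$(awk '/^RssAnon/ {print $2}' /proc/"$PID"/status)
  NFD=$(lsof -p "$PID" 2>/dev/null | wc -l)
  echo "PID: $PID $TS VmSize: $VMSIZE kB VmRSS: $VMRSS kB RssFile: $RSSFILE kB RssAnon: $RSSANON kB lsof: $NFD"
  sleep 2
done
```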

Hi, it freezes when run with the 4 stream sources as well, so I changed to 1 stream. I use the default stream source, sample_run.mov; I have tried other stream sources and still get the problem. I followed your link and put the logs here. This time I ran with the default config and the 4 default stream sources.

```
PID: 14637 09:38:57 hardware memory: 0.0000 KiB VmSize: 8997.6797 MiB VmRSS: 90.4375 MiB RssFile: 45.2852 MiB RssAnon: 55.9766 MiB lsof: 181

PID: 14637 09:38:59 hardware memory: 0.0000 KiB VmSize: 9564.3242 MiB VmRSS: 302.0117 MiB RssFile: 171.7188 MiB RssAnon: 130.7891 MiB lsof: 462
PID: 14637 09:39:00 hardware memory: 0.0000 KiB VmSize: 9641.6719 MiB VmRSS: 478.9375 MiB RssFile: 259.3477 MiB RssAnon: 222.3477 MiB lsof: 524
PID: 14637 09:39:02 hardware memory: 0.0000 KiB VmSize: 9718.1797 MiB VmRSS: 619.5273 MiB RssFile: 328.0000 MiB RssAnon: 293.2383 MiB lsof: 543
PID: 14637 09:39:03 hardware memory: 0.0000 KiB VmSize: 9951.1914 MiB VmRSS: 709.5820 MiB RssFile: 413.8359 MiB RssAnon: 297.8555 MiB lsof: 543
PID: 14637 09:39:05 hardware memory: 0.0000 KiB VmSize: 9951.1914 MiB VmRSS: 812.7305 MiB RssFile: 516.9609 MiB RssAnon: 297.8555 MiB lsof: 543
PID: 14637 09:39:06 hardware memory: 0.0000 KiB VmSize: 10023.2969 MiB VmRSS: 963.3945 MiB RssFile: 592.4102 MiB RssAnon: 373.2656 MiB lsof: 581
PID: 14637 09:39:07 hardware memory: 0.0000 KiB VmSize: 9929.1094 MiB VmRSS: 890.8789 MiB RssFile: 387.4180 MiB RssAnon: 503.4609 MiB lsof: 599
PID: 14637 09:39:08 hardware memory: 0.0000 KiB VmSize: 10112.7227 MiB VmRSS: 1051.7812 MiB RssFile: 455.6875 MiB RssAnon: 596.6094 MiB lsof: 658
PID: 14637 09:39:10 hardware memory: 0.0000 KiB VmSize: 10170.5195 MiB VmRSS: 1169.9062 MiB RssFile: 516.5430 MiB RssAnon: 661.6094 MiB lsof: 697
PID: 14637 09:39:11 hardware memory: 0.0000 KiB VmSize: 10238.5117 MiB VmRSS: 1308.5234 MiB RssFile: 586.5820 MiB RssAnon: 723.5664 MiB lsof: 741
PID: 14637 09:39:12 hardware memory: 0.0000 KiB VmSize: 10300.2734 MiB VmRSS: 1434.4375 MiB RssFile: 650.4570 MiB RssAnon: 785.7070 MiB lsof: 773
PID: 14637 09:39:14 hardware memory: 0.0000 KiB VmSize: 10347.7031 MiB VmRSS: 1542.0977 MiB RssFile: 711.6562 MiB RssAnon: 831.8359 MiB lsof: 806
PID: 14637 09:39:15 hardware memory: 0.0000 KiB VmSize: 11171.0859 MiB VmRSS: 1548.1250 MiB RssFile: 758.6367 MiB RssAnon: 789.6406 MiB lsof: 814
PID: 14637 09:39:16 hardware memory: 0.0000 KiB VmSize: 11333.4023 MiB VmRSS: 1540.6758 MiB RssFile: 737.5820 MiB RssAnon: 803.1523 MiB lsof: 931
PID: 14637 09:39:18 hardware memory: 0.0000 KiB VmSize: 13541.9375 MiB VmRSS: 1615.0781 MiB RssFile: 765.1992 MiB RssAnon: 850.6289 MiB lsof: 1464
PID: 14637 09:39:19 hardware memory: 0.0000 KiB VmSize: 13795.5781 MiB VmRSS: 1657.6445 MiB RssFile: 779.2852 MiB RssAnon: 878.2969 MiB lsof: 1659
PID: 14637 09:39:21 hardware memory: 0.0000 KiB VmSize: 13859.5781 MiB VmRSS: 1860.0078 MiB RssFile: 867.4883 MiB RssAnon: 992.4570 MiB lsof: 1729
PID: 14637 09:39:22 hardware memory: 0.0000 KiB VmSize: 13923.5781 MiB VmRSS: 1978.6133 MiB RssFile: 943.8203 MiB RssAnon: 1036.4961 MiB lsof: 1777
PID: 14637 09:39:24 hardware memory: 0.0000 KiB VmSize: 13987.5781 MiB VmRSS: 2135.6836 MiB RssFile: 1024.0586 MiB RssAnon: 1116.7852 MiB lsof: 1834
PID: 14637 09:39:25 hardware memory: 0.0000 KiB VmSize: 14115.5781 MiB VmRSS: 2262.2344 MiB RssFile: 1051.0195 MiB RssAnon: 1217.3398 MiB lsof: 1846
```

OK, thanks. The log does suggest a memory leak. We'll run the demo in our NX environment.
Please reconfirm the environment:
1. Run the deepstream-3d-action-recognition demo without any change:

./deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt

2. Use the model from NGC. Which model do you use, deployable_v1.0 or deployable_v2.0?
3. Jetson NX with DeepStream 6.1
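To make item 3 easy to confirm in one paste, the relevant versions can be printed like this (commands assume a standard JetPack/DeepStream install; they are my addition, not part of the original checklist):

```
# DeepStream and dependency versions
deepstream-app --version-all
# JetPack / L4T release (Jetson only)
cat /etc/nv_tegra_release
# Installed TensorRT packages
dpkg -l | grep -i tensorrt
```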

Hi,

  1. Yes, I run it with that command.
  2. I use the model “resnet18_3d_rgb_hmdb5_32.etlt”, deployable_v1.0, which I downloaded from here: Action Recognition Net | NVIDIA NGC
  3. Yes, DeepStream 6.1 on the NX.

We cannot reproduce your freeze issue. Can you specify your device's RAM size? You can check it with df. VmRSS increasing to 2 GB within a minute is reasonable; please check the memory stats when running for longer than 30 minutes.

```
nvidia@ubuntu:/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mmcblk0p1 30031808 26926404 1719092 94% /
```

Hi, thank you for your verification. My Jetson NX has 7 GB of RAM. It freezes after a few seconds and I have to reboot the system or unplug the board. I attach a video showing the progression as my system freezes, with RAM usage increasing rapidly until it is completely full; I hope that gives you more information.

Can you share the video you used?

Hi, I used the default 4 sample videos: sample_run.mov, sample_push.mov, sample_walk.mov, sample_ride_bike.mov

Did you run other processes as well?

Hi, I don't run any other processes. I will check again to make sure I haven't missed any detail. One more piece of information: DeepStream on my NX was installed with JetPack 5.0.1.

Hi @samvdh, we use the same environment. You can try modifying the source code, for example changing the nveglglessink to a fakesink, to narrow down the scope.

Hi, I used fakesink (by setting fakesink=1 in the config file) but it still freezes. I may have found the problem: I ran this app on a 2080 Ti in the DeepStream Docker container and it worked normally. My Jetson NX is the 8 GB RAM version; the problem might be solved if I run it on a 16 GB RAM NX. Can you tell me which NX type you are using? Thank you for your support.
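(For context, the switch mentioned above is a single key in the sample's deepstream_action_recognition_config.txt; a sketch of the relevant line, with the surrounding keys omitted and the comment mine:)

```
# deepstream_action_recognition_config.txt (excerpt; other keys omitted)
# 1 = replace the on-screen render sink (nveglglessink) with fakesink, 0 = keep rendering
fakesink=1
```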

We use the same RAM version as yours.

@yuweiw Thank you, I will check my settings and device again.

Hi, I have just fixed the config and the problem is solved: I changed network-type=1 to network-type=100 in config_infer_primary_3d_action.txt and the system runs without freezing.
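For reference, the change is a single key in the nvinfer config; an excerpt of config_infer_primary_3d_action.txt with only the key discussed here (comments are mine):

```
[property]
# network-type selects nvinfer's built-in post-processing:
#   1   = classifier (run the built-in classification post-processing)
#   100 = other (skip the built-in post-processing)
network-type=100
```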
With network-type=100 the output looks like this (screenshot attached).

And with network-type=1 the output is like this: the log is printed rapidly and non-stop. It looks like the inference step is in an endless loop, so image sequences keep being cached until memory is full.
[Screenshot attached: Screenshot from 2022-09-13 16-49-42]

If you set network-type=100, it skips the post-processing, so it is the post-processing that causes this problem. Since we cannot reproduce the problem in our identical environment, could you help debug it in your own environment?
1. Make sure that no code has been changed.
2. Run another classifier model to see whether the problem is specific to this classifier.
3. Dump the fd status with the CLI below while the app runs:

lsof | grep deepstream-app > log.txt 2>&1
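As a complement (my addition, not part of the suggestion above), grouping the process's open descriptors by lsof's TYPE column shows what kind of handle is accumulating; this assumes a single running instance of the sample binary from the run command earlier:

```
# Count the sample's open file descriptors grouped by type (REG, FIFO, sock, ...);
# re-run periodically to see which type keeps growing.
lsof -p "$(pidof deepstream-3d-action-recognition)" | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn
```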

Hi, sorry for the late reply; my device is not available at the moment, so I will test again when I get it back. Thank you for your support. P.S. The prediction output seems to be correct without post-processing.