How to reduce memory usage during the inference process?

I ran the DeepStream sample application named "deepstream_test1_app". It works fine, but it takes up too much memory (about 16% of the 8 GB total: 8 × 1024 MB × 0.16 ≈ 1310 MB). I want to port this sample into my own program. How can I reduce the memory usage during the inference process?
Environment: NVIDIA Jetson (8 GB memory); 15W 6-core power mode; DeepStream 5.0

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details needed to reproduce the issue)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and a description of the function)

Complete Information:
Hardware Platform: Jetson NX
DeepStream Version: 5.0.0
JetPack Version: 4.4.1
TensorRT Version: 7.1
CUDA Driver Version: 10.2
CUDA Runtime Version: 10.2
cuDNN Version: 8.0
libNVWarp360 Version: 2.0.1d3
Issue Type: questions / new requirements
How to reproduce the issue?: Build and run the DeepStream sample "/opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test1/"
Requirement details: I want it to run with a lower memory footprint, because I have multiple other processes that must run alongside the inference process.

  1. Most of the memory is occupied by TensorRT, which is the main module used for inference. Even when TensorRT runs alone, it occupies at least 1100 MB. Some memory-usage optimizations are planned for future TensorRT versions.
  2. Even in the multi-process case, the total memory usage is not simply 1310 MB × the number of processes, so you need to evaluate the actual usage by testing the multi-process case yourself (see the measurement sketch below).
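
As a starting point for that measurement, here is a minimal sketch (my own illustration, not part of NVIDIA's answer) that reads a process's resident set size (VmRSS) from /proc, so you can compare the actual footprint of one, two, or more inference processes instead of extrapolating 1310 MB × N. You could get the same numbers with "ps -o rss -p <pid>" or tegrastats.

    /* vmrss.c - print the resident set size (VmRSS) of a process.
     * Build: gcc -o vmrss vmrss.c
     * Usage: ./vmrss [pid]   (defaults to its own PID)
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Returns VmRSS in kB for the given PID, or -1 on failure. */
    static long get_vmrss_kb(int pid)
    {
        char path[64], line[256];
        long rss_kb = -1;

        snprintf(path, sizeof(path), "/proc/%d/status", pid);
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            return -1;

        while (fgets(line, sizeof(line), fp) != NULL) {
            /* The line looks like: "VmRSS:   1310720 kB" */
            if (strncmp(line, "VmRSS:", 6) == 0) {
                sscanf(line + 6, "%ld", &rss_kb);
                break;
            }
        }
        fclose(fp);
        return rss_kb;
    }

    int main(int argc, char **argv)
    {
        int pid = (argc > 1) ? atoi(argv[1]) : (int) getpid();
        printf("PID %d VmRSS: %ld kB\n", pid, get_vmrss_kb(pid));
        return 0;
    }

Run it against each deepstream process PID before and after launching the second process; shared libraries and CUDA context pages mean the second process may cost somewhat less than the first, which is what the answer above suggests verifying.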

I have already tested that: when I opened two processes, the memory usage doubled. If you use multiple video sources within one process and share a single infer plugin, the memory does not increase (see the sketch below); but some scenarios have to run in a multi-process environment and use multiple infer plugins.
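
For the single-process case described above, here is a minimal sketch (my own illustration; the two .h264 file names are placeholders, and the config file is the one shipped with deepstream-test1) of two sources batched by nvstreammux into a single nvinfer instance, so the TensorRT engine is loaded only once per process:

    /* two_src_one_infer.c - two H.264 sources sharing one nvinfer.
     * Build: gcc two_src_one_infer.c -o two_src_one_infer \
     *            $(pkg-config --cflags --libs gstreamer-1.0)
     */
    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        gst_init(&argc, &argv);

        /* nvstreammux batches both streams into one buffer; a single
         * nvinfer (and thus a single TensorRT engine) serves both.
         * batch-size here and in the config file should match the
         * number of sources. */
        GError *err = NULL;
        GstElement *pipeline = gst_parse_launch(
            "nvstreammux name=mux batch-size=2 width=1280 height=720 ! "
            "nvinfer config-file-path=dstest1_pgie_config.txt batch-size=2 ! "
            "nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink "
            "filesrc location=video0.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
            "filesrc location=video1.h264 ! h264parse ! nvv4l2decoder ! mux.sink_1",
            &err);
        if (pipeline == NULL) {
            g_printerr("Failed to build pipeline: %s\n", err->message);
            g_error_free(err);
            return -1;
        }

        gst_element_set_state(pipeline, GST_STATE_PLAYING);

        /* Block until EOS or an error. */
        GstBus *bus = gst_element_get_bus(pipeline);
        GstMessage *msg = gst_bus_timed_pop_filtered(
            bus, GST_CLOCK_TIME_NONE, GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
        if (msg != NULL)
            gst_message_unref(msg);
        gst_object_unref(bus);

        gst_element_set_state(pipeline, GST_STATE_NULL);
        gst_object_unref(pipeline);
        return 0;
    }

With this layout, each additional source adds only decoder and buffer memory, not another ~1100 MB TensorRT engine; only workloads that genuinely need separate processes pay the per-engine cost again.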

Most of the memory is used by TensorRT, so it is hard to reduce the memory usage at the moment.

Well, I hope you can solve this problem in the near future; TensorRT is probably the most heavily used module.