High RAM consumption in DeepStream 6.0.1

Hello everyone
I’m currently using a Jetson Nano Developer Kit with DeepStream version 6.0.1. My pipeline consists of a source, nvstreammux, a mux queue, nvinfer (pgie), a queue, nvinfer (sgie), nvtracker, another queue, nvdsanalytics, yet another queue, nvvideoconvert, a further queue, nvdsosd, another nvvideoconvert, and a tee. The tee feeds two branches: one with a capsfilter, encoder, rtppay, and udpsink, and the other with an nvvideoconvert, capsfilter, and fakesink. A rough Python construction sketch of the main chain follows the pipeline diagram below.

Src —> nvstreammux —> mux_queue —> nvinfer(pgie) —> queue —> nvinfer(sgie) —> nvtracker —> queue —> nvdsanalytics —> queue —> nvvideoconvert —> queue —> nvdsosd —> nvvideoconvert —> Tee

queue —> capsfilter —> encoder —> rtppay —> udpsink

queue —> nvvideoconvert —> capsfilter —> fakesink
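For reference, this is roughly how the main chain is constructed in my Python code (a simplified sketch: the resolution and config paths are placeholders, and the dynamic RTSP source linking and the tee branches are omitted):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("anpr-pipeline")

def add(factory, name):
    # Create an element, add it to the pipeline, and fail loudly if the plugin is missing
    elem = Gst.ElementFactory.make(factory, name)
    if elem is None:
        raise RuntimeError("Unable to create " + factory)
    pipeline.add(elem)
    return elem

streammux = add("nvstreammux", "mux")
mux_queue = add("queue", "mux_queue")
pgie      = add("nvinfer", "pgie")            # number plate detector
queue1    = add("queue", "queue1")
sgie      = add("nvinfer", "sgie")            # OCR
tracker   = add("nvtracker", "tracker")
queue2    = add("queue", "queue2")
analytics = add("nvdsanalytics", "analytics")
queue3    = add("queue", "queue3")
conv1     = add("nvvideoconvert", "conv1")
queue4    = add("queue", "queue4")
osd       = add("nvdsosd", "osd")
conv2     = add("nvvideoconvert", "conv2")
tee       = add("tee", "tee")

streammux.set_property("batch-size", 1)
streammux.set_property("width", 1280)         # placeholder resolution
streammux.set_property("height", 720)
pgie.set_property("config-file-path", "pgie_config.txt")    # placeholder path
sgie.set_property("config-file-path", "sgie_config.txt")    # placeholder path
# nvtracker (ll-lib-file / ll-config-file) and nvdsanalytics (config-file)
# also need their configuration properties set; omitted here.

# The RTSP source (rtspsrc/uridecodebin) is linked to streammux.sink_0 in a
# pad-added callback, and the two tee branches are linked via request pads;
# both are omitted in this sketch.
streammux.link(mux_queue)
mux_queue.link(pgie)
pgie.link(queue1)
queue1.link(sgie)
sgie.link(tracker)
tracker.link(queue2)
queue2.link(analytics)
analytics.link(queue3)
queue3.link(conv1)
conv1.link(queue4)
queue4.link(osd)
osd.link(conv2)
conv2.link(tee)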

My environment is running Ubuntu 18.04, and I have the pre-installed versions of TensorRT, CUDA, and cuDNN. I recently switched from DeepStream 5.1 to 6.0.1, and I’ve noticed that the RAM consumption of my program has increased from 730 MB (on DeepStream 5.1) to 1.4 GB (on DeepStream 6.0.1). There have been no changes in my pipeline except that nvtracker and nvdsanalytics were added in the DeepStream 6.0.1 version.

TensorRT Version : 8.2.1.9
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Jetpack 4.6.1 [L4T 32.7.3]
Architecture: aarch64
Model: NVIDIA Jetson Nano Developer Kit

I’m using the pgie for number plate detection and the sgie for OCR extraction, and I’m getting my input from a camera’s RTSP link. I use ONNX model files that are automatically converted to engine files when the program runs. Additionally, I’m using Redis for data gathering in the sgie and for analytics purposes.
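The sgie probe that gathers data into Redis is roughly the following sketch (simplified; the Redis host, key name, and pushed fields are placeholders, and it assumes the OCR sgie attaches classifier metadata):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds
import redis

# Assumed local Redis instance and key name
r = redis.Redis(host="127.0.0.1", port=6379)

def sgie_src_pad_probe(pad, info, user_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # Walk the classifier metadata attached by the OCR sgie
            l_cls = obj_meta.classifier_meta_list
            while l_cls is not None:
                cls_meta = pyds.NvDsClassifierMeta.cast(l_cls.data)
                l_label = cls_meta.label_info_list
                while l_label is not None:
                    label_info = pyds.NvDsLabelInfo.cast(l_label.data)
                    # Push the recognised plate text to Redis for later analytics
                    r.rpush("plate_ocr", label_info.result_label)
                    try:
                        l_label = l_label.next
                    except StopIteration:
                        break
                try:
                    l_cls = l_cls.next
                except StopIteration:
                    break
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attached with:
# sgie.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, sgie_src_pad_probe, None)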

In this setup, the Jetson Nano was flashed using the NVIDIA SDK Manager.

The high RAM consumption is causing the device to run out of memory every few hours. Can you please provide a solution for this issue?

There are lots of changes from DeepStream 5.1 to DeepStream 6.0.1. Can you compare your pipelines and try to find out which module consumes the most memory?


Thanks for your reply.
Could you recommend a tool or approach to compare the memory consumption of individual elements in DeepStream pipelines?

The best way is to construct the pipeline with “gst-launch-1.0”, so that you can add and remove components and compare the change in memory consumption.
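For example, you can start from a decode-only pipeline and add one element at a time, checking resident memory after each step (the camera URI, stream codec, resolution, and config path below are only placeholders):

# Baseline: RTSP decode only (assuming an H264 stream)
gst-launch-1.0 rtspsrc location=rtsp://<camera-uri> ! rtph264depay ! h264parse ! nvv4l2decoder ! fakesink

# Add nvstreammux and the pgie, then re-check memory
gst-launch-1.0 rtspsrc location=rtsp://<camera-uri> ! rtph264depay ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path=pgie_config.txt ! fakesink

# In another shell, watch the resident memory (RSS, in KB) of the running pipeline
ps -o rss= -p $(pgrep -f gst-launch-1.0)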


Can a custom YOLOv7 Tiny model be trained and implemented in DeepStream 6.0.1 for number plate detection, and which object detection model would you recommend for this task in DeepStream 6.0?

DeepStream cannot do training; the TAO Toolkit is for training. Whether to use YOLOv7 tiny for number plate detection depends on your own evaluation, based on your resources and design. There are also some TAO models for your reference: Overview (nvidia.com)

I have trained a new YOLOv7 model for number plates in a Colab notebook using the GitHub - augmentedstartups/yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors repo.

The weight file I used:
https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

Below is the command I used for training:
!python train.py --batch 16 --cfg cfg/training/yolov7.yaml --epochs 1000 --data {dataset.location}/data.yaml --weights 'yolov7.pt' --device 0

Once the model was trained, I got an accuracy of 95% at IoU 0.5. Weight file size: 209 MB.

I reparameterized the trained model using the YOLOv7 reparameterization notebook from yolov7/reparameterization.ipynb at main · WongKinYiu/yolov7 · GitHub. Output file size: 76 MB.

I converted the model to ONNX using:

!python export.py --weights ./trained_rep_yolov7.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 864 480 --fp16
Output file size: 146.1 MB

I converted the ONNX file to an engine file on the Jetson Nano with the command:
$ /usr/src/tensorrt/bin/trtexec --onnx=yolov7.onnx --saveEngine=yolov7fp16.engine --fp16
Output file size: 156.8 MB

I made changes for the custom parser using:

Below is the config file I am using:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
#custom-network-config=yolov7.cfg
model-file=/home/parkzap/deepstream-python/models/trained_rep_yolov7.pt
onnx-file=/home/parkzap/deepstream-python/models/trained_rep_16_yolov7.onnx
model-engine-file=/home/parkzap/deepstream-python/models/trained_rep_16_fp16_yolov7.engine
#int8-calib-file=calib.table
labelfile-path=label.txt
batch-size=1
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseCustomEfficientNMS
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/sources/libs/nvdsinfer_customparser/libnvds_infercustomparser.so
#engine-create-func-name=NvDsInferYoloCudaEngineGet
[class-attrs-all]
nms-iou-threshold=0.3
pre-cluster-threshold=0.6
topk=300

DeepStream is running, but I am only getting one frame every 3 seconds.
I am also not getting any bounding boxes, and no error is returned.

Total load: 1.3
RAM consumption: 1.2 GB

I am using the same pipeline as before.
I think the issue is that I need to rewrite the probe, but I currently have no idea how to do it.

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Have you tested the model performance on the Jetson Nano? The Jetson Nano has a very low-end GPU; it is not suggested to run big models on it. You can measure the model performance with the “trtexec” tool from TensorRT.
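For example, something like the following reports throughput (qps) and GPU compute time for the engine you already built (the engine file name below is taken from your config and may need adjusting):

/usr/src/tensorrt/bin/trtexec --loadEngine=trained_rep_16_fp16_yolov7.engine --iterations=100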

You need to check by yourself whether the preprocessing parameters and the postprocessing function are correct.
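As a quick check on the postprocessing side, you can attach a probe on the pgie src pad and print how many objects the custom parser actually produces per frame (a sketch using the standard pyds bindings; where and how you attach the probe is up to your code):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def pgie_src_pad_probe(pad, info, user_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # num_obj_meta is the number of objects attached for this frame;
        # if it is always 0, the bbox parsing function or preprocessing parameters are suspect
        print("frame", frame_meta.frame_num, "objects", frame_meta.num_obj_meta)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# pgie.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, pgie_src_pad_probe, None)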

We have provided a “yolov7” sample here: NVIDIA-AI-IOT/yolo_deepstream: yolo model qat and deploy with deepstream&tensorrt (github.com)

I am using the same pipeline as earlier with the previous weights.

Running the DeepStream 6.0.1 pipeline once in Python code on a Jetson Nano consumes 1.4 GB of RAM. However, when running two instances of the same pipeline simultaneously in separate shells, each instance only consumes 800 MB of RAM, and there is no difference in accuracy or output.

What changes do I need to make so that even a single instance of the pipeline consumes only 800 to 900 MB of RAM? In other words, how can I configure the pipeline to consume only about 800 MB of RAM when running a single instance?
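Things I am considering trying (I am not sure these are the right knobs) are capping the per-queue buffering and the streammux buffer pool from the Python code, e.g.:

# Limit how much each queue element may buffer
# (GStreamer queue defaults are 200 buffers / 10 MB / 1 s)
for q in (mux_queue, queue1, queue2, queue3, queue4):   # placeholder variable names
    q.set_property("max-size-buffers", 2)
    q.set_property("max-size-bytes", 0)     # 0 disables the byte limit
    q.set_property("max-size-time", 0)      # 0 disables the time limit

# nvstreammux also exposes a buffer-pool-size property worth experimenting with
streammux.set_property("buffer-pool-size", 4)   # placeholder value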

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.