Progressive RAM Usage Increase in DeepStream 7.1 with 40 RTSP Streams on Kubernetes

Hardware Platform:
GPU NVIDIA L40S

DeepStream Version:
7.1

TensorRT Version:
Aligned with DeepStream 7.1 recommendations

Issue Type:
Question

Requirement Details:
I noticed that the RAM usage keeps increasing progressively.
I have four DeepStream pods running on Kubernetes, each handling multiple RTSP camera streams: two with 40 cameras each and two with 12 cameras each.

The pods handling 12 cameras work stably without any RAM growth, while the ones handling 40 cameras gradually consume more and more memory without releasing it.

It seems like the pods processing 40 cameras may not be able to handle all incoming frames, possibly buffering them without clearing or discarding them properly. When I restart the application, the RAM usage returns to normal, but it starts increasing again over time for the 40-camera pods.
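One quick way to verify whether the 40-camera pipelines keep up with the incoming frame rate is the per-stream FPS logging built into the deepstream-app reference application. The snippet below is only a minimal sketch, assuming the configuration shown further down is launched through deepstream-app; it adds the performance-measurement keys to the [application] group:

[application]
enable-perf-measurement=1
# Print per-stream FPS every 5 seconds. If the reported FPS on the 40-camera
# pods stays clearly below the 15 fps the cameras deliver, frames are arriving
# faster than they can be processed and will accumulate somewhere in the pipeline.
perf-measurement-interval-sec=5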

My questions are:

  • Is this behavior expected due to the ingestion of a large number of RTSP sources?

  • Could the system be accumulating unprocessed frames in memory buffers?

  • Should I reduce the number of cameras handled by each pod?

  • Or is there a configuration parameter I should tune to prevent this progressive memory usage?

  • Would it be reasonable to plan a periodic restart as a workaround?

Thank you,
Yassin

Below is the configuration file for one of the 40-camera pipelines:

[source0]
enable=1
type=4
uri=<rtsp>
num-sources=1
latency=200
drop-frame-interval=2
rtsp-reconnect-interval-sec=30
select-rtp-protocol=4
camera-fps-n=15
camera-fps-d=1

[sourceX]
...

[source39]
enable=1
type=4
uri=<rtsp>
num-sources=1
latency=200
drop-frame-interval=2
rtsp-reconnect-interval-sec=30
select-rtp-protocol=4
camera-fps-n=15
camera-fps-d=1

[primary-gie]
enable=1
batch-size=40
gie-unique-id=1
labelfile-path=labels.txt
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[sink0]
enable=1
type=4
sync=0
codec=1
nvbuf-memory-type=0
bitrate=4000000
iframeinterval=10
rtsp-port=8554
udp-port=5555
profile=0
udp-buffer-size=1000000
qos=0

[sink1]
type=6
enable=1
msg-conv-payload-type=0
msg-conv-msg2p-new-api=0
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so
msg-broker-conn-str=<broker>
msg-broker-config=cfg_broker.txt
topic=<topic>
nvbuf-memory-type=0
msg-conv-msg2p-lib=/root/cameraModel/libs/libnvds_msgconv_det_frame.so
msg-conv-comp-id=2
msg-broker-comp-id=2
msg-conv-config=msgconv_config.txt
sync=0

[streammux]
live-source=1
batch-size=40
batched-push-timeout=4000
width=960
height=540
enable-padding=1
nvbuf-memory-type=0
attach-sys-ts-as-ntp=0

[osd]
enable=1
border-width=1
text-size=1
text-color=1;1;1;1;
text-bg-color=0.7;0.7;0.7;0.8
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[tiled-display]
enable=1
rows=7
columns=6
width=960
height=540

[tracker]
enable=1
tracker-width=320
tracker-height=320
ll-lib-file=../libs/libnvds_nvmultiobjecttracker.so
ll-config-file=config_tracker_NvDCF_accuracy.yml
display-tracking-id=1

[nvds-analytics]
enable=1
config-file=config_nvdsanalytics.txt

config_infer_primary.txt:

[property]
batch-size=40
net-scale-factor=0.003921569790691137
model-color-format=0
num-detected-classes=80
gie-unique-id=1
network-type=0
labelfile-path=labels.txt
maintain-aspect-ratio=1
interval=0
parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=../libs/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
network-mode=2
cluster-mode=2
symmetric-padding=1
infer-dims=3;640;640
onnx-file=../models/yolo11l.pt.onnx
model-engine-file=../engines/yolo11l/model_b40_gpu0_fp16.engine
workspace-size=2000

[class-attrs-all]
nms-iou-threshold=0.7
pre-cluster-threshold=0.5
detected-max-h=540
detected-max-w=960
detected-min-h=0
detected-min-w=0

How long have you monitored the memory usage? Will the memory eventually be used up if you keep running long enough?

Hi, thank you for the reply.

Yes, I’ve monitored the memory usage for several hours.
For the pods handling 40 cameras, the RAM usage keeps increasing gradually over time without going back down.

If I keep the pod running long enough (several days), the memory consumption eventually reaches a critical level and the system reports high memory usage, so I have to restart the pod to recover.

For comparison, the pods handling 12 cameras remain completely stable over the same period.

So it doesn’t look like a temporary buffer fluctuation, but rather a continuous accumulation.
Do you think this could be related to the NvDCF tracker configuration (config_tracker_NvDCF_accuracy.yml)?

Thanks,
Yassin

config_tracker_NvDCF_accuracy.yml:

BaseConfig:
  minDetectorConfidence: 0.1894    # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking

TargetManagement:
  enableBboxUnClipping: 1    # In case the bbox is likely to be clipped by image border, unclip bbox
  preserveStreamUpdateOrder: 0    # When assigning new target ids, preserve input streams' order to keep target ids in a deterministic order over multiple runs
  maxTargetsPerStream: 150    # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  # [Creation & Termination Policy]
  minIouDiff4NewTarget: 0.3686    # If the IOU between the newly detected object and any of the existing targets is higher than this threshold, this newly detected object will be discarded.
  minTrackerConfidence: 0.1513    # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
  probationAge: 2    # If the target's age exceeds this, the target will be considered to be valid.
  maxShadowTrackingAge: 42    # Max length of shadow tracking. If the shadowTrackingAge exceeds this limit, the tracker will be terminated.
  earlyTerminationAge: 1    # If the shadowTrackingAge reaches this threshold while in TENTATIVE period, the target will be terminated prematurely.

TrajectoryManagement:
  useUniqueID: 0    # Use 64-bit long Unique ID when assigning tracker ID. Default is [true]
  enableReAssoc: 1    # Enable Re-Assoc

  # [Re-Assoc Metric: Thresholds for valid candidates]
  minMatchingScore4Overall: 0.6622    # min matching score for overall
  minTrackletMatchingScore: 0.2940    # min tracklet similarity score for re-assoc
  minMatchingScore4ReidSimilarity: 0.0771    # min reid similarity score for re-assoc

  # [Re-Assoc Metric: Weights]
  matchingScoreWeight4TrackletSimilarity: 0.7981    # weight for tracklet similarity score
  matchingScoreWeight4ReidSimilarity: 0.3848    # weight for reid similarity score

  # [Re-Assoc: Motion-based]
  minTrajectoryLength4Projection: 34    # min trajectory length required to make projected trajectory
  prepLength4TrajectoryProjection: 58    # the length of the trajectory during which the state estimator is updated to make projections
  trajectoryProjectionLength: 33    # the length of the projected trajectory
  maxAngle4TrackletMatching: 67    # max angle difference for tracklet matching [degree]
  minSpeedSimilarity4TrackletMatching: 0.0574    # min speed similarity for tracklet matching
  minBboxSizeSimilarity4TrackletMatching: 0.1013    # min bbox size similarity for tracklet matching
  maxTrackletMatchingTimeSearchRange: 27    # the search space in time for max tracklet similarity
  trajectoryProjectionProcessNoiseScale: 0.0100    # trajectory projector's process noise scale w.r.t. state estimator
  trajectoryProjectionMeasurementNoiseScale: 100    # trajectory projector's measurement noise scale w.r.t. state estimator
  trackletSpacialSearchRegionScale: 0.0100    # the search region scale for peer tracklet

  # [Re-Assoc: Reid based. Reid model params are set in ReID section]
  reidExtractionInterval: 8    # frame interval to extract reid features per target

DataAssociator:
  dataAssociatorType: 0    # the type of data associator among { DEFAULT= 0 }
  associationMatcherType: 1    # the type of matching algorithm among { GREEDY=0, CASCADED=1 }
  checkClassMatch: 1    # If checked, only the same-class objects are associated with each other. Default: true

  # [Association Metric: Thresholds for valid candidates]
  minMatchingScore4Overall: 0.0222    # Min total score
  minMatchingScore4SizeSimilarity: 0.3552    # Min bbox size similarity score
  minMatchingScore4Iou: 0.0548   # Min IOU score
  minMatchingScore4VisualSimilarity: 0.5043    # Min visual similarity score

  # [Association Metric: Weights]
  matchingScoreWeight4VisualSimilarity: 0.3951    # Weight for the visual similarity (in terms of correlation response ratio)
  matchingScoreWeight4SizeSimilarity: 0.6003    # Weight for the Size-similarity score
  matchingScoreWeight4Iou: 0.4033    # Weight for the IOU score

  # [Association Metric: Tentative detections] only uses iou similarity for tentative detections
  tentativeDetectorConfidence: 0.1024    # If a detection's confidence is lower than this but higher than minDetectorConfidence, then it's considered as a tentative detection
  minMatchingScore4TentativeIou: 0.2852    # Min iou threshold to match targets and tentative detection

StateEstimator:
  stateEstimatorType: 1    # the type of state estimator among { DUMMY=0, SIMPLE=1, REGULAR=2 }

  # [Dynamics Modeling]
  processNoiseVar4Loc: 6810.8668    # Process noise variance for bbox center
  processNoiseVar4Size: 1541.8647   # Process noise variance for bbox size
  processNoiseVar4Vel: 1348.4874    # Process noise variance for velocity
  measurementNoiseVar4Detector: 100.0000   # Measurement noise variance for detector's detection
  measurementNoiseVar4Tracker: 293.3238    # Measurement noise variance for tracker's localization

VisualTracker:
  visualTrackerType: 1    # the type of visual tracker among { DUMMY=0, NvDCF=1 }

  # [NvDCF: Feature Extraction]
  useColorNames: 1    # Use ColorNames feature
  useHog: 1    # Use Histogram-of-Oriented-Gradient (HOG) feature
  featureImgSizeLevel: 3    # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
  featureFocusOffsetFactor_y: -0.1054    # The offset for the center of hanning window relative to the feature height. The center of hanning window would move by (featureFocusOffsetFactor_y*featureMatSize.height) in vertical direction

  # [NvDCF: Correlation Filter]
  filterLr: 0.0767    # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
  filterChannelWeightsLr: 0.0339    # learning rate for the channel weights among feature channels. Valid Range: [0.0, 1.0]
  gaussianSigma: 0.5687    # Standard deviation for Gaussian for desired response when creating DCF filter [pixels]   

ReID:
  reidType: 0    # The type of reid among { DUMMY=0, NvDEEPSORT=1, Reid based reassoc=2, both NvDEEPSORT and reid based reassoc=3}

  # [Reid Network Info]
  batchSize: 100    # Batch size of reid network
  workspaceSize: 1000    # Workspace size to be used by reid engine, in MB
  reidFeatureSize: 256    # Size of reid feature
  reidHistorySize: 100    # Max number of reid features kept for one object
  inferDims: [3, 256, 128]    # Reid network input dimension CHW or HWC based on inputOrder
  networkMode: 1    # Reid network inference precision mode among {fp32=0, fp16=1, int8=2 }

  # [Input Preprocessing]
  inputOrder: 0    # Reid network input order among { NCHW=0, NHWC=1 }. Batch will be converted to the specified order before reid input.
  colorFormat: 0    # Reid network input color format among {RGB=0, BGR=1 }. Batch will be converted to the specified color before reid input.
  offsets: [123.6750, 116.2800, 103.5300]    # Array of values to be subtracted from each input channel, with length equal to number of channels
  netScaleFactor: 0.01735207    # Scaling factor for reid network input after subtracting offsets
  keepAspc: 1    # Whether to keep aspect ratio when resizing input objects for reid

  # [Output Postprocessing]
  addFeatureNormalization: 1       # If reid feature is not normalized in network, adding normalization on output so each reid feature has l2 norm equal to 1
  #minVisibility4GalleryUpdate: 0.6 # Add ReID embedding to the gallery only if the visibility is not lower than this

  # [Paths and Names]
  tltEncodedModel: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/resnet50_market1501.etlt" # NVIDIA TAO model path
  tltModelKey: "nvidia_tao" # NVIDIA TAO model key
  modelEngineFile: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/resnet50_market1501.etlt_b100_gpu0_fp16.engine" # Engine file path

What are the differences between the pods handling 40 cameras and the pods handling 12 cameras?

There is no difference in the configuration or hardware between the pods handling 40 cameras and those handling 12 cameras. The only difference is the number of RTSP streams each pod is processing simultaneously.

Can you try the same case with 40 local video files?

Hi, I haven’t tried the 40 local videos case, as I need to test with RTSP streams.
This memory growth issue has already appeared in our past RTSP deployments.
Is this a known behavior, and is there anything I can adjust in the DeepStream configuration to mitigate it?

Hi @Fiona.Chen, any news or update about this issue?

Are there any other observations, such as RTSP connections dropping or the FPS decreasing, while the 40-RTSP-stream case is running? "The RAM usage keeps increasing" is too general a description to be analysed on its own.

How did you measure the RAM usage while your case was running?
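For reference, one way to make the measurement concrete is to log both the pod-level cgroup memory counter (what Kubernetes metrics are based on, and which includes page cache) and the resident set size (VmRSS) of the DeepStream process. The script below is only an illustrative sketch: the cgroup paths depend on whether the node uses cgroup v1 or v2, and the PID of the DeepStream process has to be supplied.

# Illustrative sketch: log pod cgroup memory vs. process VmRSS once per minute.
import pathlib
import re
import sys
import time

def cgroup_usage_bytes():
    # cgroup v2 exposes memory.current; cgroup v1 uses memory.usage_in_bytes
    for path in ("/sys/fs/cgroup/memory.current",
                 "/sys/fs/cgroup/memory/memory.usage_in_bytes"):
        p = pathlib.Path(path)
        if p.exists():
            return int(p.read_text())
    return None

def process_rss_kib(pid):
    # VmRSS in /proc/<pid>/status is reported in kB
    status = pathlib.Path(f"/proc/{pid}/status").read_text()
    match = re.search(r"VmRSS:\s+(\d+) kB", status)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    while True:
        print(f"cgroup={cgroup_usage_bytes()} B  rss={process_rss_kib(pid)} kB",
              flush=True)
        time.sleep(60)

If the cgroup counter climbs while VmRSS stays flat, the growth is mostly page cache rather than a leak in the DeepStream process; if VmRSS itself climbs steadily, something in the process is allocating memory that is never released.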

By the way, testing with local videos is important for us to identify whether the issue is related to the RTSP connection handling.
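For anyone reproducing that comparison, below is a minimal sketch of the source-group change, assuming the deepstream-app configuration format used above and the sample clip shipped in the DeepStream containers (any local H.264 file can be substituted). A type=3 (MultiURI) source replicates one file across num-sources inputs, so a single group can stand in for the 40 RTSP source groups while [streammux] and [primary-gie] keep batch-size=40:

[source0]
enable=1
# type=3 selects MultiURI: one local file replayed as num-sources parallel inputs,
# replacing the 40 individual RTSP source groups for this test
type=3
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=40

If memory stays flat with 40 local files but grows with 40 RTSP streams, the RTSP/jitter-buffer side of the pipeline becomes the main suspect.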