Nvidia A10 dGPU going out of memory

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
A10 dGPU on x86 platform, Ubuntu 20.04
• DeepStream Version
DS6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
8.4.3.1-1+cuda11.6
• NVIDIA GPU Driver Version (valid for GPU only)
Driver Version: 515.65.01
• Issue Type( questions, new requirements, bugs)
Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello,
Using the NVIDIA DS6.1 framework we have created a pipeline to perform some benchmarking.

RTSP stream --> decode --> batching --> detect (DeepStream-Yolo, YOLOv2-tiny) --> classify (ResNet18) --> sink (fakesink)

While running multiple instances of this pipeline, we are able to spawn a maximum of 22 or 23 before hitting the dGPU memory limit of 24 GiB.

Using the nvidia-smi command, we observe 56% GPU utilization and 15% memory utilization with 23 instances.
It looks like the GPU has headroom to run more instances, but because of the memory limit, instances beyond 23 fail with an out-of-memory exception.

Using the nvidia-smi command, we observe that each pipeline instance takes up 984 MiB.
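For reference, figures like the above can be read with a standard nvidia-smi query (illustrative command only); the per-process memory shows up in the process table of the plain nvidia-smi output:

nvidia-smi --query-gpu=utilization.gpu,utilization.memory,memory.used,memory.total --format=csv
nvidia-smi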

  1. Is there a way to increase the memory available on the A10 dGPU, for example by using swap or adding physical memory?
  2. Can we tune any parameters specific to the above pipeline configuration so that it allocates less memory?
  3. Even though each pipeline takes ~1 GiB, the overall memory utilization with 20+ instances running is very low; is there a way to configure DeepStream so that instances allocate memory on demand? (One sharing-based alternative is sketched just after this list.)
  4. In one of the DeepStream sample applications (Triton), a configuration attribute “tp_gpi_mem_fraction” is used; can we use a similar configuration with YOLO models?
    Please advise whether there is a better way of handling memory allocation in this pipeline.
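Regarding questions 2 and 3, one direction we are evaluating (a sketch only, not a tested configuration) is to batch several streams into a single deepstream-app instance so that the CUDA context and TensorRT engines are shared rather than duplicated per process. Roughly along these lines, where the second URI is a placeholder and the engines would have to be rebuilt for the larger batch size:

[source0]
enable=1
type=4
uri=rtsps://172.22.25.102:8555/stream18

[source1]
enable=1
type=4
uri=rtsps://<second-stream-uri>

[streammux]
batch-size=2

[primary-gie]
batch-size=2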

nvbuf-memory-type=0 is being used in all of the configuration groups (wherever it applies).

Are you working with the deepstream-app sample to run your cases? If so, please post your configurations.

Where did you see “tp_gpi_mem_fraction”?

Here are the configurations I am pasting: the main, primary-GIE, and secondary-GIE configs.
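Each instance is started with the standard deepstream-app binary; illustrative command only, where the file name is simply whatever the main config below is saved as:

deepstream-app -c <main-config>.txt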

/*********** MAIN ***********/
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=/tmp/nvr_low_svet

[tiled-display]
enable=0
rows=1
columns=1
width=1920
height=1080
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
#type=3
type=4
num-extra-surfaces=20
#uri=rtsps://172.22.25.102:8555/stream18
uri=rtsps://172.22.25.102:8555/stream18
num-sources=1
#rtsp - latency
select-rtp-protocol=4
#rtsp - rtsp-reconnect-interval-sec
#rtsp - rtsp-reconnect-attempts
gpu-id=0
cudadec-memtype=0
src-codec=h264

[sink0]
enable=1
#1-fakesink 2-EGL
type=1
sync=1
source-id=0
gpu-id=0
nvbuf-memory-type=0

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
#iframeinterval=10
bitrate=2000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
output-file=/tmp/out1.mp4
source-id=0

[osd]
enable=0
gpu-id=0
border-width=1
text-size=30
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0

#STREAMMUX-BATCH-SIZE

batch-size=1
batched-push-timeout=40000
width=1920
height=1080
#width=1280
#height=720
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
#PRIMARY-GIE-BATCH-SIZE
batch-size=1
config-file=config_infer_primary_nvr_low-R18v2.txt

[tracker]

#svet:enable=0 cow:enable=1

enable=1
gpu-id=0

#For the case of NvDCF tracker, tracker-width and tracker-height must be a multiple of 32, respectively

tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_max_perf.yml
#ll-config-file=tracker_config.yml
#enable-batch-process and enable-past-frame applicable to DCF only
enable-batch-process=1
enable-past-frame=0
display-tracking-id=1

[secondary-gie0]
enable=1
gpu-id=0
gie-unique-id=2
operate-on-gie-id=1

#car=2 cow=19

operate-on-class-ids=19
nvbuf-memory-type=0
config-file=config_infer_secondary1_nvr_low-R18v2.txt

[tests]
file-loop=0

/********* Primary gie *********************************/

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
#custom-network-config=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Primary_Detector/yolov2-tiny.cfg
#model-file=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Primary_Detector/yolov2-tiny.weights
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Primary_Detector/model_b1_gpu0_fp16.engine
labelfile-path=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Primary_Detector/labels.txt
batch-size=1
#0-FP32, 1-INT8, 2=FP16
network-mode=2
num-detected-classes=80
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseCustomYoloV2Tiny
custom-lib-path=/home/cpmusr/project/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
#Deepstream SDK
#parse-bbox-func-name=NvDsInferParseCustomYoloV2Tiny
#custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25

/************* Secondary gie ******************/

[property]
gpu-id=0
net-scale-factor=1
#Resnet18v2 custom
#onnx-file=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Secondary_Classifier/resnet18-v2-7.onnx
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Secondary_Classifier/resnet18-v2-7.onnx_b1_gpu0_fp16.engine
labelfile-path=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Secondary_Classifier/labels.txt

#0=FP32 and 1=INT8 and 2=FP16

network-mode=2
network-type=1
process-mode=2
model-color-format=1
gpu-id=0
gie-unique-id=2
operate-on-gie-id=1

#0-person, 2-car, 19-cow

operate-on-class-ids=19
#is-classifier=1
classifier-async-mode=0
classifier-threshold=0.51

Where did you see “tp_gpi_mem_fraction”?

My apologies, I mistyped it; the actual parameter name is ‘tf_gpu_memory_fraction’.
It is clearly stated in one of the docs that this configuration is valid only for the Triton-based deepstream-app.

File name(s) and path(s):

samples/configs/deepstream-app-triton/config_infer_primary_classifier_inception_graphdef_postprocessInTriton.txt: tf_gpu_memory_fraction: 0.35
samples/configs/deepstream-app-triton/config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt:# tf_gpu_memory_fraction: 0.2 is specified for device with limited memory
samples/configs/deepstream-app-triton/config_infer_primary_detector_ssd_mobilenet_v1_coco_2018_01_28.txt: tf_gpu_memory_fraction: 0.25
samples/configs/deepstream-app-triton/config_infer_primary_classifier_inception_graphdef_postprocessInDS.txt: tf_gpu_memory_fraction: 0.35
samples/configs/deepstream-app-triton/config_infer_primary_classifier_densenet_onnx.txt: tf_gpu_memory_fraction: 0.0
samples/configs/deepstream-app-triton/config_infer_primary_detector_ssd_inception_v2_coco_2018_01_28.txt: tf_gpu_memory_fraction: 0.35
samples/configs/deepstream-app-triton/config_infer_primary_classifier_mobilenet_v1_graphdef.txt:# tf_gpu_memory_fraction: 0.2 is specified for device with limited memory
samples/configs/deepstream-app-triton/config_infer_primary_classifier_mobilenet_v1_graphdef.txt: tf_gpu_memory_fraction: 0.25
samples/configs/deepstream-app-triton/README: other models. Tune a larger ‘tf_gpu_memory_fraction’ value for other
samples/configs/deepstream-app-triton/README: other models. Tune a larger ‘tf_gpu_memory_fraction’ value for other
samples/configs/deepstream-app-triton/README:models can be tuned using the ‘tf_gpu_memory_fraction’ parameter in the

The question I had was: is there a similar configuration for the other sample apps/models?

What are the input layer dimensions of your PGIE and SGIE?

I am not aware of these values; I guess they are at their defaults. Could you please point me to where I can find them, or to which attributes of the PGIE and SGIE I should be looking at?
The three configuration files above are all that I have.

Please ask the person who provided the models to you. And please run trtexec to measure the performance and memory consumption with the models alone. Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
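For example, something along these lines (illustrative invocations only; the engine paths are the ones from the configs above, GPU memory can be watched with nvidia-smi in a second terminal, and the YOLO engine may need its custom library loaded via --plugins if it contains custom layers):

trtexec --loadEngine=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Primary_Detector/model_b1_gpu0_fp16.engine --plugins=/home/cpmusr/project/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
trtexec --loadEngine=/opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Secondary_Classifier/resnet18-v2-7.onnx_b1_gpu0_fp16.engine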

Sure, I will run trtexec.

For the PGIE we are using the DeepStream-Yolo model from (GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.1 / 6.0.1 / 6.0 configuration for YOLO models),
and for the SGIE we are using ResNet18 (models/resnet18-v2-7.onnx at main · onnx/models · GitHub).

Will try to get the input layer dimensions for these models, if possible.
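For example, something along these lines might work (illustrative commands only, assuming the model files referenced in the configs are accessible and that the polygraphy tool from the TensorRT tooling is installed):

grep -E '^(width|height)=' /opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Primary_Detector/yolov2-tiny.cfg
polygraphy inspect model /opt/nvidia/deepstream/deepstream/samples/models/NVR_Low_Secondary_Classifier/resnet18-v2-7.onnx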
thanks

There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

There are many YOLO models in that repository; which one are you using?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.