DeepStream 5.1 inference caps at 30 fps

I launch DeepStream 5.1 using the following Docker command:

docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-5.1 nvcr.io/nvidia/deepstream:5.1-21.02-triton

I run inference using:

deepstream-app -c /opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8_gpu1.txt

The problem is that I get 4 streams, each averaging 30 fps, for a combined 120 fps.

  1. I want to run inference on only a single sample video. How do I change the number of sample videos?

  2. Only one GPU is being utilised, and even that at only 15%. The fps doesn't go over 30.
    I tried setting the sinks to FakeSink and EglSink and disabled the tiled display. How do I maximise the fps, possibly to 1000+?

TensorRT Version: 7.2.1.6
Quadro RTX 5000 dual GPU
Driver Version: 455.23.05
CUDA Version: 11.1
Ubuntu 18.04
Python 3.6

Please refer to the "Performance" page of the DeepStream 5.1 Release documentation for how to reach maximum fps.

Thank you. I had followed that, but it only bumped the fps up by 3 to 4. So a single stream basically caps at right around 30 fps, and a more powerful GPU lets us run more streams (30, 40, and so on) according to its compute power.

Can you please let me know:
1. Is there any script to run at 1000 fps (inside DeepStream)?
2. Is there a YOLO inference script or a prebuilt model?
3. How do I run inference simultaneously on 30 or 40 streams?
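
The observation that each stream caps near 30 fps while more streams raise the combined figure can be put into numbers. The following is only a back-of-the-envelope sketch, assuming every stream stays at roughly 30 fps as reported in this thread; the function name `streams_needed` is hypothetical and not part of DeepStream.

```python
import math

def streams_needed(target_total_fps, per_stream_fps=30.0):
    """Number of parallel streams needed so their combined fps reaches the target."""
    if per_stream_fps <= 0:
        raise ValueError("per_stream_fps must be positive")
    return math.ceil(target_total_fps / per_stream_fps)

# 4 streams at ~30 fps each give the combined figure from the first post.
combined = 4 * 30

# Aggregate 1000 fps at ~30 fps per stream (assumed constant per stream).
needed = streams_needed(1000)

print(combined, needed)
```

Under that assumption, "1000 fps" is a statement about total throughput across many streams rather than about any single stream.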


1. Is there any script to run at 1000 fps (inside DeepStream)?
2. Is there a YOLO inference script or a prebuilt model?
[amycao] You can run with multiple streams to reach your GPU's compute capability.
3. How do I run inference simultaneously on 30 or 40 streams?
[amycao] Set multiple streams in the config file.

One example:

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://…/…/streams/sample_1080p_h264.mp4
num-sources=15

I ran 30 streams using file1:

source30_1080p_dec_infer-resnet_tiled_display_int8.txt (4.7 KB)

and 4 streams using file2:

source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8_gpu1.txt (5.8 KB)

So, as you stated:

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://…/…/streams/sample_1080p_h264.mp4
num-sources=15

Changing num-sources doesn't change the number of streams; it is still 4. I want to know which line causes the app to run 4 streams or 30 streams.

I think batch-size is where the number of streams is reflected, since file1 has 30 and file2 has 4, but that doesn't make much sense. Playing around with batch-size in both the source and config files did not give any fruitful result.

[amycao] You can run with multiple streams to reach your GPU's compute capability.
[amycao] Set multiple streams in the config file.
I can't see any multi-stream option in the config file.

Yes, you also need to change batch-size in pgie and the streammux batch-size to the number of sources. Setting batch-size to the number of sources in the nvinfer element lets the GPU run inference on all of them simultaneously.
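
That advice can be checked mechanically. The following is a minimal sketch, assuming the deepstream-app config can be read as an INI file by Python's `configparser`, that "pgie" corresponds to the `[primary-gie]` group, and that each enabled source group contributes `num-sources` streams. The helper name `check_batch_sizes` and the inline sample text are hypothetical, chosen only for illustration.

```python
import configparser

# Hypothetical, shortened config text; a real file has many more keys.
SAMPLE = """
[source0]
enable=1
type=3
num-sources=15

[source1]
enable=0
type=3
num-sources=15

[streammux]
batch-size=30

[primary-gie]
enable=1
batch-size=30
"""

def check_batch_sizes(text):
    # interpolation=None keeps '%' literal; strict=False tolerates repeated keys.
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read_string(text)
    total = 0
    for name in cfg.sections():
        if name.startswith("source") and cfg[name].get("enable", "0") == "1":
            # Assumption: an enabled group contributes num-sources streams (default 1).
            total += int(cfg[name].get("num-sources", "1"))
    mux = int(cfg["streammux"]["batch-size"])
    pgie = int(cfg["primary-gie"]["batch-size"])
    return {"streams": total, "streammux": mux, "pgie": pgie,
            "consistent": total == mux == pgie}

result = check_batch_sizes(SAMPLE)
print(result)
```

In the sample text only `[source0]` is enabled, so the sketch counts 15 streams against a batch size of 30 and reports the three values as inconsistent.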

What do you mean by pgie and the nvinfer element?

Here is my source30 file:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=6
width=1280
height=720
gpu-id=1
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://…/…/streams/sample_qHD.mp4
num-sources=60
#drop-frame-interval=2
gpu-id=1

# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0

[source1]
enable=0
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://…/…/streams/sample_1080p_h264.mp4
num-sources=60
gpu-id=0

# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=1
source-id=0
gpu-id=1
nvbuf-memory-type=0

[sink1]
enable=0
type=2
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
#iframeinterval=10
bitrate=2000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
output-file=out.mp4
source-id=0

[sink2]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=2
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=4

# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[osd]
enable=1
gpu-id=1
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=1
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=60
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000

# Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# If set to TRUE, system timestamp will be attached as ntp timestamp
# If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
attach-sys-ts-as-ntp=1

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.

[primary-gie]
enable=1
gpu-id=1
model-engine-file=…/…/models/Primary_Detector/resnet10.caffemodel_b30_gpu0_int8.engine
#Required to display the PGIE labels, should be added even when using config-file
#property
batch-size=60
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
#Required by the app for SGIE, when used along with config-file property
gie-unique-id=1
nvbuf-memory-type=1
config-file=config_infer_primary.txt

[tests]

###################
###################
##################


Here is my config file:


################################################################################
# Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#   custom-lib-path,
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=1
net-scale-factor=0.0039215697906911373
model-file=…/…/models/Primary_Detector/resnet10.caffemodel
proto-file=…/…/models/Primary_Detector/resnet10.prototxt
model-engine-file=…/…/models/Primary_Detector/resnet10.caffemodel_b30_gpu0_int8.engine
labelfile-path=…/…/models/Primary_Detector/labels.txt
int8-calib-file=…/…/models/Primary_Detector/cal_trt.bin
batch-size=60
process-mode=1
model-color-format=0

# 0=FP32, 1=INT8, 2=FP16 mode

network-mode=1
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
force-implicit-batch-dim=1
#parse-bbox-func-name=NvDsInferParseCustomResnet
#custom-lib-path=/path/to/libnvdsparsebbox.so

# 0=Group Rectangles, 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)

#cluster-mode=1
#scaling-filter=0
#scaling-compute-hw=0

#Use these config params for group rectangles clustering mode
[class-attrs-all]
pre-cluster-threshold=0.2
group-threshold=1
eps=0.2
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

#Use the config params below for dbscan clustering mode
#[class-attrs-all]
#detected-min-w=4
#detected-min-h=4
#minBoxes=3

# Per class configurations

#[class-attrs-0]
#pre-cluster-threshold=0.05
#eps=0.7
#dbscan-min-score=0.95

#[class-attrs-1]
#pre-cluster-threshold=0.05
#eps=0.7
#dbscan-min-score=0.5

#[class-attrs-2]
#pre-cluster-threshold=0.1
#eps=0.6
#dbscan-min-score=0.95

#[class-attrs-3]
#pre-cluster-threshold=0.05
#eps=0.7
#dbscan-min-score=0.5

I have set the number of sources and the batch size to 60 in both of these files, and it is still giving me 30 streams.
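
One detail visible in the two files above is that `model-engine-file` ends in `_b30_gpu0_int8.engine`, while `batch-size=60` and `gpu-id=1` are set next to it. The sketch below only compares those values. It assumes the file name follows a `_b<batch>_gpu<id>_<precision>.engine` pattern, which is inferred from the name itself; whether this mismatch explains the stream count is not established in this thread. The helper name `parse_engine_name` is hypothetical.

```python
import re

def parse_engine_name(path):
    """Extract (batch, gpu, precision) from a name like '..._b30_gpu0_int8.engine'."""
    m = re.search(r"_b(\d+)_gpu(\d+)_(\w+)\.engine$", path)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)

# File name as it appears in the config (directory part omitted).
engine = "resnet10.caffemodel_b30_gpu0_int8.engine"

# batch-size and gpu-id as set in the [primary-gie] and [property] groups above.
configured_batch, configured_gpu = 60, 1

batch, gpu, precision = parse_engine_name(engine)
print("engine name says:", batch, gpu, precision)
print("batch matches config:", batch == configured_batch)
print("gpu matches config:", gpu == configured_gpu)
```

If the naming assumption holds, both comparisons come out false for the values posted here, which may be worth ruling out before looking elsewhere.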

nvinfer is the GStreamer element; pgie (primary GIE) and sgie (secondary GIE) are the names given to the instances created from it.

Sorry, but I still don't understand. When I look for nvinfer, there are multiple files with the same name. Do I have to edit one of these files? What are pgie and sgie? I am kind of lost.

For your reference:
/* Both instances come from the same "nvinfer" element; pgie and sgie1 are only variable names. */
GstElement *pgie = NULL, *sgie1 = NULL;
pgie = gst_element_factory_make ("nvinfer", "primary-nvinference-engine");
sgie1 = gst_element_factory_make ("nvinfer", "secondary1-nvinference-engine");
