Adding tracker (deep learning) encoder to deepstream multicamera with cpu parallel function run very slow in xavier

Rabeb · June 2, 2020, 10:32am

hi ,

I added a deep learning tracker (using encoder and python) to multicamera detection, deep streams, I used encoder in batch to get the features of no detections, then I used a tracker update function to update the tracker. I used an instance of tracker class because I used multicamera.

the problem is the tracker update is a little slow 0.007s and up to 0.02 s so I need to run the tracker updates () function in parallel using CPU.

The problem is when I run this function in parallel using thread or multiprocessing, the code run very slow 0.6 fps, or without parallel function is running with 15 fps.

is the deepstream pipeline can’t run with CPU thread or multiprocessing?
is there’s a way to run this function in parallel (run trackers instances update to all cameras in parallel)?

DaneLLL · June 3, 2020, 1:30am

Hi,
Please share the release version you use, such as r32.4.3+ DS SDK5.0.

The sample config for Xavier is

deepstream-5.0\samples\configs\deepstream-app\source30_1080p_dec_infer-resnet_tiled_display_int8.txt

Please share your config file based on it, or the gstreamer pipeline to let us understand the usecase.

Rabeb · June 3, 2020, 8:38am

hi , thank you @DaneLLL , i use deepstream sdk 5.0 , i modified deepstream_imagedata-multistream.py to use yolo tiny v3 and i addede tracker in function tiler_sink_pad_buffer_probe (to use opencv image)

   # Copyright (c) 2018 NVIDIA Corporation.  All rights reserved.
#
# NVIDIA Corporation and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA Corporation is strictly prohibited.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=5
columns=6
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=15
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=15
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=1
source-id=0
gpu-id=0
nvbuf-memory-type=0

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
#iframeinterval=10
bitrate=2000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
output-file=out.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400


[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=30
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
## If set to TRUE, system timestamp will be attached as ntp timestamp
## If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
# attach-sys-ts-as-ntp=1

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=../../models/Primary_Detector/resnet10.caffemodel_b30_gpu0_int8.engine
#Required to display the PGIE labels, should be added even when using config-file
#property
batch-size=30
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
#Required by the app for SGIE, when used along with config-file property
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tests]
file-loop=0

DaneLLL · June 4, 2020, 1:10am

Hi,
So your environment is Xavier/r32.4.2, DS SDK5.0.

The reference sample is:

Please share a patch on the sample so that we can understand your usecase.

You may try sudo nvpmodel -m 0 and sudo jetson_clocks to run Xavier in maximum performance.