Hi, we are running a detection and tracking pipeline using the NvDCF tracker and TrafficCamNet without retraining. Our pipeline also has a custom plugin to select the ROI (it was implemented before NVIDIA released its ROI plugin), plus some preprocessing and post-processing logic.
First, we wanted to ask about the performance of the TrafficCamNet model on DeepStream using the default deepstream-app.
According to the performance chart, the Jetson Xavier should reach 289 frames per second.
In our setup, we measure 7 streams at 23.3 fps each, which is about 163.1 frames per second in total. Is it possible to improve the performance?
In this pipeline we are not using the tracker, and we are following the best practices.
**PERF: FPS 0 (Avg) FPS 1 (Avg) FPS 2 (Avg) FPS 3 (Avg) FPS 4 (Avg) FPS 5 (Avg) FPS 6 (Avg)
**PERF: 23.31 (23.26) 23.31 (23.26) 23.31 (23.26) 23.31 (23.28) 23.31 (23.28) 23.31 (23.26) 23.31 (23.26)
**PERF: 23.29 (23.26) 23.29 (23.26) 23.29 (23.26) 23.29 (23.27) 23.29 (23.27) 23.29 (23.26) 23.29 (23.26)
**PERF: 23.28 (23.27) 23.28 (23.27) 23.28 (23.27) 23.28 (23.28) 23.28 (23.28) 23.28 (23.27) 23.28 (23.27)
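For anyone comparing against the chart, the aggregate figure can be computed straight from a **PERF line. This is only a minimal sketch: the `total_fps` helper and the `BENCHMARK_FPS` constant are names made up for illustration, and the regular expression assumes the "current (average)" layout shown in the log above.

```python
import re

# One "current (average)" pair, e.g. "23.31 (23.26)" (layout assumed from the log above).
PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*\((\d+(?:\.\d+)?)\)")

def total_fps(perf_line):
    """Return (sum of current FPS, sum of average FPS) across all streams."""
    pairs = PAIR.findall(perf_line)
    current = sum(float(c) for c, _ in pairs)
    average = sum(float(a) for _, a in pairs)
    return current, average

line = ("**PERF: 23.31 (23.26) 23.31 (23.26) 23.31 (23.26) 23.31 (23.28) "
        "23.31 (23.28) 23.31 (23.26) 23.31 (23.26)")
current, average = total_fps(line)

BENCHMARK_FPS = 289  # figure quoted from the performance chart in this post
print(f"aggregate: {current:.2f} fps, {current / BENCHMARK_FPS:.0%} of the chart figure")
```

The header line (`FPS 0 (Avg) ...`) contains no numeric pairs, so it sums to zero and can be skipped.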
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[source0]
enable=1
type=3
uri=file://sample_1080p_h264.mp4
num-sources=7
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0
[sink0]
#source0 output as filesink
enable=1
type=1
sync=0
source-id=0
[streammux]
gpu-id=0
live-source=1
batch-size=7
batched-push-timeout=33000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0
# attach-sys-ts-as-ntp=1
[tracker]
enable=0
tracker-width=256
tracker-height=256
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=tracker.yaml
gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1
[primary-gie]
enable=1
gpu-id=0
model-engine-file=models/resnet18_trafficcamnet_pruned.etlt_b7_gpu0_int8.engine
int8-calib-file=models/trafficcamnet_int8.txt
labelfile-path=models/labels.txt
batch-size=7
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=models/nvinfer_config.txt
[tests]
file-loop=1
Below you can find some of the setup information:
• Hardware Platform (Jetson / GPU): Jetson Xavier NX
• DeepStream Version: 6.1
• JetPack Version 5.0.1
• Power Mode 8
• NVIDIA GPU Driver Version: cuda 11.4
• Issue Type: question
To expand on what Fiona mentioned:
Please check with a file source. A live stream will most likely limit the framerate to whatever the camera is configured to.
It seems that you have already configured power mode to max - but can you confirm that is the case?
Thanks.
Thanks for answering.
I’m using a file as a source, and with sync=0 configured in the sink properties, the input speed should not be limited to the video’s framerate.
Indicates how fast the stream is to be rendered.
0: As fast as possible
1: Synchronously
I configured an RTSP stream from a camera and declared it 7 times in the configuration file.
The camera is configured at 30 fps, 1080p.
[source0]
enable=1
type=4
uri=rtsp://admin:Pswd123!@192.168.20.163:554
gpu-id=0
cudadec-memtype=0
[source2]
enable=1
type=4
uri=rtsp://admin:Pswd123!@192.168.20.163:554
gpu-id=0
cudadec-memtype=0
...
[source6]
enable=1
type=4
uri=rtsp://admin:Pswd123!@192.168.20.163:554
gpu-id=0
cudadec-memtype=0
**PERF: 24.22 (24.06) 24.27 (24.11) 24.22 (24.06) 24.22 (24.06) 24.35 (24.18) 24.27 (24.11) 24.22 (24.06)
**PERF: 24.32 (24.24) 24.32 (24.26) 24.32 (24.24) 24.32 (24.24) 24.32 (24.30) 24.32 (24.26) 24.32 (24.24)
**PERF: 24.33 (24.23) 24.33 (24.24) 24.33 (24.23) 24.33 (24.23) 24.33 (24.26) 24.33 (24.24) 24.33 (24.23)
**PERF: 24.36 (24.27) 24.36 (24.28) 24.36 (24.27) 24.36 (24.27) 24.36 (24.30) 24.36 (24.28) 24.36 (24.27)
As requested, here’s the nvpmodel query output:
NV Power Mode: MODE_20W_6CORE
Hi @seb.higa ,
Sorry if I was not clear in my previous comment.
Please do not test with an RTSP stream. Use a local file stored in your NX file system.
In the first post, I used a video file as the input:
[source0]
enable=1
type=3
uri=file://sample_1080p_h264.mp4
num-sources=7
#drop-frame-interval=2
gpu-id=0
cudadec-memtype=0
Here’s more info about the video used:
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080, 5675 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
I set live-source=0 and tried again:
[streammux]
gpu-id=0
live-source=0
batch-size=7
batched-push-timeout=33000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0
Here’s the performance output
**PERF: 24.39 (24.20) 24.27 (24.08) 24.27 (24.08) 24.27 (24.08) 24.39 (24.20) 24.27 (24.08) 24.27 (24.08)
**PERF: 24.40 (24.30) 24.40 (24.24) 24.40 (24.24) 24.41 (24.24) 24.41 (24.30) 24.40 (24.24) 24.40 (24.24)
For the performance test:
Max power mode is enabled: $ sudo nvpmodel -m 0
The GPU clocks are set to maximum: $ sudo jetson_clocks
Since you have 7 input streams, please set batch-size=7 in config_infer_primary_trafficcamnet.txt (from the NVIDIA-AI-IOT/deepstream_reference_apps repository on GitHub, master branch) for this case.
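A quick way to confirm that the batch sizes line up with the stream count is to read the deepstream-app configuration and compare the values. The following is a rough sketch, not an official tool: `check_batch_sizes` is a hypothetical helper, it inspects only the deepstream-app file (not the separate nvinfer config), and it assumes the file parses with Python's `configparser`.

```python
import configparser

def check_batch_sizes(config_text):
    """Return a list of mismatches between stream count and batch sizes."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(config_text)
    # Each enabled [sourceN] group contributes num-sources streams (1 if unset).
    streams = sum(
        cfg.getint(name, "num-sources", fallback=1)
        for name in cfg.sections()
        if name.startswith("source") and cfg.getint(name, "enable", fallback=0) == 1
    )
    problems = []
    for group in ("streammux", "primary-gie"):
        batch = cfg.getint(group, "batch-size", fallback=None)
        if batch != streams:
            problems.append(f"[{group}] batch-size={batch}, expected {streams}")
    return problems

# Illustrative input only: a cut-down config with a deliberate mismatch.
sample = """
[source0]
enable=1
type=3
num-sources=7
[streammux]
batch-size=7
[primary-gie]
enable=1
batch-size=4
"""
problems = check_batch_sizes(sample)
print(problems)
```

An empty list means the stream count, the streammux batch size, and the primary-gie batch size all agree.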
Hi Fiona!
Using sudo nvpmodel -m 0 changes my power mode from MODE_20W_6CORE down to MODE_15W_2CORE. Is that OK?
jetson_clocks is enabled.
I can confirm that the nvinfer plugin is using batch-size=7, since it loads the engine file models/resnet18_trafficcamnet_pruned.etlt_b7_gpu0_int8.engine
Here’s the primary infer config file
#file at models/nvinfer_config.txt
[property]
gpu-id=0
net-scale-factor=0.00392156862745098
offsets=0.0;0.0;0.0
infer-dims=3;544;960
tlt-model-key=tlt_encode
tlt-encoded-model= resnet18_trafficcamnet_pruned.etlt
network-type=0
network-mode=1
batch-size=7
num-detected-classes=4
uff-input-order=0
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
uff-input-blob-name=input_1
model-color-format=0
maintain-aspect-ratio=0
output-tensor-meta=0
Here’s the output for power mode 0
**PERF: 21.98 (19.66) 20.53 (18.96) 21.01 (19.15) 21.01 (19.15) 24.51 (21.16) 20.53 (18.96) 21.98 (19.66)
**PERF: 20.46 (20.36) 20.46 (20.29) 20.46 (20.31) 20.46 (20.31) 20.46 (20.43) 20.46 (20.29) 20.46 (20.36)
Here’s the output for power mode 8
**PERF: 24.38 (24.15) 24.09 (23.87) 24.09 (23.87) 24.38 (24.15) 24.09 (23.87) 24.09 (23.87) 24.09 (23.87)
**PERF: 24.42 (24.39) 24.42 (24.25) 24.42 (24.25) 24.42 (24.39) 24.42 (24.25) 24.42 (24.25) 24.42 (24.25)
Thanks!!
@seb.higa
I tried the case again, and I can confirm that the FPS can reach more than 280 with your configuration on the NX board.
The steps should be:
1. Delete the engine file generated in the previous test.
2. Set power mode 8.
3. Run deepstream-app with the attached configuration file (batch-size=7).
test.txt (2.7 KB)
Removing the engine file and regenerating it showed the following warning, which I didn’t see the first time:
WARNING: INT8 calibration file not specified. Trying FP16 mode.
An engine file named resnet18_trafficcamnet_pruned.etlt_b7_gpu0_int8.engine was created with FP16 precision, and on subsequent runs the warning didn’t show up again.
I added the calibration file to the default nvinfer_config.txt and removed the engine file. Here is the output:
**PERF: 39.47 (39.34) 39.47 (39.34) 39.47 (39.34) 39.47 (39.34) 39.26 (39.13) 39.26 (39.13) 39.26 (39.13)
**PERF: 39.42 (39.37) 39.42 (39.37) 39.42 (39.37) 39.42 (39.37) 39.42 (39.27) 39.42 (39.27) 39.42 (39.27)
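Because the warning only appears while the engine is being built, this kind of silent FP16 fallback is easy to miss. A small pre-flight check on the nvinfer config could catch it. This is a sketch under assumptions: `int8_without_calibration` is a hypothetical helper, it only looks at the `[property]` group of the nvinfer config file, and it does not verify that the calibration file actually exists on disk.

```python
import configparser

def int8_without_calibration(nvinfer_config_text):
    """True if INT8 is requested (network-mode=1) but no int8-calib-file is set."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(nvinfer_config_text)
    prop = cfg["property"]
    wants_int8 = prop.getint("network-mode", fallback=0) == 1
    has_calib = bool(prop.get("int8-calib-file", fallback="").strip())
    return wants_int8 and not has_calib

# Illustrative inputs only: cut-down versions of the config posted above.
before = """
[property]
gpu-id=0
network-mode=1
batch-size=7
"""
after = before + "int8-calib-file=trafficcamnet_int8.txt\n"

print(int8_without_calibration(before))  # INT8 requested, no calibration file
print(int8_without_calibration(after))   # calibration file now present
```

Running a check like this before deleting and rebuilding the engine file would flag the missing key without waiting for the build log.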
Thanks for the update @seb.higa .