Low FPS with DeepStream and YOLOv4 on Jetson AGX Xavier

• Hardware Platform (Jetson / GPU) : Jetson AGX Xavier
• DeepStream Version : 5.1
• TensorRT Version : 7.1.3-1
• CUDA Version : 10.2

Hi,

We are analyzing some videos with YOLOv4 and DeepStream 5.1 on a Jetson AGX Xavier, but we get low FPS (around 17). We expected better performance from this Jetson.
Is there something we could do to improve it?

We tried disabling the tracker, but the FPS remains the same.

Thanks in advance.

We have been able to improve performance by enabling more CPU cores with this command:
“sudo nvpmodel -m 0”

Four of the cores were disabled.
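For reference, the active power mode and the cores that are currently online can be checked like this (nvpmodel ships with JetPack; the second path is standard Linux sysfs):

$ sudo nvpmodel -q                        # show the active power mode
$ cat /sys/devices/system/cpu/online      # list the CPU cores that are enabled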
Is there another way to get even better performance?

Hi,

Please try the following:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
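
You can also confirm that the clocks stay at their maximums while the pipeline runs with tegrastats (included in JetPack):

$ sudo tegrastats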

Thanks.

We tried those commands and achieved better performance (around 25-27 FPS), but we saw in this post that you can get 57.75 FPS: Options for optimising custom TFRT tiny YOLOv4 implementation to improve live inference speed on Nano.

Why are we getting lower FPS? Are we doing something wrong?

This is our deepstream config file:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=2
uri=file:/grabit/prueba_EPIs/videos_input/video_largo_prueba.mp4
num-sources=1
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265 3=mpeg4
codec=1
sync=0
bitrate=2000000
output-file=/grabit/prueba_EPIs/output/video_largo_prueba_analizado.mp4
source-id=0

[osd]
enable=1
gpu-id=0
border-width=1
text-size=9
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
#width=1280
#height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
#model-engine-file=model_b1_gpu0_int8.engine
#labelfile-path=labels.txt
labelfile-path=/opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_Yolov4/EPIs_2020_12_08.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=/grabit/prueba_EPIs/ficheros_config/config_infer_primary_yoloV4_2021_01_25_608_32000.txt

[tracker]
enable=1
tracker-width=1920
tracker-height=1056
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_nvdcf.so
ll-config-file=/grabit/prueba_EPIs/ficheros_config/tracker/tracker_config_06_11_v95_v30.yml
enable-batch-process=1
enable-past-frame=0

[nvds-analytics]
enable=1
config-file=/grabit/prueba_EPIs/ficheros_config/nvdsanalytics/config_nvdsanalytics_03_02_springter_nvds5_balanced_cam1_cenital.txt

[tests]
file-loop=0

[ds-example]
enable=1
processing-width=608
processing-height=608
full-frame=0
#batch-size for batch supported optimized plugin
#batch-size=1
unique-id=15
gpu-id=0
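
(For completeness, a config like the one above is launched with the reference application; the path below is only a placeholder for wherever this file is saved.)

$ deepstream-app -c <path_to_this_config_file>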

Apart from executing these two commands:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

We modified this value in the DeepStream app config file (inside the [primary-gie] config group):

interval=0 (we changed it to 2)

Now we get around 55-60 FPS. The problem is that setting interval to 2 can make DeepStream less accurate than with interval=0, because some frames are skipped by the detector. We would like to know whether it is possible to reach this FPS (55-60) with interval=0 by changing another parameter or doing something else.
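
This is the change in the [primary-gie] group. As documented for nvinfer, interval=N skips N frames between inferences, so the detector only runs on every (N+1)th frame and the tracker carries objects across the skipped frames:

[primary-gie]
...
# run inference only on every 3rd frame; the NvDCF tracker updates
# object positions on the two skipped frames in between
interval=2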

Hi,

Our configuration can be found in this GitHub repository.
We also set interval=0, but we use FP16 mode (network-mode=2) for acceleration.
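
In the nvinfer config file that [primary-gie] points to, this is set under the [property] group (per the nvinfer documentation, 0=FP32, 1=INT8, 2=FP16):

[property]
# 0 = FP32, 1 = INT8, 2 = FP16
network-mode=2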

The main difference is that our model has an input size of 3x320x512, but it seems that you are using a 608x608-based model?

If so, the slowdown is expected.
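
For a rough sense of scale: a 608x608 input is 608 × 608 = 369,664 pixels per frame, while 320x512 is 320 × 512 = 163,840 pixels, so the larger model does roughly 2.3x more work per frame in the convolutional layers alone.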

Thanks.

Yes, we are using a 608x608-based model.

Thank you very much for the information.