I originally repeated this issue using my own live streams.
However, for ease of reproduction, I have repeated it with the deepstream-app demo. While running the demo with some minor tweaks (details later), the video output & performance start off OK. However, in less than 30 minutes, the CPU (4 cores) rails at 100% & the video performance of all 8 streams drops dramatically.
The following is a graph of how the 1st video stream degrades over time - note that each sample is 5s:
The following are the details for repeating the issue with deepstream-app:
-The latest DeepStream 4.0.1 is being used & the Jetson Nano is jumpered for high-power mode with a 4A power supply. A fan is also attached & continuously running.
-The following modifications were done to the demo configuration file, source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt:
- [b]In [sink0], "sync" was changed from 1 to 0[/b]
- [b]In [tests], "file-loop" was changed from 0 to 1[/b]
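For clarity, these are the two fragments of the config file as they read after the edits (taken from the full file attached at the end of this post):

```ini
[sink0]
# was 1; with sync=0 the sink no longer throttles rendering to the stream clock
sync=0

[tests]
# was 0; loop the sample mp4 so the test can run overnight
file-loop=1
```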
With the above settings, the test runs without any issue. The overall CPU usage is approx. 45% (measured with top or jtop). The reported performance FPS in the terminal is approximately the same at the end of an overnight run as it was at the beginning, i.e. approx. 30 FPS average.
However we made one change to the configuration file before retesting:
In [source0], “drop-frame-interval”, which was originally commented out, was enabled & set to 3 as shown:
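The changed fragment of [source0] now reads (again taken from the attached config file):

```ini
[source0]
##Orig was commented out, 3 causes High CPU usage, 1 worked with demo mp4
drop-frame-interval=3
```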
The test was restarted & it seemed to start up correctly. The sample mp4 was repeated 8 times in the 8 windows & object detection/tracking seemed to work well. The performance was shown as follows:

**PERF: FPS 0 (Avg)   FPS 1 (Avg)   FPS 2 (Avg)   FPS 3 (Avg)   FPS 4 (Avg)  FPS 5 (Avg)   FPS 6 (Avg)   FPS 7 (Avg)
**PERF: 12.70 (12.70) 25.38 (25.38) 11.90 (11.90) 12.45 (12.45) 8.41 (8.41)  26.85 (26.85) 17.34 (17.34) 8.99 (8.99)
**PERF: 15.97 (15.74) 14.91 (15.74) 10.57 (10.67) 12.18 (12.21) 7.38 (7.44)  19.88 (20.53) 14.90 (15.10) 8.03 (8.10)
However after 8 minutes, the CPU cores started getting pegged as shown with the jtop utility:
After 30 minutes, the cars/buses in the videos shown on the output are “slowing down”.
The performance output on the terminal has been greatly reduced:
**PERF: 4.13 (7.80)  4.59 (7.77)  4.41 (7.76)  4.43 (7.71)  4.19 (7.72)  4.53 (7.72)  4.13 (7.80)  4.25 (7.73)
**PERF: 4.44 (7.79)  4.36 (7.76)  4.06 (7.75)  4.50 (7.70)  4.56 (7.71)  4.26 (7.71)  4.36 (7.79)  4.45 (7.72)
The top utility confirms that it is deepstream that is using all of the CPU:
Here are the details shown by the jtop utility:
Previous experimentation prompted us to enable the drop-frame-interval & set it to 2 or 3 (depending on the bandwidth of the attached IP camera).
-With this setting disabled, high latency was seen between actual movement & the output seen on screen. Changing it to 2 or 3 (for a 30fps camera) improved this latency & didn’t seem to affect the object detection/tracking.
-This variable seemed a way to “normalize” the inputs if different camera bandwidths are used.
-Our understanding from the documentation is that this setting determines which frames the hardware decoder outputs, e.g. a value of 3 would mean that the decoder outputs every 3rd frame.
-If that understanding is correct, we would expect that the CPU usage would actually decrease as this value increases. Or is that an incorrect assumption?
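To put numbers on that expectation, here is a back-of-the-envelope sketch. It assumes the interpretation above (decoder outputs every Nth frame) and assumes the sample mp4 is a 30fps clip; num-sources=8 comes from the [source0] section of the attached config:

```python
# Back-of-the-envelope check of the expectation above: if
# drop-frame-interval=N makes the decoder output every Nth frame,
# the downstream pipeline should see FEWER frames, not do more work.

SOURCE_FPS = 30      # assumed frame rate of sample_1080p_h264.mp4
NUM_STREAMS = 8      # num-sources=8 in [source0]

def frames_per_second_downstream(drop_frame_interval: int) -> float:
    """Frames/s per stream that reach the rest of the pipeline,
    assuming the decoder outputs every Nth frame."""
    return SOURCE_FPS / drop_frame_interval

for n in (1, 2, 3):
    per_stream = frames_per_second_downstream(n)
    total = per_stream * NUM_STREAMS
    print(f"drop-frame-interval={n}: {per_stream:.1f} fps/stream, "
          f"{total:.1f} fps total")
```

By this reasoning, drop-frame-interval=3 should cut the total frame rate reaching the mux/tracker/OSD stages from 240 fps to about 80 fps, which makes the observed CPU rail all the more surprising.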
We appreciate any help that you can offer with this.
I have attached the configuration file below.
# Copyright (c) 2019 NVIDIA Corporation. All rights reserved.
#
# NVIDIA Corporation and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto. Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA Corporation is strictly prohibited.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=2
columns=4
##Orig was width-1280 & height=720
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=8
##Orig was commented out, 3 causes High CPU usage, 1 worked with demo mp4
drop-frame-interval=3
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=5
##Orig was 1 but we have seen stuttering with it
sync=0
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=2000000
output-file=out.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
sync=0
bitrate=4000000
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=8
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=../../models/Primary_Detector_Nano/resnet10.caffemodel_b8_fp16.engine
batch-size=8
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=4
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_nano.txt

[tracker]
enable=1
tracker-width=480
tracker-height=272
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-config-file required for IOU only
#ll-config-file=iou_config.txt
gpu-id=0

[tests]
##Orig was 0
file-loop=1