Long delay (20-25 s) when running DeepStream yolov5s-3.0 inference on 2 cameras

Description

Hi, I'm currently running inference with DeepStream yolov5s-3.0 on 2 IP cameras on a Jetson Nano 4GB, and it runs with a long delay.
Here you can see the log

I saw that the Jetson Nano can handle multiple IP cameras at full FPS, but I get a delay of at least 20 seconds.
My GPU is at 99% usage.

I also tested another model, yolov3-tiny, and had the same problem.

Environment

DeepStream: 5.1
JetPack: 4.5.1

Relevant Files

Here are my config files and the model I run with DeepStream:

config_infer_primary.txt (444 bytes)
deepstream_app_config.txt (1.1 KB)
yolov5s.engine (19.9 MB)

Steps To Reproduce

I followed this tutorial to run YOLOv5 with DeepStream:

https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/YOLOv5.md
Then I run: deepstream-app -c deepstream_app_config.txt

Do you have any idea what could cause this? Thanks

Could you refer to NVIDIA-AI-IOT/deepstream_tao_apps on GitHub (sample apps demonstrating how to deploy TAO-trained models on DeepStream) to measure the trtexec perf of yolov5 on the Nano?

Here is the result:
Warmup completed 1 queries over 200 ms
[03/15/2021-09:43:56] [I] Timing trace has 10 queries over 0.873358 s
[03/15/2021-09:43:56] [I] Trace averages of 10 runs:
[03/15/2021-09:43:56] [I] Average on 10 runs - GPU latency: 85.5117 ms - Host latency: 85.9994 ms (end to end 87.1849 ms, enqueue 17.0025 ms)
[03/15/2021-09:43:56] [I] Host Latency
[03/15/2021-09:43:56] [I] min: 82.0215 ms (end to end 83.311 ms)
[03/15/2021-09:43:56] [I] max: 96.0195 ms (end to end 96.1392 ms)
[03/15/2021-09:43:56] [I] mean: 85.9994 ms (end to end 87.1849 ms)
[03/15/2021-09:43:56] [I] median: 85.0918 ms (end to end 86.5874 ms)
[03/15/2021-09:43:56] [I] percentile: 96.0195 ms at 99% (end to end 96.1392 ms at 99%)
[03/15/2021-09:43:56] [I] throughput: 11.4501 qps
[03/15/2021-09:43:56] [I] walltime: 0.873358 s
[03/15/2021-09:43:56] [I] Enqueue Time
[03/15/2021-09:43:56] [I] min: 14.2544 ms
[03/15/2021-09:43:56] [I] max: 21.5146 ms
[03/15/2021-09:43:56] [I] median: 16.8276 ms
[03/15/2021-09:43:56] [I] GPU Compute
[03/15/2021-09:43:56] [I] min: 81.543 ms
[03/15/2021-09:43:56] [I] max: 95.5269 ms
[03/15/2021-09:43:56] [I] mean: 85.5117 ms
[03/15/2021-09:43:56] [I] median: 84.6216 ms
[03/15/2021-09:43:56] [I] percentile: 95.5269 ms at 99%
[03/15/2021-09:43:56] [I] total compute time: 0.855117 s
&&&& PASSED TensorRT.trtexec # ./trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-5.1/sources/yolo/yolov5s.engine --plugins=/home/jetson-nano/Documents/shark/countingshark/tensorrtx/yolov5/build/libmyplugins.so

This should be the perf for batch-size=1, so the inference time is ~86 ms/frame with batch-size=1.

You could try the command below with batch-size=2. I expect the inference time per batch to roughly double, since yolov5 with batch-size=1 should already fully use the Nano GPU.

$ ./trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-5.1/sources/yolo/yolov5s.engine --plugins=/home/jetson-nano/Documents/shark/countingshark/tensorrtx/yolov5/build/libmyplugins.so --batch=2

Also, where did you observe the 20-second delay?

Thanks!

Here is the result with a batch of 2:

[03/15/2021-11:12:58] [I] Average on 10 runs - GPU latency: 0.000634766 ms - Host latency: 0.951318 ms (end to end 0.961865 ms, enqueue 0.138208 ms)
[03/15/2021-11:12:58] [I] Average on 10 runs - GPU latency: 0.00065918 ms - Host latency: 0.997388 ms (end to end 1.31833 ms, enqueue 0.136426 ms)
[03/15/2021-11:12:58] [I] Host Latency
[03/15/2021-11:12:58] [I] min: 0.882751 ms (end to end 0.890137 ms)
[03/15/2021-11:12:58] [I] max: 2.59497 ms (end to end 2.71497 ms)
[03/15/2021-11:12:58] [I] mean: 0.964853 ms (end to end 1.10102 ms)
[03/15/2021-11:12:58] [I] median: 0.955078 ms (end to end 0.992981 ms)
[03/15/2021-11:12:58] [I] percentile: 1.09448 ms at 99% (end to end 1.83429 ms at 99%)
[03/15/2021-11:12:58] [I] throughput: 1722.59 qps
[03/15/2021-11:12:58] [I] walltime: 3.00246 s
[03/15/2021-11:12:58] [I] Enqueue Time
[03/15/2021-11:12:58] [I] min: 0.0470581 ms
[03/15/2021-11:12:58] [I] max: 1.92053 ms
[03/15/2021-11:12:58] [I] median: 0.110413 ms
[03/15/2021-11:12:58] [I] GPU Compute
[03/15/2021-11:12:58] [I] min: 0.000244141 ms
[03/15/2021-11:12:58] [I] max: 0.625153 ms
[03/15/2021-11:12:58] [I] mean: 0.00102237 ms
[03/15/2021-11:12:58] [I] median: 0.000732422 ms
[03/15/2021-11:12:58] [I] percentile: 0.00109863 ms at 99%
[03/15/2021-11:12:58] [I] total compute time: 0.00264384 s

I get a delay of 20 s or more when I run inference on 2 IP cameras. I also tested with 1 camera and it is similar (15 s delay).

The latency with batch-size=2 is much lower than with batch-size=1 (a mean GPU compute time of ~0.001 ms is not plausible for yolov5s on a Nano), so it looks like there is a problem with batch-size=2.
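As a rough sanity check on the two trtexec runs above, the sketch below compares the mean GPU compute time per image at batch-size=1 and batch-size=2. It assumes, as stated earlier in the thread, that batch-size=1 already saturates the Nano GPU, so the per-image time at batch-size=2 should not drop far below the batch-size=1 value. The function name and the 0.5 tolerance are hypothetical choices for illustration, not anything provided by trtexec.

```python
def batch_result_is_plausible(bs1_batch_ms: float, bs2_batch_ms: float,
                              tolerance: float = 0.5) -> bool:
    """Return True if the batch-size=2 timing looks physically plausible.

    Hypothetical heuristic: if batch-size=1 already saturates the GPU,
    processing 2 images cannot be much cheaper per image than processing 1.
    Per-image time may shrink to `tolerance` times the batch-size=1 value
    before the result is flagged as suspicious.
    """
    per_image_bs1 = bs1_batch_ms / 1
    per_image_bs2 = bs2_batch_ms / 2
    return per_image_bs2 >= tolerance * per_image_bs1


# Mean "GPU Compute" values (ms) copied from the two trtexec logs above.
print(batch_result_is_plausible(85.5117, 85.5117 * 2))  # roughly doubled: True
print(batch_result_is_plausible(85.5117, 0.00102237))   # ~0.001 ms: False
```

A result like the second one suggests the engine did not really process a batch of 2. One possible cause is an engine built for a maximum batch size of 1, but that is an assumption to check against how the engine was generated.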

Where and how did you observe the 15- or 20-second latency?

When I run detection on the cameras, there is a 20 s delay between reality and the display window, with both 1 and 2 cameras.
The console shows 6 FPS for each camera, but the display window lags by more than 30 s.
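One plausible explanation for a display delay far larger than the per-frame inference time is queueing: if the cameras deliver frames faster than the pipeline processes them (about 6 FPS here) and those frames are buffered rather than dropped, the backlog, and with it the delay, keeps growing. The sketch below models that under two loud assumptions: a hypothetical 25 FPS camera rate (the real rate is not stated in this thread) and an unbounded buffer. Real pipelines have buffer limits or drop frames, so treat the numbers as illustrative only; if this hypothesis holds, the delay should get longer the longer the pipeline runs.

```python
def display_delay_s(camera_fps: float, processed_fps: float,
                    elapsed_s: float) -> float:
    """Estimate display lag (seconds) after `elapsed_s` seconds of streaming.

    Assumes every incoming frame is queued (no drops, unbounded buffer)
    and frames are processed in order at a constant `processed_fps`.
    """
    if processed_fps >= camera_fps:
        return 0.0  # the pipeline keeps up, so no backlog accumulates
    backlog_frames = (camera_fps - processed_fps) * elapsed_s
    return backlog_frames / processed_fps  # time needed to drain the backlog


# Hypothetical 25 FPS camera; 6 FPS processing as shown in the console.
for t in (5, 10, 20):
    print(f"after {t:>2} s: ~{display_delay_s(25, 6, t):.1f} s behind")
```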

6 FPS makes sense.
Per the previous test, bs=1 gives an inference time of 86 ms, so bs=2 should take about 2 x 86 = 172 ms, which gives 1000 / 172 ≈ 5.8 FPS per camera.
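The reply's estimate can be written as a small function. It assumes, as the reply does, that batch latency scales linearly with batch size once batch-size=1 saturates the GPU, and that each batch carries one frame from each camera. `per_camera_fps` is a hypothetical helper name.

```python
def per_camera_fps(bs1_latency_ms: float, num_cameras: int) -> float:
    """Per-camera FPS when one batch holds one frame from each camera.

    Assumes a batch of N frames takes N * bs1_latency_ms, because
    batch-size=1 already saturates the GPU (linear scaling).
    """
    batch_latency_ms = num_cameras * bs1_latency_ms
    batches_per_second = 1000.0 / batch_latency_ms
    return batches_per_second  # each batch yields one frame per camera


print(f"{per_camera_fps(86, 1):.1f}")  # 1 camera  -> 11.6
print(f"{per_camera_fps(86, 2):.1f}")  # 2 cameras -> 5.8
```

The 1-camera figure is close to the 11.45 qps throughput trtexec reported for batch-size=1, which suggests the linear model is a reasonable first approximation here.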

Is this performance normal? I saw that with some models you can get full FPS.

And why is there a delay between the display window and reality?

This depends heavily on the model you use: different models require different compute capability. The video you shared is based on a more lightweight model.

Regarding "why is there a delay between the display window and reality?":
15-20 s? As I asked twice above, how do you get this delay, i.e. how do you measure it?

You can try setting batched-push-timeout in the [streammux] group from 40000 to 20000 or lower (the value is in microseconds), and run again.
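If you prefer to change this value from a script rather than by hand, the sketch below uses Python's standard configparser. It assumes deepstream_app_config.txt is a plain key=value file with [group] headers, which is how DeepStream app configs are laid out. Two caveats: configparser drops comments when it writes the file back, so keep a copy of the original, and `set_streammux_timeout` is a hypothetical helper, not part of DeepStream.

```python
import configparser
import io


def set_streammux_timeout(config_text: str, timeout_us: int) -> str:
    """Return config_text with [streammux] batched-push-timeout updated.

    batched-push-timeout is in microseconds (40000 us = 40 ms).
    Caveat: configparser discards comments when writing.
    """
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(config_text)
    if not parser.has_section("streammux"):
        parser.add_section("streammux")
    parser.set("streammux", "batched-push-timeout", str(timeout_us))

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # keep key=value style
    return out.getvalue()


example = "[streammux]\nbatch-size=2\nbatched-push-timeout=40000\n"
print(set_streammux_timeout(example, 20000))
```

Restricting the delimiter to "=" keeps values such as rtsp:// URIs, which contain ":", from being split incorrectly.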

OK, thanks!
I get this delay when I run deepstream-app -c deepstream_app_config.txt on my 2 cameras (RTSP streams) with yolov5s.engine.
What do you mean by "how do you get this delay"?

I gradually decreased it to 0, but it doesn't change the delay :/

Ciao Constantin,

I would try gradually simplifying the input to see whether you can at least reach the desired performance with a lighter computational load.
For example: how does it look with 1 source?
Have you tried to skip some frames?

#drop-frame-interval=5

(you can set this option, without the leading #, in the source group of your config file)
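As a back-of-the-envelope check on drop-frame-interval: as we understand the option, a value of N makes the decoder output only every Nth frame (0 means no frames are dropped), so the rate reaching inference is the camera rate divided by N. The sketch compares that rate with the ~6 FPS per camera the pipeline sustains in this thread. The 25 FPS camera rate is a hypothetical placeholder, since the actual rate is not given, and both function names are illustrative only.

```python
def decoded_fps(camera_fps: float, drop_frame_interval: int) -> float:
    """Frames per second reaching inference for one source.

    Assumes drop-frame-interval=N makes the decoder output every Nth
    frame, and that 0 means no frames are dropped.
    """
    if drop_frame_interval <= 0:
        return camera_fps
    return camera_fps / drop_frame_interval


def keeps_up(camera_fps: float, drop_frame_interval: int,
             processed_fps: float) -> bool:
    """True if frames arrive no faster than the pipeline can process them."""
    return decoded_fps(camera_fps, drop_frame_interval) <= processed_fps


# Hypothetical 25 FPS camera; ~6 FPS per camera sustained by the pipeline.
for interval in (0, 2, 5):
    rate = decoded_fps(25, interval)
    print(f"interval={interval}: {rate:.1f} FPS in, keeps up: {keeps_up(25, interval, 6)}")
```

Under these assumptions only interval=5 brings the input rate below the processing rate, which would be consistent with a backlog no longer building up.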

Ciao!

Ciao,
Thanks for your response!
I have the same delay with 1 camera.
If I set drop-frame-interval=5, the delay disappears, but one camera loses 2 FPS.

Hmm, I have nothing more to add, as I have never faced such an issue.

However, I would troubleshoot with a simpler pipeline and implementation.
For example, I would run the same model in deepstream-test3 and then inspect which part of the pipeline introduces the delay.

Just a quick check, though:
have you tried running the exact same setup with a model other than yolov5, for example the resnet18 model provided with the DeepStream sample apps?

I tried yolov3-tiny and got 21 FPS and 18 FPS on the 2 cameras.
One camera has a 4-second delay and the other has none.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi @constantin.fite,
Could you refer to DeepStream SDK FAQ #12 (by bcao) to capture the DS component and pipeline latency logs?
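For reference, deepstream-app can log pipeline and per-component latency when certain environment variables are set. The sketch below launches it from Python with NVDS_ENABLE_LATENCY_MEASUREMENT=1 and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1; these names are given as we recall them from the DeepStream documentation, so please confirm them against the FAQ for your DeepStream version. The config file name is the one used in this thread, and both helper functions are hypothetical.

```python
import os
import subprocess


def latency_env() -> dict:
    """Copy of the current environment with DeepStream latency logging on.

    Variable names as recalled from the DeepStream docs; verify them for
    your DeepStream version.
    """
    env = dict(os.environ)
    env["NVDS_ENABLE_LATENCY_MEASUREMENT"] = "1"            # pipeline latency
    env["NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT"] = "1"  # per-component latency
    return env


def run_with_latency_logs(config: str = "deepstream_app_config.txt") -> int:
    """Run deepstream-app with latency logging enabled; return its exit code."""
    result = subprocess.run(["deepstream-app", "-c", config], env=latency_env())
    return result.returncode
```

On the Jetson, calling run_with_latency_logs() should print the latency figures alongside the usual FPS output, which would show where in the pipeline the delay accumulates.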