Hi, I’m currently running inference with DeepStream and yolov5s-3.0 on 2 IP cameras on a Jetson Nano 4GB, and it runs with a long delay.
Here is the trtexec log:
Warmup completed 1 queries over 200 ms
[03/15/2021-09:43:56] [I] Timing trace has 10 queries over 0.873358 s
[03/15/2021-09:43:56] [I] Trace averages of 10 runs:
[03/15/2021-09:43:56] [I] Average on 10 runs - GPU latency: 85.5117 ms - Host latency: 85.9994 ms (end to end 87.1849 ms, enqueue 17.0025 ms)
[03/15/2021-09:43:56] [I] Host Latency
[03/15/2021-09:43:56] [I] min: 82.0215 ms (end to end 83.311 ms)
[03/15/2021-09:43:56] [I] max: 96.0195 ms (end to end 96.1392 ms)
[03/15/2021-09:43:56] [I] mean: 85.9994 ms (end to end 87.1849 ms)
[03/15/2021-09:43:56] [I] median: 85.0918 ms (end to end 86.5874 ms)
[03/15/2021-09:43:56] [I] percentile: 96.0195 ms at 99% (end to end 96.1392 ms at 99%)
[03/15/2021-09:43:56] [I] throughput: 11.4501 qps
[03/15/2021-09:43:56] [I] walltime: 0.873358 s
[03/15/2021-09:43:56] [I] Enqueue Time
[03/15/2021-09:43:56] [I] min: 14.2544 ms
[03/15/2021-09:43:56] [I] max: 21.5146 ms
[03/15/2021-09:43:56] [I] median: 16.8276 ms
[03/15/2021-09:43:56] [I] GPU Compute
[03/15/2021-09:43:56] [I] min: 81.543 ms
[03/15/2021-09:43:56] [I] max: 95.5269 ms
[03/15/2021-09:43:56] [I] mean: 85.5117 ms
[03/15/2021-09:43:56] [I] median: 84.6216 ms
[03/15/2021-09:43:56] [I] percentile: 95.5269 ms at 99%
[03/15/2021-09:43:56] [I] total compute time: 0.855117 s
&&&& PASSED TensorRT.trtexec # ./trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-5.1/sources/yolo/yolov5s.engine --plugins=/home/jetson-nano/Documents/shark/countingshark/tensorrtx/yolov5/build/libmyplugins.so
This should be the perf with batch-size=1, so the inference time is ~86 ms/frame. You could try the same trtexec command with batch-size=2; I would guess the inference time per batch will double, since yolov5 at batch-size=1 should already fully use the Nano GPU.
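For reference, a batch-size=2 run can be reproduced by re-running the same trtexec invocation with a batch flag. This is a sketch: the engine and plugin paths are copied from the PASSED line above, and `--batch` assumes the engine was built with an implicit batch dimension:

```
# Hypothetical re-run of the same engine at batch-size=2 (implicit-batch engines only)
./trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-5.1/sources/yolo/yolov5s.engine \
          --plugins=/home/jetson-nano/Documents/shark/countingshark/tensorrtx/yolov5/build/libmyplugins.so \
          --batch=2
```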
[03/15/2021-11:12:58] [I] Average on 10 runs - GPU latency: 0.000634766 ms - Host latency: 0.951318 ms (end to end 0.961865 ms, enqueue 0.138208 ms)
[03/15/2021-11:12:58] [I] Average on 10 runs - GPU latency: 0.00065918 ms - Host latency: 0.997388 ms (end to end 1.31833 ms, enqueue 0.136426 ms)
[03/15/2021-11:12:58] [I] Host Latency
[03/15/2021-11:12:58] [I] min: 0.882751 ms (end to end 0.890137 ms)
[03/15/2021-11:12:58] [I] max: 2.59497 ms (end to end 2.71497 ms)
[03/15/2021-11:12:58] [I] mean: 0.964853 ms (end to end 1.10102 ms)
[03/15/2021-11:12:58] [I] median: 0.955078 ms (end to end 0.992981 ms)
[03/15/2021-11:12:58] [I] percentile: 1.09448 ms at 99% (end to end 1.83429 ms at 99%)
[03/15/2021-11:12:58] [I] throughput: 1722.59 qps
[03/15/2021-11:12:58] [I] walltime: 3.00246 s
[03/15/2021-11:12:58] [I] Enqueue Time
[03/15/2021-11:12:58] [I] min: 0.0470581 ms
[03/15/2021-11:12:58] [I] max: 1.92053 ms
[03/15/2021-11:12:58] [I] median: 0.110413 ms
[03/15/2021-11:12:58] [I] GPU Compute
[03/15/2021-11:12:58] [I] min: 0.000244141 ms
[03/15/2021-11:12:58] [I] max: 0.625153 ms
[03/15/2021-11:12:58] [I] mean: 0.00102237 ms
[03/15/2021-11:12:58] [I] median: 0.000732422 ms
[03/15/2021-11:12:58] [I] percentile: 0.00109863 ms at 99%
[03/15/2021-11:12:58] [I] total compute time: 0.00264384 s
I get a delay of 20 s or more when I run inference on 2 IP cameras; I tested with 1 camera and it is much the same (15 s delay).
When I run detection on the cameras, there is a 20 s delay between reality and the display window; with 1 or 2 cameras it is the same.
The console shows 6 FPS for each camera, but the display window lags by more than 30 s.
This depends heavily on the model you use; different models require different compute capability. The video you shared appears to be based on a more lightweight model.
Why is there a delay between the display window and reality?
15 ~ 20 s? As I asked twice above, how do you get this delay?
Ok thanks !
I get this delay when I run deepstream-app -c deepstream_app_config.txt on my 2 cameras (RTSP streams) with the yolov5s.engine.
What do you mean by “how do you get this delay”?
I would try gradually simplifying the input stream to see whether you can at least reach the desired performance with a lighter compute load.
For example: how does it look with 1 source?
Have you tried to skip some frames?
#drop-frame-interval=5
(you can set the option above in the config file)
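In deepstream-app’s config the option goes under the RTSP source group. A minimal sketch — the URI is a placeholder and the surrounding keys are illustrative, following the stock deepstream_app_config.txt layout:

```
[source0]
enable=1
# Type 4 = RTSP source in deepstream-app
type=4
uri=rtsp://<camera-ip>/stream
# Decode only one frame out of every 5 to reduce load
drop-frame-interval=5
```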
Hi,
Thanks for your response!
I have the same delay with 1 camera.
If I set drop-frame-interval=5, the delay no longer exists, but one camera loses 2 FPS.
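The FPS drop is consistent with how drop-frame-interval works: the decoder outputs one frame out of every N. A quick back-of-the-envelope check, assuming a 25 fps RTSP source (the actual camera frame rate was not stated in the thread):

```shell
src_fps=25   # assumed camera frame rate
interval=5   # drop-frame-interval from the config
echo $(( src_fps / interval ))   # frames per second left for inference: 5
```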
Hmm, I have no additional input to give you, as I have never faced such an issue.
However, I would try to troubleshoot with a simpler pipeline and implementation.
For example, I would try to run the same model in deepstream-test3 and inspect which part of the pipeline is creating the delay.
However, just a quick check: have you tried running the very same setup with a model other than yolov5, for example the resnet18 model provided with the DeepStream sample apps?
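One way to A/B test this is the stock resnet18 pipeline that ships with DeepStream 5.1. A sketch — the config name below is from the standard samples directory, so adjust the path and swap your RTSP URIs into the source groups:

```
cd /opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app
deepstream-app -c source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
```

If this pipeline shows no display lag on the same streams, the delay points to the yolov5 inference stage rather than decode or rendering.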
There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks