When latency measurement is enabled, the results seem to have a problem

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.1
• JetPack Version (valid for Jetson only):
• TensorRT Version: 8.2.5-1+cuda11.4
• NVIDIA GPU Driver Version (valid for GPU only): 510.68.02
• Issue Type (questions, new requirements, bugs): questions
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file content, the command line used and other details for reproducing): export NVDS_ENABLE_LATENCY_MEASUREMENT=1
• Requirement details (This is for new requirements. Include the module name - for which plugin or for which sample application - and the function description):

Hi, I recently ran into an issue when running the deepstream-app program in the nvcr.io/nvidia/deepstream:6.1-devel container.
When I export NVDS_ENABLE_LATENCY_MEASUREMENT=1,
the results seem strange. Here is the screenshot:
[screenshot of the latency output]

batch-num=148 and batch-num=149 process the same data. For the test I chose 1 IPC stream and 3 files with file-loop enabled, set the streammux batch-size to 4, and set the YOLOv5 batch-size to 4. Why do I get these results?
Here is my config file:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=4
#uri=file:/opt/nvidia/deepstream/deepstream-6.1/samples/streams/sample_1080p_h264.mp4
uri=rtsp://admin:zhy12345@172.16.72.64:554/Streaming/Channels/101
#uri=rtmp://192.168.0.14:1935/live/live999
num-sources=1
gpu-id=0

#(0): memtype_device - Memory type Device
#(1): memtype_pinned - Memory type Host Pinned
#(2): memtype_unified - Memory type Unified
cudadec-memtype=0

[source1]
enable=1
type=3
uri=file:/opt/nvidia/deepstream/deepstream-6.1/samples/streams/sample_qHD.mp4
#uri=rtmp://192.168.0.14:1935/live/live999
#uri=rtsp://172.16.80.161:554/live/main_stream
num-sources=1
gpu-id=0
cudadec-memtype=0

[source2]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=3
uri=file:/opt/nvidia/deepstream/deepstream-6.1/samples/streams/sample_ride_bike.mov
#uri=rtmp://192.168.0.14:1935/live/live999
#uri=rtsp://172.16.80.161:554/live/main_stream
num-sources=1
gpu-id=0

#(0): memtype_device - Memory type Device
#(1): memtype_pinned - Memory type Host Pinned
#(2): memtype_unified - Memory type Unified
cudadec-memtype=0

[source3]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=3
uri=file:/opt/nvidia/deepstream/deepstream-6.1/samples/streams/sample_1080p_h265.mp4
#uri=rtmp://192.168.0.14:1935/live/live999
num-sources=1
gpu-id=0

#(0): memtype_device - Memory type Device
#(1): memtype_pinned - Memory type Host Pinned
#(2): memtype_unified - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=1
source-id=0

#Indicates how fast the stream is to be rendered. 0: As fast as possible 1: Synchronously
sync=0
gpu-id=0
nvbuf-memory-type=0
codec=1
enc-type=0
qos=0
bitrate=4000000
iframeinterval=30
rtsp-port=8857
udp-port=5400

[sink1]
enable=1
type=1
source-id=1
sync=0
gpu-id=0
nvbuf-memory-type=0
codec=1
enc-type=0
qos=0
bitrate=4000000
iframeinterval=30
rtsp-port=8858
udp-port=5401
[sink2]
enable=1
type=1
source-id=2
sync=0
gpu-id=0
nvbuf-memory-type=0
codec=1
enc-type=0
qos=0
bitrate=4000000
iframeinterval=30
rtsp-port=8859
udp-port=5402

[sink3]
enable=1
type=1
source-id=3
sync=0
gpu-id=0
nvbuf-memory-type=0
codec=1
enc-type=1
qos=0
bitrate=4000000
iframeinterval=30
rtsp-port=8860
udp-port=5403

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1

## Set according to the number of streams
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000

## Set muxer output width and height
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
batch-size=4
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV5.txt

[tests]
file-loop=1

There is another strange result. When I choose one RTSP stream and set streammux batch-size=4, batched-push-timeout=40000, and the YOLOv5 infer-engine batch-size=4, then within one batch one frame's latency is less than 40 ms while another is longer than 40 ms. Does this mean the GPU inference is serial even though I set the infer-engine batch-size=4? How can I infer the data in parallel?

I will try.

Please set batch-size=1 if there is only one source.

The GPU will wait for a batch of data, then process it in parallel.
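
For example, with a single source, the batching-related keys could look like the sketch below (a minimal illustration, not a complete config; the timeout value is simply carried over from your file):

[streammux]
## match the number of active sources (1 here)
batch-size=1
batched-push-timeout=40000

[primary-gie]
## keep the inference batch size consistent with streammux
batch-size=1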

uh…

  1. What can I do if I want to batch-process data but only have one stream? If I set the infer-engine batch-size=4, does that mean the GPU will batch data automatically even with streammux batch-size=1?

Using a similar configuration file, I can't reproduce this issue. Could you share your whole terminal logs and config_infer_primary_yoloV5.txt? Here is my test report:
log.txt (6.7 MB)
cfg.txt (6.6 KB)

I found the difference between the cfg files:
I enable several sinks, while you only enable one sink. When I enable only one sink, the result seems correct.


source0 is an RTSP stream.
The others are file streams. I set file-loop=1. The values seem strange.

Here is the reason:
tiled-display defaults to 0 because it is not set, so the pipeline includes nvstreamdemux and four fakesinks, and there are four latency_measurement_buf_prob probe functions, one on each of the four sinks (you can debug this in create_pipeline()). After every frame is inferred, the usermeta is copied into four copies by nvstreamdemux, and latency_measurement_buf_prob is entered four times, so the same values are printed multiple times because the usermeta is the same. batch_num is a global variable, and it changes on every call.
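
For reference, below is a simplified sketch of what such a sink-pad probe looks like (this is not the exact deepstream-app source; MAX_SOURCES and the stack-allocated latency_info array are illustrative assumptions):

/* Simplified latency-measurement probe, attached to each sink's sink pad
 * with gst_pad_add_probe() and GST_PAD_PROBE_TYPE_BUFFER. */
#include <gst/gst.h>
#include "nvds_latency_meta.h"  /* nvds_measure_buffer_latency(), NvDsFrameLatencyInfo */

#define MAX_SOURCES 4           /* illustrative: must be >= streammux batch-size */

static guint batch_num = 0;     /* global counter, incremented on EVERY probe call;
                                   with four sinks the same batch is printed under
                                   four different batch numbers */

static GstPadProbeReturn
latency_measurement_buf_prob (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
  if (nvds_enable_latency_measurement) {
    GstBuffer *buf = (GstBuffer *) info->data;
    NvDsFrameLatencyInfo latency_info[MAX_SOURCES];
    guint num_in_batch = nvds_measure_buffer_latency (buf, latency_info);
    guint i;

    g_print ("\n************BATCH-NUM = %u**************\n", batch_num);
    for (i = 0; i < num_in_batch; i++) {
      g_print ("Source id = %u Frame_num = %u Frame latency = %lf (ms)\n",
          latency_info[i].source_id, latency_info[i].frame_num,
          latency_info[i].latency);
    }
    batch_num++;  /* the same batch arriving on another sink gets a new number */
  }
  return GST_PAD_PROBE_OK;
}

Since nvstreamdemux copies the batch meta to all four sinks, this probe fires four times per inferred batch, which produces the repeated latency lines and the jumping batch numbers you saw.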

Thanks, I will work out the details following your guidance.

Is this still an issue that needs support? Thanks

yeah, thanks for your support.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.