I'm trying to figure out how to gather performance metrics for the DeepStream reference app running the objectDetector_SSD example. Ideally, I want to track frame latency from ingestion through the OSD.
Following the post "How to measure PGIE latency?", I was able to view latency measurements only for the nvosd0 component. However, I cannot see the frame latency, because num_sources_in_batch equals 0 inside the latency_measurement_buf_prob function (deepstream_app.c), even though there should be 1 RTSP source.
Could you provide pointers on how to (1) see the frame latency and (2) view latencies for more components than just nvosd?
config
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1
gie-kitti-output-dir=streamscl
[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
num-sources=1
uri=rtsp://192.168.1.251:8554/mystream
gpu-id=0
cudadec-memtype=0
[streammux]
gpu-id=0
live-source=1
batch-size=1
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=1
source-id=0
gpu-id=0
[sink1]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=4000
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400
[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[primary-gie]
enable=1
gpu-id=0
batch-size=1
gie-unique-id=1
interval=0
labelfile-path=ssd_coco_labels.txt
model-engine-file=sample_ssd_relu6.uff_b1_fp32.engine
# model-engine-file=mec_v2.engine
config-file=config_infer_primary_ssd.txt
nvbuf-memory-type=0
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
[tracker]
enable=1
tracker-width=640
tracker-height=368
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
gpu-id=0
#enable-batch-process applicable to DCF only
enable-batch-process=1
I am not sure if it's possible to measure the FPS for different elements using the config file. You might have to change the code to point to the element that you want to measure the FPS for. If you see this block of code somewhere in your app, replace INSERT_YOUR_ELEMENT_FOR_FPS with the element that you want to measure. Hope this helps.
Please set the environment variables below; then you can get the latency for both components and frames.
For frames:
NVDS_ENABLE_LATENCY_MEASUREMENT
For components:
NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT
Please note that you should use RTSP or other live sources.
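Since these variables only take effect if they are visible in the environment of the process that runs the app, a quick check can rule out a misconfigured shell. The following is a minimal sketch, not part of the DeepStream SDK: the helper names env_flag_enabled and report_latency_env are hypothetical, and treating an empty value or "0" as disabled is an assumption of this sketch rather than documented DeepStream behavior.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the variable is set to a non-empty value other than "0".
 * ASSUMPTION: treating "" and "0" as disabled is a choice made for this
 * sketch; it is not taken from the DeepStream documentation. */
static int env_flag_enabled(const char *name)
{
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Hypothetical helper: prints the state of both latency variables to `out`
 * and returns how many of them are enabled (0, 1, or 2). */
static int report_latency_env(FILE *out)
{
    static const char *const names[] = {
        "NVDS_ENABLE_LATENCY_MEASUREMENT",           /* frame latency */
        "NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT"  /* component latency */
    };
    int enabled = 0;

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        int on = env_flag_enabled(names[i]);
        fprintf(out, "%s: %s\n", names[i],
                on ? "enabled" : "disabled or not set");
        enabled += on;
    }
    return enabled;
}
```

Calling report_latency_env(stderr) near the start of main and checking for a return value of 2 would confirm that both variables reached the process before the pipeline is built.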
Yes, I have enabled those environment variables, and I am using an RTSP source (config file posted above). After looking at deepstream_app.c, I found that inside the function latency_measurement_buf_prob, num_sources_in_batch is equal to 0 (despite setting num-sources=1 under [source0] in the config file). As a result, I see zero latency for all frames, although I do see latency measurements for the nvosd0 component.
However, when I experimentally changed the probe by replacing the second argument of NVGSTDS_ELEM_ADD_PROBE (in main) with appCtx->pipeline.demuxer, I was able to view latency measurements for additional components:
I'm trying to understand why changing this probe resulted in showing more components, since I can't seem to find any documentation on NVGSTDS_ELEM_ADD_PROBE (I would appreciate it if someone could tell me where it is defined). I'm also not sure whether the second output shows the entire pipeline, since it is missing the nvosd0 component seen in the first output. Ideally, I want to be able to show latencies from decoding through the OSD (I am not sure whether the tracker technically covers this).
NVGSTDS_ELEM_ADD_PROBE is defined in sources/apps/apps-common/includes/deepstream_common.h
It makes sense that you see the decoder, streammux, nvinfer, and tracker latencies but not the OSD latency: you added the probe on the demuxer, and the OSD sits downstream of the demuxer.
But I am not sure what went wrong such that you only got the OSD latency before you moved the probe to the demuxer. Can you specify how we could reproduce your issue?
I wanted to ask whether you have an alternative RTSP source set up, so that you can just change the RTSP URI in the config shared in my first post in this thread. If not, please let me know.
Assuming that there is an alternative RTSP source, to reproduce the issue:
1. Set the environment variables mentioned above.
2. Run the objectDetector_SSD example with the DeepStream reference app, using the config file from the first post.
3. To view the multiple components, change the second argument of the NVGSTDS_ELEM_ADD_PROBE call (around line 1035) from appCtx->pipeline->instance_bins->sink_bin.sub_bins[0].sink to appCtx->pipeline.demuxer.
Thanks!
[Edit: Still no fix, but changed sync property to equal 0 in the config above for sink0]