Deepstream as OSD Only - nvstreammux LAG on RTSP source

Hello,
I would like to use DeepStream only for its OSD functionality, via the Python bindings. I have already managed to get it working, but I have an issue related to nvstreammux.

This GStreamer plugin, which as far as I understand is required in order to attach frame metadata, introduces noticeable lag when the video input is live.
If I use a simple pipeline like:

gst-launch-1.0 rtspsrc location="rtsp://admin:Password@192.168.0.200:554/" latency=500 ! rtph264depay ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! videorate !  "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=25/1" !  queue ! nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 width=1920 height=1080 live-source=TRUE ! queue ! nvvideoconvert ! queue ! nvdsosd ! queue ! nvvideoconvert ! nvv4l2h264enc bitrate=2000000 iframeinterval=50 ! video/x-h264,stream-format=byte-stream,alignment=au,framerate=25/1 ! queue ! rtspclientsink location=rtsp://127.0.0.1:554/live/test

As you can see, the output is lagging:

Deepstream Managed RTSP Stream

https://www.youtube.com/watch?v=0HmHyDKGGS4

Original RTSP Stream

https://www.youtube.com/watch?v=LZHjjeaLikM

If I try a simple pipeline without nvstreammux (and nvdsosd):

gst-launch-1.0 rtspsrc location="rtsp://admin:Password@192.168.0.200:554/" latency=500 ! rtph264depay ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! videorate ! queue ! nvv4l2h264enc bitrate=2000000 iframeinterval=50 ! video/x-h264,stream-format=byte-stream,alignment=au,framerate=25/1 ! queue ! rtspclientsink location=rtsp://127.0.0.1:554/live/test

everything works like a charm… Any explanation?

I’m going crazy…

• Hardware Platform (Jetson / GPU): Jetson Nano
• DeepStream Version: 5.1
• JetPack Version (valid for Jetson only): 32.5

Please provide complete information as applicable to your setup.


• Hardware Platform (Jetson / GPU): Jetson Nano
• DeepStream Version: 5.1 (manually updated; the issue was also present in 5.0)
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: 7.1.3
• NVIDIA GPU Driver Version (valid for GPU only): N/A (Jetson)
• Issue Type( questions, new requirements, bugs): Issue
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing): Starting from deepstream_test1_rtsp_out.py, I built a GStreamer pipeline that reads from an RTSP source and pushes the video to another RTSP server. As you can see from my pipelines and the video samples, the output lags whenever nvstreammux is in the pipeline. Maybe I'm doing something wrong, but I cannot understand what.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description): no

Up

The two pipelines are quite different. Can you use “tegrastats” to monitor performance while running the case?

Thanks for your reply.
I'm not sure my pipeline is correct, but I cannot understand where the issue is.

ods@ods:~$ gst-launch-1.0 rtspsrc location="rtsp://admin:Password@192.168.0.200:554/" latency=500 ! rtph264depay ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! videorate !  "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=25/1" !  queue ! nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 width=1920 height=1080 live-source=TRUE ! queue ! nvvideoconvert ! queue ! nvdsosd ! queue ! nvvideoconvert ! nvv4l2h264enc bitrate=2000000 iframeinterval=50 ! video/x-h264,stream-format=byte-stream,alignment=au,framerate=25/1 ! queue ! rtspclientsink location=rtsp://127.0.0.1:554/live/test
Setting pipeline to PAUSED ...
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Pipeline is live and does not need PREROLL ...
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp://127.0.0.1:554/live/test
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp://admin:Password@192.168.0.200:554/
Progress: (open) Retrieving server options
Progress: (open) Opened Stream
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Progress: (request) Sending RECORD request
Progress: (request) Sending PLAY request
Progress: (open) Retrieving server options
Progress: (open) Retrieving media info
Progress: (request) SETUP stream 0
Progress: (request) SETUP stream 1

Progress: (open) Opened Stream
Progress: (request) Sending PLAY request
Progress: (request) Sent PLAY request
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Redistribute latency...
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
Progress: (record) Sending server stream info
Progress: (request) SETUP stream 0
Progress: (request) Sending RECORD request
Progress: (record) Starting recording

Tegrastats

RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [7%@1479,5%@1479,8%@1479,8%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@23C CPU@25C PMIC@100C GPU@23C AO@30C thermal@24.25C POM_5V_IN 4778/3148 POM_5V_GPU 118/118 POM_5V_CPU 671/456
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [8%@1479,4%@1479,6%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@22.5C AO@30C thermal@24C POM_5V_IN 3327/3149 POM_5V_GPU 118/118 POM_5V_CPU 435/456
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [26%@1479,5%@1479,5%@1479,5%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@23C AO@30C thermal@24C POM_5V_IN 4390/3160 POM_5V_GPU 118/118 POM_5V_CPU 671/458
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [5%@1479,3%@1479,2%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@23C AO@29.5C thermal@24C POM_5V_IN 3214/3160 POM_5V_GPU 119/118 POM_5V_CPU 435/458
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [9%@1479,6%@1479,10%@1479,6%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25.5C PMIC@100C GPU@23C AO@30C thermal@23.75C POM_5V_IN 4501/3172 POM_5V_GPU 118/118 POM_5V_CPU 710/460
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [11%@1479,2%@1479,1%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22C CPU@25C PMIC@100C GPU@22.5C AO@30C thermal@23.75C POM_5V_IN 3134/3171 POM_5V_GPU 119/118 POM_5V_CPU 435/460
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [6%@1479,6%@1479,5%@1479,4%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25.5C PMIC@100C GPU@22.5C AO@30C thermal@24C POM_5V_IN 3876/3177 POM_5V_GPU 118/118 POM_5V_CPU 553/460
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [5%@1479,1%@1479,5%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@23C AO@29.5C thermal@23.75C POM_5V_IN 3174/3177 POM_5V_GPU 119/118 POM_5V_CPU 435/460
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [19%@1479,8%@1479,4%@1479,4%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@23C AO@30.5C thermal@24C POM_5V_IN 4928/3191 POM_5V_GPU 118/118 POM_5V_CPU 749/462
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [6%@1479,3%@1479,4%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25.5C PMIC@100C GPU@23C AO@30C thermal@24C POM_5V_IN 3406/3193 POM_5V_GPU 118/118 POM_5V_CPU 475/463
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [9%@1479,6%@1479,5%@1479,1%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@23C CPU@25C PMIC@100C GPU@23C AO@30C thermal@24C POM_5V_IN 4152/3201 POM_5V_GPU 118/118 POM_5V_CPU 672/464
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [7%@1479,13%@1479,3%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@23C AO@30C thermal@24C POM_5V_IN 3095/3200 POM_5V_GPU 119/118 POM_5V_CPU 475/464
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [7%@1479,3%@1479,5%@1479,5%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@22.5C AO@29.5C thermal@24C POM_5V_IN 3446/3202 POM_5V_GPU 118/118 POM_5V_CPU 475/464
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [8%@1479,1%@1479,7%@1479,4%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@22.5C AO@30C thermal@23.75C POM_5V_IN 3214/3202 POM_5V_GPU 119/118 POM_5V_CPU 515/465
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [10%@1479,18%@1479,6%@1479,5%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@22.5C AO@30C thermal@24.25C POM_5V_IN 3446/3204 POM_5V_GPU 118/118 POM_5V_CPU 475/465
RAM 2214/3963MB (lfb 33x4MB) SWAP 2/1981MB (cached 0MB) CPU [7%@1479,3%@1479,4%@1479,3%@1479] EMC_FREQ 0% GR3D_FREQ 0% PLL@22.5C CPU@25C PMIC@100C GPU@22.5C AO@29.5C thermal@24.25C POM_5V_IN 3214/3204 POM_5V_GPU 119/118 POM_5V_CPU 435/465

Have you found something I missed?
Is my pipeline wrong?

There seems to be no performance issue with the first pipeline. Can you try setting “sync=0” on the rtspclientsink plugin?
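
Concretely, only the final element of the first pipeline changes; this is a sketch of the suggestion (assuming rtspclientsink honors a `sync` property on this build — verify with `gst-inspect-1.0 rtspclientsink`), not a verified fix:

```shell
# Same first pipeline, with sync=0 appended to rtspclientsink so the sink
# does not wait on the pipeline clock before pushing buffers out (sketch).
gst-launch-1.0 rtspsrc location="rtsp://admin:Password@192.168.0.200:554/" latency=500 ! \
  rtph264depay ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! videorate ! \
  "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=25/1" ! queue ! \
  nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 width=1920 height=1080 live-source=TRUE ! \
  queue ! nvvideoconvert ! queue ! nvdsosd ! queue ! nvvideoconvert ! \
  nvv4l2h264enc bitrate=2000000 iframeinterval=50 ! \
  video/x-h264,stream-format=byte-stream,alignment=au,framerate=25/1 ! queue ! \
  rtspclientsink location=rtsp://127.0.0.1:554/live/test sync=0
```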

And can you upgrade to the latest DeepStream 6.0 SDK and try the first pipeline again? With the DS 6.0 SDK you can also try setting “sync-inputs=true” on nvstreammux.
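
If it helps, only the mux stage of the first pipeline needs to change; `sync-inputs` is the boolean property name on the DS 6.x nvstreammux, and `batched-push-timeout` (in microseconds) is another property sometimes tuned for live sources. Treat the values below as a sketch to experiment with, not a verified fix, and confirm both property names on your build with `gst-inspect-1.0 nvstreammux`:

```shell
# Sketch: mux stage of the first pipeline on DS 6.0 (rest of pipeline unchanged).
# sync-inputs asks the mux to synchronize input buffers against the clock;
# batched-push-timeout=40000 (40 ms) matches one frame interval at 25 fps.
... ! nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 \
      width=1920 height=1080 live-source=TRUE \
      sync-inputs=true batched-push-timeout=40000 ! queue ! ...
```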