Bottleneck in SGIE

I have two pipelines. The first one is as follows:

uridecodebin (rtspsrc) -> ... ->pgie(retina-mobinetv1) -> tracker -> queue -> sgie (Resnet50) -> ... -> sink.

The second one is pretty much the same except it uses v4l2src:

v4l2src-> ... ->pgie(retina-mobinetv1) -> tracker -> queue -> sgie (Resnet50) -> ... -> sink.

I observe odd behavior with the first pipeline when there are objects for the sgie to infer: the more objects there are, the more lag and delay I see. The second pipeline has no such problem.

The question is: why does the delay occur? Is there any way to get around it without modifying the sgie?

Device: Jetson Nano
Deepstream SDK: 5.0

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Are the camera video format and resolution the same as the RTSP source?

The preprocessing before the pgie will impact the pgie result.

• Hardware Platform (Jetson / GPU): Jetson Nano
• DeepStream Version: 5.0.1
• JetPack Version (valid for Jetson only): JP4.4
• TensorRT Version: 7.1.3

Camera video format and resolution is the same as rtsp source?
Yes, they are in the same format. Input resolution = 1920x1080

My pgie config:

[property]
gpu-id=0
net-scale-factor=1.0
offsets=104.0;117.0;123.0
model-engine-file=/data/pretrained/faceDetector_180_320_batch_sim_arm.plan
#labelfile-path=labels.txt
force-implicit-batch-dim=1
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
process-mode=1
model-color-format=1
#infer-dims=3;540;960
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=boxes;scores;landms
#parse-bbox-func-name=NvDsInferParseRetinaNet
#custom-lib-path=build/libnvdsparsebbox_retinaface.so
#enable-dbscan=1

network-type=100
output-tensor-meta=1


What is the format? Uridecodebin means the video is received in a compressed format (H264, HEVC, MJPEG,…). Is the v4l2src also receiving the camera video as H264/HEVC/MJPEG? Can you post the complete pipeline in “gst-launch-1.0” command format so that we can see more details?

Uridecodebin receives an RTSP video in H264 format. The v4l2src input actually comes from the following pipeline, which uses the same RTSP link:

gst-launch-1.0 uridecodebin uri=$URL source::latency=200 ! nvvidconv ! v4l2sink device=/dev/video1 (v4l2 loopback)

-Can you post the complete pipeline with “gst-launch-1.0” command format so that we can know more details?
Sorry, it’s a bit difficult to rewrite my pipeline with “gst-launch-1.0”. Basically, the pipeline contains the following elements:
nvvidconv, caps_filter, pgie, nvtracker, queue, sgie, tiler, nvosd, [sink]

-I think the problem is that resnet50 is not light enough for the Jetson Nano.

nvvidconv is not a DeepStream plugin; the video from nvvidconv cannot work with DeepStream plugins.

It’s actually nvvideoconvert; nvvidconv is just a variable name. Sorry for the misleading information.

Do you mean your first pipeline is handling the rtsp stream directly and the input of the second pipeline is from another loopback pipeline?

What delay do you mean? How do you measure it? Why do you think the delay is caused by the model?

Initially, the delay is about 2s, then it gradually increases to tens of seconds compared to the source. I think the problem is that resnet-50 is too heavy for the Jetson Nano.
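
A rough way to see why a delay like this would keep growing instead of staying constant: if the pipeline processes frames more slowly than the live source delivers them, and nothing drops frames, the lag behind the source grows linearly with time. The sketch below is only a back-of-the-envelope model; the function name and the frame rates in the usage line are hypothetical, not measurements from this setup.

```python
def lag_behind_source_s(source_fps, pipeline_fps, elapsed_s, initial_delay_s=0.0):
    """Estimate how far the output lags the live source after elapsed_s seconds.

    Assumptions (hypothetical model): constant rates, frames are buffered
    and never dropped, and the pipeline starts with initial_delay_s of lag.
    """
    if source_fps <= 0 or pipeline_fps <= 0:
        raise ValueError("frame rates must be positive")
    if pipeline_fps >= source_fps:
        # The pipeline keeps up, so the lag does not grow.
        return initial_delay_s
    # After elapsed_s seconds the pipeline has output pipeline_fps * elapsed_s
    # frames, which covers only that many frames' worth of source time.
    covered_source_time = (pipeline_fps * elapsed_s) / source_fps
    return initial_delay_s + (elapsed_s - covered_source_time)


# Hypothetical numbers: 30 fps source, pipeline sustaining 25 fps, 2 s start lag.
print(lag_behind_source_s(30, 25, 60, initial_delay_s=2.0))
```

Under these made-up rates the lag grows by about one second every six seconds of runtime, which is the same shape as a delay that starts at 2s and climbs to tens of seconds.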

Have you tried the “interval” parameter of nvinfer to offload the inference module? See the Gst-nvinfer page in the DeepStream 5.1 Release documentation.
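
For reference, the Gst-nvinfer documentation describes "interval" as the number of consecutive batches skipped for inference, so with interval=N inference runs on one batch out of every N+1 (the pgie config posted above has interval=0, meaning every batch is inferred). The sketch below only illustrates that arithmetic; the helper names and the sustainable rate of 12 inferences per second are hypothetical, and whether a larger interval actually removes the delay depends on where the bottleneck really is.

```python
import math


def inferred_batches_per_second(source_fps, interval):
    """Inference rate when `interval` consecutive batches are skipped each time."""
    if interval < 0:
        raise ValueError("interval must be >= 0")
    return source_fps / (interval + 1)


def min_interval_to_keep_up(source_fps, max_inference_fps):
    """Smallest interval whose inference rate does not exceed max_inference_fps.

    max_inference_fps is a hypothetical measured figure: the number of
    inferences per second the device can sustain.
    """
    if source_fps <= 0 or max_inference_fps <= 0:
        raise ValueError("rates must be positive")
    return max(0, math.ceil(source_fps / max_inference_fps) - 1)


# Hypothetical numbers: 30 fps source, device sustaining 12 inferences/s.
n = min_interval_to_keep_up(30, 12)
print(n, inferred_batches_per_second(30, n))
```

The resulting value would go into the nvinfer config as `interval=<n>`; frames that are skipped for inference still flow through the pipeline, they just carry no new inference results from that element.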