Unexpected Delay When Setting Model Interval > 0 in Custom RetinaNet Pipeline

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type( questions, new requirements, bugs) : question
Hello everyone,

I’m working on a DeepStream pipeline using a custom RetinaNet (ResNet50 backbone) model. Below is a visual representation of the pipeline:

Model and Performance

The model is deployed via TensorRT (FP16 mode), and here’s the inference performance summary:

=== Performance summary ===
Throughput: 19.9816 qps
Latency: min = 49.9731 ms, max = 50.7459 ms, mean = 50.045 ms, median = 50.0352 ms

The pipeline includes a camera source (nvarguscamerasrc) and a custom NvDsInferParseCustomDropperWire parser. Here’s the relevant configuration:

[property]
gpu-id=0
net-scale-factor=0.017352074
offsets=123.675;116.28;103.53
model-color-format=0 # 0=RGB, 1=BGR
onnx-file=models/dropper_wire/dropper_wire_model.onnx
model-engine-file=models/dropper_wire/dropper_wire_model.onnx_b1_gpu0_fp16.engine
labelfile-path=dropper_wire_labels.txt
network-input-order=0
batch-size=1
network-mode=2 # 0=FP32, 1=INT8, 2=FP16 mode
network-type=0 # 0 for detector
num-detected-classes=2
process-mode=1
gie-unique-id=4
interval=0 # number of consecutive batches to skip between inferences (e.g. 10)
scaling-compute-hw=2
parse-bbox-func-name=NvDsInferParseCustomDropperWire
custom-lib-path=libnvds_dw_bboxparser.so
cluster-mode=2

[class-attrs-all]
pre-cluster-threshold=0.5
nms-iou-threshold=0.3
topk=50

Probe Function and Observation

To monitor source frame timing and check for any delay, I added a simple probe on the camera source that calculates the time difference every 60 frames:

import time

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

old_pts = 0
frame_counter = 0

def buffer_probe_callback(pad, info):
    """Print the wall-clock time spanned by every 60 frames on this pad."""
    global old_pts
    global frame_counter

    frame_counter += 1
    if frame_counter % 60 == 0:
        new_pts = time.time_ns()
        print(f"new pts: {new_pts} | old pts: {old_pts} | diff: {new_pts - old_pts}")
        old_pts = new_pts

    return Gst.PadProbeReturn.OK
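As a side note, the timing logic in the probe can be factored into a small, GStreamer-free helper that is easier to unit-test in isolation. This is only a sketch; `FrameIntervalMonitor` is a hypothetical name, not part of DeepStream:

```python
class FrameIntervalMonitor:
    """Tracks the wall-clock time spanned by every `window` frames."""

    def __init__(self, window=60):
        self.window = window
        self.frame_counter = 0
        self.old_pts = 0

    def update(self, now_ns):
        """Feed one frame timestamp in nanoseconds.

        Returns the elapsed time for the last `window` frames once per
        window, and None for all other frames.
        """
        self.frame_counter += 1
        if self.frame_counter % self.window == 0:
            diff = now_ns - self.old_pts
            self.old_pts = now_ns
            return diff
        return None
```

Inside the probe, `monitor.update(time.time_ns())` would then replace the global-variable bookkeeping.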

Problem: Delay with interval=10

When I set interval=10, which should reduce GPU load by running inference on every 10th frame, I noticed increased latency between timestamps on the source pad (even though the probe is on the camera source, not the inference output). Here are my logs:

new pts: 1745933478882083548 | old pts: 1745933477854277917 | diff: 1027805631
new pts: 1745933479881374875 | old pts: 1745933478882083548 | diff: 999291327
new pts: 1745933480919815184 | old pts: 1745933479881374875 | diff: 1038440309
new pts: 1745933481974549303 | old pts: 1745933480919815184 | diff: 1054734119
new pts: 1745933483012883547 | old pts: 1745933481974549303 | diff: 1038334244
new pts: 1745933484048887011 | old pts: 1745933483012883547 | diff: 1036003464
new pts: 1745933485061569819 | old pts: 1745933484048887011 | diff: 1012682808
new pts: 1745933486080445473 | old pts: 1745933485061569819 | diff: 1018875654
new pts: 1745933487112439325 | old pts: 1745933486080445473 | diff: 1031993852
new pts: 1745933488129228906 | old pts: 1745933487112439325 | diff: 1016789581
new pts: 1745933489178694631 | old pts: 1745933488129228906 | diff: 1049465725
new pts: 1745933490195145097 | old pts: 1745933489178694631 | diff: 1016450466
new pts: 1745933491215312214 | old pts: 1745933490195145097 | diff: 1020167117

However, with interval=0, the timestamps are as expected — around 1 second apart:


new pts: 1745933583804167777 | old pts: 1745933582804185553 | diff: 999982224
new pts: 1745933584807426806 | old pts: 1745933583804167777 | diff: 1003259029
new pts: 1745933585804810812 | old pts: 1745933584807426806 | diff: 997384006
new pts: 1745933586806014311 | old pts: 1745933585804810812 | diff: 1001203499
new pts: 1745933587806854886 | old pts: 1745933586806014311 | diff: 1000840575
new pts: 1745933588805590885 | old pts: 1745933587806854886 | diff: 998735999
new pts: 1745933589806389363 | old pts: 1745933588805590885 | diff: 1000798478
new pts: 1745933590808212773 | old pts: 1745933589806389363 | diff: 1001823410
new pts: 1745933591809462850 | old pts: 1745933590808212773 | diff: 1001250077
new pts: 1745933592807678617 | old pts: 1745933591809462850 | diff: 998215767
new pts: 1745933593808176268 | old pts: 1745933592807678617 | diff: 1000497651
new pts: 1745933594809096377 | old pts: 1745933593808176268 | diff: 1000920109
new pts: 1745933595808256813 | old pts: 1745933594809096377 | diff: 999160436
new pts: 1745933596810548820 | old pts: 1745933595808256813 | diff: 1002292007
new pts: 1745933597812056947 | old pts: 1745933596810548820 | diff: 1001508127
new pts: 1745933598809397080 | old pts: 1745933597812056947 | diff: 997340133

This is very strange: I would expect the difference to increase with interval=0, because the model then has to run inference on every frame rather than only on every 10th frame.

Question

Why does increasing the interval in the primary inference element cause larger delays in the upstream camera source timestamps?

Could this be due to using interpipesink and interpipesrc elements in the pipeline?

I want to use interval=10 to reduce GPU and power consumption, but I cannot afford the added delay in source frames. Is there a way to set the interval without stalling or delaying frame generation at the source?

Any insights or suggestions on how to handle inference interval without affecting the upstream timing would be greatly appreciated!

How did you get this [the performance summary]?

This is a function. How did you use it [the probe callback]?

Have you measured the speed of this [the custom bbox parsing function]?

@Fiona.Chen sorry for my late response.

I got that by following the NVIDIA Benchmarking Practices guide.

This is a probe function connected to the very first element of the pipeline, which is nvarguscamerasrc. It just computes the average delay every 60 frames.

I have not measured this function. How can I measure it — can this be done with Nsight Systems? I think the problem lies not in the speed of the bbox function but in the model itself: the model is large and adds latency. However, I do not understand why it adds latency to the camera frame acquisition, because as the timestamps in the post show, when I attach the probe to the nvarguscamerasrc element and compute the differences between consecutive timestamps, the differences are far too large.

Hello @Fiona.Chen, any update on this topic?

Can you remove interpipesink and interpipesrc elements from the pipeline and test the latency with and without “interval=10”?


Since the postprocessing function is also a part of nvinfer, it will also impact the delay.

@Fiona.Chen thank you for your response.

I’ve removed the interpipesink and interpipesrc elements from my DeepStream pipeline. Here is the updated pipeline structure:

1. Interval = 0

When interval=0, I’m seeing unexpectedly large gaps between frames — around 3 seconds for every 60 frames. Here’s a sample of the PTS (Presentation Timestamps):

new pts: 1747210683929946463 | old pts: 1747210680950808140 | diff: 2979138323
new pts: 1747210686911879670 | old pts: 1747210683929946463 | diff: 2981933207
new pts: 1747210689895941546 | old pts: 1747210686911879670 | diff: 2984061876
new pts: 1747210692868643880 | old pts: 1747210689895941546 | diff: 2972702334
new pts: 1747210695849485724 | old pts: 1747210692868643880 | diff: 2980841844
new pts: 1747210698830135043 | old pts: 1747210695849485724 | diff: 2980649319
new pts: 1747210701812106464 | old pts: 1747210698830135043 | diff: 2981971421
new pts: 1747210704796648414 | old pts: 1747210701812106464 | diff: 2984541950
new pts: 1747210707777961543 | old pts: 1747210704796648414 | diff: 2981313129
new pts: 1747210710759578075 | old pts: 1747210707777961543 | diff: 2981616532
new pts: 1747210713733690728 | old pts: 1747210710759578075 | diff: 2974112653
new pts: 1747210716715390303 | old pts: 1747210713733690728 | diff: 2981699575

This behavior is significantly different from what I observed when using interpipesink and interpipesrc, where setting interval=0 resulted in a much tighter frame interval of about 1 second per 60 frames.

2. Interval = 10

Setting interval=10 reduces the timestamp differences, but I still observe notable delays between consecutive frames:

new pts: 1747211780565745738 | old pts: 1747211779510535188 | diff: 1055210550
new pts: 1747211781629032572 | old pts: 1747211780565745738 | diff: 1063286834
new pts: 1747211782656375157 | old pts: 1747211781629032572 | diff: 1027342585
new pts: 1747211783678928339 | old pts: 1747211782656375157 | diff: 1022553182
new pts: 1747211784717816667 | old pts: 1747211783678928339 | diff: 1038888328
new pts: 1747211785746181137 | old pts: 1747211784717816667 | diff: 1028364470
new pts: 1747211786778787458 | old pts: 1747211785746181137 | diff: 1032606321
new pts: 1747211787863084904 | old pts: 1747211786778787458 | diff: 1084297446
new pts: 1747211788889374461 | old pts: 1747211787863084904 | diff: 1026289557

This is an improvement, but the frame spacing is still inconsistent and affected by inference timing.

3. Interval = 100

At interval=100, the behavior is initially correct. However, as soon as inference starts, the delay between frames increases significantly.

new pts: 1747211839492752386 | old pts: 1747211838492363163 | diff: 1000389223
new pts: 1747211840492891916 | old pts: 1747211839492752386 | diff: 1000139530
new pts: 1747211841493887459 | old pts: 1747211840492891916 | diff: 1000995543
new pts: 1747211842493297361 | old pts: 1747211841493887459 | diff: 999409902
new pts: 1747211843495408564 | old pts: 1747211842493297361 | diff: 1002111203
HERE -> new pts: 1747211844511842385 | old pts: 1747211843495408564 | diff: 1016433821
new pts: 1747211845512336232 | old pts: 1747211844511842385 | diff: 1000493847
new pts: 1747211846513552735 | old pts: 1747211845512336232 | diff: 1001216503

QUESTION

I don’t understand why the inference operation affects the camera’s frame acquisition timing.

I’m using a probe function attached to the first element in the pipeline (nvarguscamerasrc) to capture frame timestamps. These timestamps vary significantly depending on the interval setting even though they should be independent of inference.

Why is this happening? Shouldn’t the frame acquisition from nvarguscamerasrc be decoupled from downstream inference performance?

The test result without interpipesink and interpipesrc is reasonable. Your model takes more than 50 ms per batch, and the end-to-end inference delay will be longer than that because preprocessing and postprocessing take extra time as well.

Because, according to your measurement, the model can’t handle 60 frames in one second. Even if the camera provides 60 frames per second, they will wait in a queue for inferencing.

The camera device captures frames into buffers, but the total number of buffers is limited. If downstream can’t consume the buffers and return empty buffers to the camera in time, frames will be missed.

It can be decoupled by copying each frame into a new buffer and storing the new buffers in an unlimited queue; the camera buffers will then be returned in time. But if downstream keeps working slower than 60 FPS, the buffers in the queue will keep accumulating until memory is used up. Is this what you want?
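The trade-off described here can be illustrated with a minimal, GStreamer-free sketch (plain Python with made-up rates): a 60 FPS producer feeding a 20 FPS consumer through an unbounded queue never blocks, but its backlog grows without bound:

```python
def simulate(duration_s, producer_fps, consumer_fps):
    """Queue depth after `duration_s` seconds when frames are copied into
    an unbounded queue: the producer never blocks, so the backlog grows by
    (producer_fps - consumer_fps) frames per second of runtime."""
    produced = duration_s * producer_fps
    consumed = min(produced, duration_s * consumer_fps)
    return produced - consumed

# 60 FPS camera vs. a ~20 FPS model (50 ms per frame):
print(simulate(10, 60, 20))  # 400 frames buffered after only 10 s
```

With 60 FPS NVMM buffers this backlog translates directly into memory growth, which is the "until the memory is used up" concern above.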

@Fiona.Chen thank you for your reply.

Yes, I completely agree with this point.

I understand this as well. However, my probe function is connected directly to the nvarguscamerasrc, not the nvinfer element. My question is:

Why are frames not dropped when the queue exceeds capacity?

Let me explain with an example:

Even if the model’s inference time is ~60 ms, and I set interval=100, the model should only process 1 frame every 100 frames, assuming batch-size=1. That would mean inferencing occurs approximately every 100 × 16.6 ms = 1660 ms (1.66 seconds).

Despite this, I still observe a delay occurring every 100 frames. This implies that frames on which inference is not performed incur no delay, while inference on that single frame introduces a delay, as can be seen here:

new pts: 1747211842493297361 | old pts: 1747211841493887459 | diff: 999409902
new pts: 1747211843495408564 | old pts: 1747211842493297361 | diff: 1002111203
HERE -> new pts: 1747211844511842385 | old pts: 1747211843495408564 | diff: 1016433821
new pts: 1747211845512336232 | old pts: 1747211844511842385 | diff: 1000493847

This causes the timestamps at the camera source itself to drift.
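The arithmetic above can be sanity-checked with a few lines of plain Python (the 50 ms figure comes from the trtexec summary earlier in the thread; the overrun estimate is a rough back-of-the-envelope assumption, not a measured value):

```python
frame_period_ms = 1000 / 60   # ~16.67 ms per frame at 60 FPS
inference_ms = 50             # mean engine latency from the trtexec summary
interval = 100                # inference runs roughly once per 100 frames

# Expected gap between two consecutive inferences:
gap_between_inferences_ms = interval * frame_period_ms
print(f"inference roughly every {gap_between_inferences_ms:.0f} ms")

# Each inference overruns its 16.67 ms frame slot by roughly:
overrun_ms = inference_ms - frame_period_ms
print(f"overrun per inference: ~{overrun_ms:.1f} ms")
```

On that rough reasoning a single inference stalls the pipeline by a few tens of milliseconds per 100-frame window, the same order of magnitude as the extra ~16 ms visible at the HERE line.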

What I would like to achieve is:

  • No latency accumulation at the camera, regardless of how long inference takes (e.g., 1 ms or 50 ms).
  • I want frames to be dropped if necessary, especially when inference cannot keep up — rather than introducing delay.
  • I expect that the difference between every 60 frames (i.e., every second) should remain close to 1 second, not stretch to 1.03s, 2s, or even more.

How can I modify my pipeline to achieve this?

I want real-time behavior where the source operates independently of the inference load, dropping frames if needed but maintaining consistent frame generation and acquisition timing.

Thank you for your suggestions.

You set “sync=0” on your fakesink element, so the sink element always releases buffers as soon as possible. Then the pipeline will not be smooth.

I’ve given the suggestion in Unexpected Delay When Setting Model Interval > 0 in Custom RetinaNet Pipeline - #9 by Fiona.Chen

There is no such element in DeepStream.

Please add queue before your sink element and set “sync” property as True.

@Fiona.Chen thank you very much for your reply.

After setting the fakesink sync property to 0, there is no improvement. I have the same pipeline (without interpipesink and interpipesrc) with interval set to 10. Here are the results; the difference is still significantly larger than it should be:

new pts: 1747294528474001350 | old pts: 1747294527443356093 | diff: 1030645257
new pts: 1747294529503351392 | old pts: 1747294528474001350 | diff: 1029350042
new pts: 1747294530530276630 | old pts: 1747294529503351392 | diff: 1026925238
new pts: 1747294531565229152 | old pts: 1747294530530276630 | diff: 1034952522

Doesn’t queue provide such functionality with the leaky property?

I also added a queue element before fakesink and set the sync property of fakesink to True. This also does not improve the results; they are the same, and the delay still occurs.
However, I am confused here. In your previous message you wrote about setting the sync property to 0, but in this message you say to set sync to True. Do these not contradict each other, or are the two cases completely different?

“leaky” only works when the queue is full. What you need is “dropped if necessary”.

When “sync” is true, the sink element will drop frames whose timestamps are late.

The “drop frame” strategy should be applied before inferencing if you want to reduce the interval between the camera-generated frames. The proper way is to separate the camera output from the inferencing.
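One possible sketch of such a drop-before-inference strategy is a pad probe placed just upstream of nvinfer that drops frames the model should never see. The decision logic below is plain Python; `should_forward` is a hypothetical helper, and returning `Gst.PadProbeReturn.DROP` from the probe is the assumed wiring, untested here:

```python
def should_forward(frame_idx, keep_every):
    """Forward only every `keep_every`-th frame to the inference branch.

    In a pad probe attached just upstream of nvinfer, frames where this
    returns False would be dropped (Gst.PadProbeReturn.DROP), so the
    camera buffer is released immediately instead of queuing for
    inference."""
    return frame_idx % keep_every == 0

# Keep 1 frame in 10, mirroring interval=10 but dropping the other 9
# before they can tie up camera buffers:
kept = [i for i in range(30) if should_forward(i, 10)]
print(kept)  # [0, 10, 20]
```

Unlike nvinfer's interval property, which still passes every buffer through the inference element, dropping at a probe returns buffers to the camera pool without waiting on the inference branch at all.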


@Fiona.Chen how can I separate the camera output from the inferencing? How is it possible with the following pipeline?

After your suggestions I still do not know how I am supposed to solve the problem of the delay that occurs at the camera because of the long model inference. I set the fakesink sync property to True and added a queue before fakesink, but it changed nothing.

My suggestion can’t be implemented with such a pipeline. You need to implement the copying and buffering yourself.

@Fiona.Chen thank you for your reply.

I’m curious about how scenarios like mine are typically handled, specifically when the model’s inference time significantly exceeds the frame generation interval (e.g., >16.6 ms for 60 FPS input).

For instance, in the retinanet-examples, using a RetinaNet model with a ResNet-50 backbone, the INT8 inference latency on a T4 is around 22 ms, which is already longer than 16.6 ms.

How are such cases usually managed within a real-time pipeline?

I find it hard to believe that this isn’t a common scenario for larger models, and I’m wondering if there are recommended practices or built-in mechanisms to deal with this, or if such handling must be implemented manually.

Your request is special. For most customers, an average FPS of around 60 is enough. If you want the FPS to be exactly the same at every moment, you need to add extra logic to control the buffering timing yourself.

@Fiona.Chen to clarify, I’m not aiming to maintain a constant FPS at all times. My main goal is to ensure that the model’s inference latency does NOT affect the frame generation rate of the camera source. Whether the model has a latency of 3 ms or 100 ms, I want the camera to continue generating frames at its native rate (e.g., 60 FPS) without being throttled or blocked by downstream processing.

If that means introducing a queue element that drops frames when inference is slower than the frame rate, that’s perfectly acceptable. The key requirement is that model inference should not delay or regulate the camera’s frame production.

From what I observe, it seems like the camera source waits for the generated frame to reach the sink (e.g., fakesink) before producing the next frame. Is this correct?

For example:

  • With a fast model (e.g., 3 ms inference time), the camera generates frames every 16.6 ms as expected (60 FPS), because the frame flows quickly through the pipeline and reaches the sink without delay.
  • But with a slower model (e.g., 50 ms latency), the camera seems to wait for the current frame to reach the sink before generating the next one, effectively reducing the frame generation rate.
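That description matches how a fixed-size buffer pool behaves once it saturates. A toy model (plain Python, not the actual nvarguscamerasrc implementation) shows the capture period becoming bounded by the consumer's pace:

```python
def steady_state_period_ms(frame_period_ms, consume_ms):
    """Effective time between captured frames once every buffer in the
    camera's fixed pool is held downstream: a new frame can only be
    captured when a buffer is returned, so the capture period is bounded
    below by the consumer's per-frame time."""
    return max(frame_period_ms, consume_ms)

# Fast model (3 ms): buffers return well before the next 16.7 ms slot.
print(f"{steady_state_period_ms(1000 / 60, 3):.1f} ms")   # 16.7 ms, full 60 FPS

# Slow model (50 ms): the source is throttled to roughly 20 FPS.
print(f"{steady_state_period_ms(1000 / 60, 50):.1f} ms")  # 50.0 ms
```

A larger buffer pool only delays the onset of the stall; once all buffers are in flight, the steady-state rate is still set by the slowest element holding them.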

Is this the expected behavior in DeepStream? And if so, is there a recommended way to decouple camera frame generation from inference latency? Would you mind providing an example, or pointing me to what I should search for?

@Fiona.Chen any update on that? Thanks

Yes. Since there is no copy-out in your pipeline, the buffers are shared between elements; you need to add buffer-copy logic to decouple the source buffers from the buffers to be inferenced.

DeepStream is just an SDK; how the APIs are used is decided by your own requirements and implementation.

There is no recommended way. What we can tell you is that “it can be decoupled by copying the frame to new buffers and storing the new buffers in an unlimited queue; the camera buffers will be returned in time”, as I told you in Unexpected Delay When Setting Model Interval > 0 in Custom RetinaNet Pipeline - #9 by Fiona.Chen