So we need to know the model engine's performance first. Have you measured it with trtexec?
Can I use these results from when I first converted it from ONNX to TensorRT?
[11/20/2025-02:09:52] [I] === Performance summary ===
[11/20/2025-02:09:52] [I] Throughput: 67.0395 qps
[11/20/2025-02:09:52] [I] Latency: min = 24.5115 ms, max = 25.1259 ms, mean = 24.8803 ms, median = 24.8773 ms, percentile(90%) = 24.9915 ms, percentile(95%) = 25.0333 ms, percentile(99%) = 25.0992 ms
[11/20/2025-02:09:52] [I] Enqueue Time: min = 0.864746 ms, max = 1.53534 ms, mean = 0.908439 ms, median = 0.892609 ms, percentile(90%) = 0.942047 ms, percentile(95%) = 0.963501 ms, percentile(99%) = 1.15454 ms
[11/20/2025-02:09:52] [I] H2D Latency: min = 6.30591 ms, max = 6.34033 ms, mean = 6.31577 ms, median = 6.31271 ms, percentile(90%) = 6.32501 ms, percentile(95%) = 6.33289 ms, percentile(99%) = 6.33716 ms
[11/20/2025-02:09:52] [I] GPU Compute Time: min = 14.6442 ms, max = 15.0804 ms, mean = 14.843 ms, median = 14.8311 ms, percentile(90%) = 14.9565 ms, percentile(95%) = 14.9995 ms, percentile(99%) = 15.0723 ms
[11/20/2025-02:09:52] [I] D2H Latency: min = 3.43481 ms, max = 3.77759 ms, mean = 3.72158 ms, median = 3.72382 ms, percentile(90%) = 3.74524 ms, percentile(95%) = 3.75122 ms, percentile(99%) = 3.7627 ms
[11/20/2025-02:09:52] [I] Total Host Walltime: 3.04298 s
[11/20/2025-02:09:52] [I] Total GPU Compute Time: 3.02797 s
Yes, you can. Is the batch size 128?
Yep, my conversion command was:
trtexec --onnx=yolo11s.onnx --saveEngine=model.plan --minShapes=images:1x3x640x640 --optShapes=images:32x3x640x640 --maxShapes=images:128x3x640x640 --fp16
So the model engine seems quite fast. It can handle around 1000/25 × 128 = 5120 frames per second.
- According to Video Codec SDK | NVIDIA Developer, the L40S (similar to the L20) can decode at most 80 1080p@30fps H.264 video streams, which provides 30 × 80 = 2400 frames per second. If you have to use the hardware decoder in your case, it will be the bottleneck.
- Even if the sources can provide enough frames for inferencing, you also need to check the performance of your customized postprocessing function NvDsInferParseYolo() to make sure it can run more than 5120 times per second (see the sketch below).
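One way to measure that, as a minimal sketch: wrap the custom-parser entry point with std::chrono timers. This assumes the standard nvdsinfer custom-parser signature from nvdsinfer_custom_impl.h; decodeYoloTensors() is a hypothetical stand-in for the existing parsing code.

// Minimal timing sketch around the custom parser entry point.
// decodeYoloTensors() is a hypothetical placeholder for the real decoding logic.
#include <chrono>
#include <cstdio>
#include <vector>
#include "nvdsinfer_custom_impl.h"

// Hypothetical helper holding the actual tensor-decoding code.
static bool decodeYoloTensors(std::vector<NvDsInferLayerInfo> const&,
                              NvDsInferNetworkInfo const&,
                              NvDsInferParseDetectionParams const&,
                              std::vector<NvDsInferParseObjectInfo>&);

extern "C" bool NvDsInferParseYolo(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    auto t0 = std::chrono::steady_clock::now();
    bool ok = decodeYoloTensors(outputLayersInfo, networkInfo,
                                detectionParams, objectList);
    auto t1 = std::chrono::steady_clock::now();
    // Per-call latency; the parser runs once per frame in the batch.
    std::printf("Post-Process Latency: %f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());
    return ok;
}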
Good morning, Fiona.
I tried your advice on the custom parser problem and switched to the Ultralytics-recognized custom YOLO parser instead (GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 8.0 / 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models). The results do look better for 20fps streams: I can manage up to 45 streams before the pipeline dies, with better SM utilization. Since the custom parser is most probably the culprit, is there perhaps a way to squeeze more performance out of it? Perhaps by changing some of the config parameters?
I am using 20fps streams, so batch-push-timeout is 50000 (one frame interval: 1/20 s = 50000 µs), and the goal is a maximum of 64 streams. Then:
Streammux:
max-same-source-frames=2
max-num-frames-per-batch=64
num-surfaces-per-frame=1
buffer-pool-size=4
Should I change these in nvinferserver_yolo11s.txt:
extra {
copy_input_to_host_buffers: false
output_buffer_pool_size: 32
}
My full configs (just in case):
nvinferserver_yolo11s.txt (1.7 KB)
custom_nvstreammux_config.txt (661 Bytes)
dsserver_config.txt (1.7 KB)
46 streams:
gpu mem usage: ./deepstream-server-app 21816MiB
**PERF : FPS_0 (0.00) FPS_1 (18.55) FPS_2 (18.54) FPS_3 (18.54)
FPS_4 (18.54) FPS_5 (18.54) FPS_6 (18.54) FPS_7 (18.54)
FPS_8 (18.52) FPS_9 (18.52) FPS_10 (18.52) FPS_11 (18.52)
FPS_12 (18.52) FPS_13 (18.51) FPS_14 (18.51) FPS_15 (18.49)
FPS_16 (18.49) FPS_17 (18.49) FPS_18 (18.49) FPS_19 (18.48)
FPS_20 (18.04) FPS_21 (18.04) FPS_22 (18.04) FPS_23 (18.04)
FPS_24 (18.04) FPS_25 (17.99) FPS_26 (17.99) FPS_27 (17.98)
FPS_28 (17.98) FPS_29 (17.98) FPS_30 (17.53) FPS_31 (17.53)
FPS_32 (17.53) FPS_33 (17.53) FPS_34 (17.45) FPS_35 (17.45)
FPS_36 (17.44) FPS_37 (17.45) FPS_38 (17.44) FPS_39 (17.45)
FPS_40 (16.19) FPS_41 (16.19) FPS_42 (16.19) FPS_43 (15.03)
FPS_44 (14.34) FPS_45 (8.79)
gpu pwr gtemp mtemp sm mem enc dec jpg ofa mclk pclk
Idx W C C % % % % % % MHz MHz
0 301 57 - 62 42 1 30 0 0 9001 2520
0 292 53 - 65 48 1 29 0 0 9001 2520
0 214 53 - 62 45 1 31 0 0 9001 2520
0 156 54 - 63 47 1 30 0 0 9001 2520
0 193 57 - 60 43 1 35 0 0 9001 2520
0 254 57 - 61 44 1 32 0 0 9001 2520
0 308 55 - 63 47 1 30 0 0 9001 2520
0 269 53 - 65 46 1 29 0 0 9001 2520
0 169 54 - 64 49 1 29 0 0 9001 2520
0 175 54 - 61 45 1 31 0 0 9001 2520
0 216 57 - 60 43 1 33 0 0 9001 2520
0 292 57 - 63 43 1 30 0 0 9001 2520
0 307 54 - 63 45 1 30 0 0 9001 2520
0 237 53 - 64 45 1 30 0 0 9001 2520
0 94 48 - 0 0 0 0 0 0 9001 2520
0 86 47 - 0 0 0 0 0 0 9001 2520
0 85 47 - 0 0 0 0 0 0 9001 2520
0 84 46 - 0 0 0 0 0 0 9001 2520
For the algorithm's performance, you can consult the author of the algorithm on how to improve its efficiency; it is not provided by NVIDIA.
As general advice, we'd suggest implementing the postprocessing tensor-parsing algorithm with CUDA, just as we have done in deepstream_tools/yolo_deepstream/deepstream_yolo/config_infer_primary_yoloV11.txt at main · NVIDIA-AI-IOT/deepstream_tools
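To illustrate the idea only (this is not NVIDIA's actual implementation, and filterBoxes/launchFilter are hypothetical names): confidence filtering of the raw output tensor can run in a CUDA kernel, so only surviving boxes ever reach the CPU parser.

// Illustrative CUDA sketch of GPU-side confidence filtering.
#include <cuda_runtime.h>

// One thread per candidate box; survivors' indices are compacted via atomicAdd.
__global__ void filterBoxes(const float* scores, int numBoxes, float threshold,
                            int* keepIdx, int* keepCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numBoxes && scores[i] >= threshold) {
        keepIdx[atomicAdd(keepCount, 1)] = i;
    }
}

// Host-side launch on the inference stream; only keepCount boxes need a D2H copy.
void launchFilter(const float* dScores, int numBoxes, float threshold,
                  int* dKeepIdx, int* dKeepCount, cudaStream_t stream)
{
    cudaMemsetAsync(dKeepCount, 0, sizeof(int), stream);
    int block = 256;
    int grid = (numBoxes + block - 1) / block;
    filterBoxes<<<grid, block, 0, stream>>>(dScores, numBoxes, threshold,
                                            dKeepIdx, dKeepCount);
}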
By the way, @Fiona.Chen
Is it possible to dynamically add another model while the pipeline is running? (Kind of like dynamic sources, but adding/removing models instead.) Also, the deepstream-server-app does not support SGIE, does it? How do I add this option?
Can you provide more details of the scenario?
All the DeepStream samples demonstrate the usage of the DeepStream components and APIs. deepstream-server-app focuses on demonstrating the usage of the "nvmultiurisrcbin" APIs. It does not conflict with the usage of "PGIE+SGIE". The deepstream-test2 sample demonstrates how to construct "PGIE+SGIE" pipelines. You can construct an nvmultiurisrcbin+PGIE+SGIE pipeline according to the samples.
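As a rough sketch of that construction, following the deepstream-test2 pattern (the function, element names, and config paths here are illustrative assumptions, not code from the sample):

// Sketch: insert a PGIE + SGIE chain after nvmultiurisrcbin.
#include <gst/gst.h>

static void add_pgie_sgie(GstElement* pipeline, GstElement* srcbin,
                          GstElement* sink)
{
    // nvinferserver matches plugin-type 1 in the dsserver config;
    // deepstream-test2 itself uses nvinfer for both GIEs.
    GstElement* pgie = gst_element_factory_make("nvinferserver", "primary-infer");
    GstElement* sgie = gst_element_factory_make("nvinferserver", "secondary-infer");
    g_object_set(pgie, "config-file-path",
                 "/workspace/configs/nvinferserver_yolo11s.txt", NULL);
    g_object_set(sgie, "config-file-path", "configs/my_sgie.txt", NULL); // illustrative path

    gst_bin_add_many(GST_BIN(pipeline), pgie, sgie, NULL);
    // nvmultiurisrcbin already outputs batched frames, so link straight through.
    gst_element_link_many(srcbin, pgie, sgie, sink, NULL);
}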
What option?
And I tested it, thanks @Fiona.Chen!
The results look a bit better, as I can add one more stream before it collapses:
./deepstream-server-app 21792MiB
**PERF : FPS_0 (0.00) FPS_1 (19.81) FPS_2 (19.81) FPS_3 (19.81)
FPS_4 (19.81) FPS_5 (19.81) FPS_6 (19.81) FPS_7 (19.81)
FPS_8 (19.81) FPS_9 (19.81) FPS_10 (19.81) FPS_11 (19.81)
FPS_12 (19.81) FPS_13 (19.81) FPS_14 (19.81) FPS_15 (19.81)
FPS_16 (19.81) FPS_17 (19.81) FPS_18 (19.81) FPS_19 (19.80)
FPS_20 (19.80) FPS_21 (19.80) FPS_22 (19.80) FPS_23 (19.80)
FPS_24 (19.81) FPS_25 (19.81) FPS_26 (19.81) FPS_27 (19.81)
FPS_28 (19.81) FPS_29 (19.81) FPS_30 (19.81) FPS_31 (19.81)
FPS_32 (19.81) FPS_33 (19.81) FPS_34 (19.80) FPS_35 (19.80)
FPS_36 (19.81) FPS_37 (19.81) FPS_38 (19.81) FPS_39 (19.81)
FPS_40 (19.81) FPS_41 (19.81) FPS_42 (19.81) FPS_43 (19.81)
FPS_44 (19.80) FPS_45 (19.76)
gpu pwr gtemp mtemp sm mem enc dec jpg ofa mclk pclk
Idx W C C % % % % % % MHz MHz
0 235 56 - 61 43 1 31 0 0 9001 2520
0 316 54 - 65 46 1 31 0 0 9001 2520
0 259 52 - 65 45 1 31 0 0 9001 2520
0 167 53 - 65 47 1 31 0 0 9001 2520
0 172 54 - 61 43 1 34 0 0 9001 2520
0 240 56 - 61 43 1 31 0 0 9001 2520
0 317 53 - 66 45 1 30 0 0 9001 2520
0 253 53 - 64 45 1 31 0 0 9001 2520
0 162 53 - 64 46 1 31 0 0 9001 2520
0 176 55 - 61 43 1 35 0 0 9001 2520
0 240 56 - 66 47 1 34 0 0 9001 2520
0 317 54 - 61 43 1 34 0 0 9001 2520
0 252 52 - 61 44 1 34 0 0 9001 2520
0 165 53 - 63 44 1 30 0 0 9001 2520
0 180 55 - 67 47 0 27 0 0 9001 2520
0 253 56 - 65 47 1 34 0 0 9001 2520
0 315 54 - 61 43 1 34 0 0 9001 2520
0 224 53 - 61 43 1 35 0 0 9001 2520
0 158 53 - 63 44 1 30 0 0 9001 2520
0 185 56 - 67 47 0 27 0 0 9001 2520
0 263 56 - 63 47 1 34 0 0 9001 2520
0 304 53 - 60 42 1 36 0 0 9001 2520
0 203 53 - 61 41 1 34 0 0 9001 2520
0 158 54 - 65 45 1 27 0 0 9001 2520
0 194 57 - 66 46 0 29 0 0 9001 2520
adding the 47th stream:
gpu pwr gtemp mtemp sm mem enc dec jpg ofa mclk pclk
Idx W C C % % % % % % MHz MHz
0 253 57 - 65 47 1 34 0 0 9001 2520
0 311 54 - 61 43 1 34 0 0 9001 2520
0 226 53 - 61 43 1 34 0 0 9001 2520
0 158 54 - 63 44 1 29 0 0 9001 2520
0 191 57 - 67 47 0 27 0 0 9001 2520
0 283 57 - 63 47 1 34 0 0 9001 2520
0 299 54 - 61 43 1 34 0 0 9001 2520
0 197 53 - 61 41 1 34 0 0 9001 2520
0 161 54 - 66 46 1 28 0 0 9001 2520
0 199 57 - 66 46 0 29 0 0 9001 2520
0 291 57 - 63 49 1 35 0 0 9001 2520
0 293 53 - 60 41 1 35 0 0 9001 2520
0 195 53 - 61 41 1 34 0 0 9001 2520
0 162 55 - 66 46 1 29 0 0 9001 2520
0 205 57 - 66 47 0 31 0 0 9001 2520
0 299 55 - 62 47 1 35 0 0 9001 2520
0 283 53 - 61 43 1 34 0 0 9001 2520
0 181 53 - 60 39 1 35 0 0 9001 2520
0 169 55 - 67 47 1 28 0 0 9001 2520
0 224 57 - 66 47 0 30 0 0 9001 2520
0 313 55 - 61 43 1 34 0 0 9001 2520
0 250 53 - 60 42 1 35 0 0 9001 2520
0 144 50 - 0 0 0 0 0 0 9001 2520
0 86 48 - 0 0 0 0 0 0 9001 2520
0 85 47 - 0 0 0 0 0 0 9001 2520
0 84 47 - 0 0 0 0 0 0 9001 2520
0 84 46 - 0 0 0 0 0 0 9001 2520
Can we still push this further, or is this the limit for this case?
I meant that dsserver_config has no SGIE in it, so if I want to add one or two, do I just do it the way we add SGIEs in a normal deepstream-app config?
Can I just add this to dsserver_config:
primary-gie:
  plugin-type: 1
  config-file-path: /workspace/configs/nvinferserver_yolo11s.txt
secondary-gie0:
  enable: 0
  plugin-type: 1
  config-file-path: dsserver_sgie_config.txt
  ...
secondary-gieN:
  ...
...
Or do I need to adjust the .cpp file?
I think I have told you the basic principle. You need to find the bottleneck in your own implementation.
The deepstream-server-app does not use the "deepstream-app" APIs. Please refer to the deepstream-test2 sample for how to add an SGIE to the pipeline.
All sample apps are open source. They are samples, not apps for end users.
Alright, I got it. But how do I check which part of the pipeline is the slowest? Is there something I can monitor for each of these parts?
- Have you guaranteed that the sources can provide 5120 frames per second?
- Have you broken down all your downstream components to make sure they can handle 5120 frames per second?
OK, so for the first one:
I did manage to push around 45 stable streams before it collapsed when adding more.
So the total source rate should be 20 fps × 45 streams = 900 frames/s (way under 5120).
I don’t know how to check the second one. Any idea?
1. Have you measured the speed of the postprocessing?
2. It seems you have enabled the video encoder and file saving in your configuration, so you also need to check the encoding and file-save speed: Video Codec SDK | NVIDIA Developer
3. deepstream-server is open source; you can check which components are enabled by your configuration and try to break them down one by one (e.g. with a pad probe, as sketched below).
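A minimal sketch of such a per-component breakdown, assuming a GStreamer pad probe (monitor_element() and the counter plumbing are illustrative, not DeepStream APIs):

// Count buffers leaving an element's src pad and print the rate once per
// second, to see which stage stops keeping up.
#include <gst/gst.h>

static GstPadProbeReturn count_cb(GstPad* pad, GstPadProbeInfo* info,
                                  gpointer user_data)
{
    g_atomic_int_inc(static_cast<gint*>(user_data));
    return GST_PAD_PROBE_OK;
}

static gboolean report_cb(gpointer user_data)
{
    gint* counter = static_cast<gint*>(user_data);
    g_print("buffers/sec: %d\n", g_atomic_int_get(counter));
    g_atomic_int_set(counter, 0);
    return G_SOURCE_CONTINUE;
}

// Attach to any element you want to inspect (e.g. the PGIE or the encoder).
// NOTE: one static counter is shared here; use per-element state in real code.
void monitor_element(GstElement* elem)
{
    static gint counter = 0;
    GstPad* srcpad = gst_element_get_static_pad(elem, "src");
    gst_pad_add_probe(srcpad, GST_PAD_PROBE_TYPE_BUFFER, count_cb, &counter, NULL);
    gst_object_unref(srcpad);
    g_timeout_add_seconds(1, report_cb, &counter);
}

GStreamer's built-in latency tracer (running the app with GST_TRACERS=latency and GST_DEBUG=GST_TRACER:7) is another option for per-element latency.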
Yep. After adding some time checking in the YOLO11 parser from NVIDIA (the link you gave me), I can see that per-frame latency for post-processing is around 0.1 ms.
Some samples:
YOLOv11 Post-Process Latency: 0.114689 ms
YOLOv11 Post-Process Latency: 0.089131 ms
YOLOv11 Post-Process Latency: 0.09148 ms
YOLOv11 Post-Process Latency: 0.1009 ms
YOLOv11 Post-Process Latency: 0.121041 ms
YOLOv11 Post-Process Latency: 0.091336 ms
YOLOv11 Post-Process Latency: 0.109893 ms
Afternoon,
I am here to say that I now managed to add 128 streams dynamically and maintain the pipeline. I just needed to adjust the deepstream_server_app.cpp file, because that is apparently where the pipeline is built.
@Fiona.Chen I do have another question; you can say whether it's possible or not.
- Is it possible to dynamically add new models (SGIE2, etc.) to the pipeline while it's running?
- For each stream, is it possible to customize the tracker and its config?
- For each stream, is it possible to have different PGIE + dynamic crop + SGIE?
It depends on how you define "dynamically".
If each stream has its own PGIE+SGIE, you don't need to put them in the same pipeline; you can use multiple pipelines. For example:
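A rough sketch of one pipeline per stream (the launch string, caps, and config paths are illustrative assumptions, not sample code):

// Build an independent pipeline per stream, each with its own PGIE/SGIE configs.
#include <gst/gst.h>

GstElement* make_pipeline_for_stream(const char* uri, const char* pgieCfg,
                                     const char* sgieCfg)
{
    gchar* desc = g_strdup_printf(
        "nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
        "nvinfer config-file-path=%s ! nvinfer config-file-path=%s ! fakesink "
        "uridecodebin uri=%s ! mux.sink_0",
        pgieCfg, sgieCfg, uri);
    GstElement* pipeline = gst_parse_launch(desc, NULL);
    g_free(desc);
    return pipeline;  // each pipeline is set to PLAYING independently of the others
}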