Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) A30
• DeepStream Version 8.0
• TensorRT Version 10.0
• Issue: Unable to reach higher FPS with the current engine, even with a higher nvinfer batch size.
Title: How to configure TensorRT/DeepStream batch size to maximize throughput (>1300 FPS target)
Hello,
I am trying to optimize inference throughput for my ONNX model integrated into DeepStream. My goal is to understand how TensorRT engine configuration (min/opt/max batch size, streams) and DeepStream nvinfer batch size relate to achieving higher FPS, ideally above 1300 FPS.
DeepStream nvinfer configuration
Below is my current ds_demux_pgie_config.txt:
```
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=folded.onnx
model-engine-file=model_b100_gpu0_fp16.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=100
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
workspace-size=16384
#parse-bbox-func-name=NvDsInferParseYolo
parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.35
topk=300
```
Batch size in DeepStream is currently set as:
• batch-size=100
• minShapes=1, optShapes=50, maxShapes=100 (during engine build)
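For reference, a dynamic-batch engine matching these min/opt/max values can be built with trtexec roughly as follows. This is a sketch only: the input tensor name `input` and the 3x640x640 dimensions are placeholders, not taken from my actual model; they must be replaced with the real input binding of folded.onnx.

```shell
# Sketch: build an FP16 engine with a dynamic batch dimension.
# "input" and 3x640x640 are assumed placeholder name/dimensions.
trtexec --onnx=folded.onnx \
        --fp16 \
        --minShapes=input:1x3x640x640 \
        --optShapes=input:50x3x640x640 \
        --maxShapes=input:100x3x640x640 \
        --saveEngine=model_b100_gpu0_fp16.engine
```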
TensorRT Performance (trtexec)

Without --streams:
• Throughput: ~774.9 QPS
• Mean latency: ~1.48 ms
• GPU compute time (mean): ~1.28 ms
• Coefficient of variance: ~18.8%

With --streams enabled:
• Throughput: ~1379.2 QPS
• Mean latency: ~6.0 ms (note: with parallel streams, latency is less reliable)
• GPU compute time (mean): ~5.76 ms
• Coefficient of variance: ~12.6%
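For context on how I am interpreting these numbers: my understanding is that trtexec reports throughput in queries per second, where one query is one execution at the profiled batch size, so frames/s = QPS × batch. A quick sketch of that arithmetic (the batch values here are hypothetical, since I am not certain which batch trtexec actually ran with):

```python
# Sketch: convert trtexec throughput (queries/s) into frames/s,
# assuming each query processes one batch of `batch` frames.
def qps_to_fps(qps: float, batch: int) -> float:
    return qps * batch

# Hypothetical examples (trtexec runs batch 1 unless --shapes pins a
# larger batch dimension):
print(qps_to_fps(774.9, 1))     # single stream, batch 1
print(qps_to_fps(1379.2, 1))    # multi-stream, batch 1
```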
My Questions

Given the above QPS results and DeepStream nvinfer settings:

1. How should I set the TensorRT engine batch sizes (min/opt/max) and the DeepStream nvinfer batch-size to maximize GPU utilization and achieve >4000 FPS throughput?
2. How does the trtexec QPS (with vs. without streams) translate into real DeepStream FPS once the engine is deployed?
3. Is there a recommended formula or rule of thumb for choosing batch size so that throughput scales without hitting diminishing returns (e.g., higher latency or inefficient GPU scheduling)?
4. Do multiple inference streams at the TensorRT level map effectively onto DeepStream's frame batching, or should one rely more on a larger DeepStream batch-size instead?
Any guidance or best practices from the NVIDIA team or the community would be very helpful.
Thank you