Optimization using inference batch size

Dear Team,

We have a question about optimization/performance improvement when using a larger inference batch size.

Environment:
Jetson AGX Xavier
GStreamer: 1.14.5
JetPack: 4.6
CUDA Version: cuda_10.2_r440
Operating System + Version: Ubuntu 18.04.6 LTS
TensorRT Version: 8.0.1-1+cuda10.2
Python Version: 3.6.9

We are trying to speed up inference for a TensorRT engine converted from an ONNX model.
The model is based on the MobileNetV1 architecture.
We ran experiments with different batch sizes and observed the results below.
We measured the inference execution time and found that a performance improvement is only
achieved for batch sizes above 1024. Is this expected?
Please help us analyze the behavior in the table below in terms of throughput (QPS), latency,
and inference time (s).

| Batch size | Throughput | Latency | Total TRT engine inference time (s) | Per-image TRT engine inference time (s) |
|---|---|---|---|---|
| 1 | 3786.82 qps | 0.262451 ms | 0.001962185 | 0.001962185 |
| 4 | 1671.8 qps | 0.599274 ms | 1.157729626 | 0.289432407 |
| 8 | 1254.14 qps | 0.800781 ms | 1.18355608 | 0.14794451 |
| 16 | 810.952 qps | 1.23877 ms | 1.15067625 | 0.071917266 |
| 32 | 494.04 qps | 2.03979 ms | 1.163462639 | 0.036358207 |
| 64 | 280.011 qps | 3.60278 ms | 1.182549238 | 0.018477332 |
| 128 | 152.859 qps | 6.59839 ms | 1.182056665 | 0.009234818 |
| 256 | 79.2021 qps | 12.7529 ms | 1.196300983 | 0.004673051 |
| 512 | 40.6503 qps | 24.9807 ms | 1.210321426 | 0.002363909 |
| 1024 | 19.2564 qps | 52.2966 ms | 1.353152514 | 0.001321438 |
| 2048 | 9.14893 qps | 110.009 ms | 1.396543026 | 0.00068190 |
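
For reference, here is a minimal sketch of the kind of timing we are describing, assuming the TensorRT Python API with an explicit-batch engine that has a dynamic batch dimension and PyCUDA for device buffers. The engine path, input shape, binding index, and batch size are illustrative placeholders, and only the engine execution itself is timed:

```python
# Minimal timing sketch (illustrative, not our exact script): measure
# per-batch and per-image execution time of a TensorRT engine built from
# the ONNX model with an explicit batch dimension. Engine path, input
# shape, and batch size are placeholders.
import time

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

ENGINE_PATH = "mobilenet_v1.trt"   # illustrative path
BATCH_SIZE = 256                   # batch size under test
INPUT_SHAPE = (3, 224, 224)        # illustrative CHW input for MobileNetV1
RUNS = 20                          # timed iterations after warm-up

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# Resolve the dynamic batch dimension on the input binding (index 0 here).
context.set_binding_shape(0, (BATCH_SIZE,) + INPUT_SHAPE)

# Allocate a device buffer per binding; input data is omitted because we
# only time the engine execution itself.
dev_bufs = []
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = np.dtype(trt.nptype(engine.get_binding_dtype(i)))
    dev_bufs.append(cuda.mem_alloc(trt.volume(shape) * dtype.itemsize))
bindings = [int(d) for d in dev_bufs]

stream = cuda.Stream()
for _ in range(10):                          # warm-up
    context.execute_async_v2(bindings, stream.handle)
stream.synchronize()

start = time.perf_counter()
for _ in range(RUNS):
    context.execute_async_v2(bindings, stream.handle)
stream.synchronize()
per_batch = (time.perf_counter() - start) / RUNS

print(f"batch {BATCH_SIZE}: {per_batch:.6f} s per batch, "
      f"{per_batch / BATCH_SIZE:.6f} s per image")
```

The per-image value is simply the measured per-batch time divided by the batch size, which matches how the last column of the table relates to the fourth.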

Hi @user9377 ,

I took the liberty of moving your topic to the Jetson AGX-specific forum to give it more visibility.

Please feel free to change it back if you think this is incorrect. You might also consider looking for information in the TensorRT-specific categories here on the forums, for example Deep Learning (Training & Inference) - NVIDIA Developer Forums and its TensorRT sub-category.

Thanks!