Latency increases proportionally with batch size

Hi all,

I'm encountering the following issue: increasing the batch size leads to a proportional increase in latency.

I'm using TensorRT through the C++ API, with the network converted from UFF.

Inference times:
Batch size 1: 12.7ms
Batch size 2: 25.2ms
Batch size 3: 37.5ms
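To make the pattern explicit, here is a quick calculation (plain Python, just using the numbers above) showing that per-sample latency stays essentially flat around 12.5–12.7 ms, i.e. batching gives almost no throughput benefit in my case:

```python
# Measured inference times from above: batch size -> total latency in ms.
timings = {1: 12.7, 2: 25.2, 3: 37.5}

for batch, total_ms in timings.items():
    per_sample_ms = total_ms / batch            # latency per sample
    throughput = batch / (total_ms / 1000.0)    # samples per second
    print(f"batch {batch}: {per_sample_ms:.2f} ms/sample, "
          f"{throughput:.1f} samples/s")
```

If batching were amortizing well, per-sample latency would drop noticeably as the batch grows; here it barely moves (78.7 → 80.0 samples/s).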

However, the SDK documentation implies that increasing the batch size should not have a large impact on latency. The documentation states: "Often the time taken to compute results for batch size N=1 is almost identical to batch sizes up to N=16 or N=32."

Is the documentation wrong or am I missing something?

Sorry for posting in the wrong forum. Reposted in the general TRT forum.