Latency proportionally increases with batch size

Hi all,

I'm running into the following issue: increasing the batch size leads to a proportional increase in latency.

I'm using TRT 5.1.5.0 with the C++ API; the network was converted from UFF.

Inference times:
Batch size 1: 12.7ms
Batch size 2: 25.2ms
Batch size 3: 37.5ms
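
For reference, here's roughly how I'm running and timing inference (simplified sketch; engine deserialization and device buffer allocation are omitted, and the iteration count is just illustrative):

[code]
#include <chrono>
#include <vector>

#include <cuda_runtime_api.h>
#include "NvInfer.h"

// Simplified timing loop. `context` comes from an engine built with
// maxBatchSize >= 3, and `bindings` holds device buffers sized for the
// largest batch; setup code is omitted here.
double measureLatencyMs(nvinfer1::IExecutionContext& context,
                        std::vector<void*>& bindings,
                        int batchSize, int iterations = 100)
{
    // Warm-up run so one-time CUDA initialization is not counted.
    context.execute(batchSize, bindings.data());
    cudaDeviceSynchronize();

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        // Synchronous, implicit-batch execution (TRT 5.x API).
        context.execute(batchSize, bindings.data());
    }
    cudaDeviceSynchronize();
    auto end = std::chrono::high_resolution_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}
[/code]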

However, the SDK documentation implies that increasing the batch size should not have a large impact on latency. The documentation states: "Often the time taken to compute results for batch size N=1 is almost identical to batch sizes up to N=16 or N=32." ([url]https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#batching[/url])

Is the documentation wrong or am I missing something?

Sorry for posting in the wrong forum. Reposted in the general TRT forum. [url]https://devtalk.nvidia.com/default/topic/1058127/tensorrt/latency-proportionally-increases-with-batch-size/[/url]