Latency increases proportionally with batch size

Hi all,

I'm encountering the following issue: increasing the batch size leads to a proportional increase in latency.

I'm using TensorRT through the C++ API, with the network converted from UFF.

Inference times:
Batch size 1: 12.7ms
Batch size 2: 25.2ms
Batch size 3: 37.5ms
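To make the pattern explicit, here is a quick calculation (plain Python, just using the numbers above) showing that per-sample latency stays essentially flat around 12.5–12.7 ms, i.e. batching gives almost no throughput benefit in my case:

```python
# Measured inference times from above: batch size -> total latency in ms.
timings = {1: 12.7, 2: 25.2, 3: 37.5}

for batch, total_ms in timings.items():
    per_sample_ms = total_ms / batch            # latency per sample
    throughput = batch / (total_ms / 1000.0)    # samples per second
    print(f"batch {batch}: {per_sample_ms:.2f} ms/sample, "
          f"{throughput:.1f} samples/s")
```

If batching were amortizing well, per-sample latency would drop noticeably as the batch grows; here it barely moves (78.7 → 80.0 samples/s).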

However, the SDK documentation implies that increasing the batch size should not have a large impact on latency. The documentation states: "Often the time taken to compute results for batch size N=1 is almost identical to batch sizes up to N=16 or N=32."

Is the documentation wrong or am I missing something?

Sorry for posting in the wrong forum. Reposted in the general TRT forum.