I am encountering the following issue: increasing the batch size leads to a proportional increase in latency.
I’m using TRT 188.8.131.52, C++ API, and converted the network from UFF.
Batch size 1: 12.7ms
Batch size 2: 25.2ms
Batch size 3: 37.5ms
However, the SDK documentation implies that increasing the batch size should not have a large impact on latency. The documentation states: "Often the time taken to compute results for batch size N=1 is almost identical to batch sizes up to N=16 or N=32." (https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#batching)
Is the documentation wrong or am I missing something?