Latency increases proportionally with batch size

Hi all,

I'm encountering the following issue: increasing the batch size leads to a proportional increase in latency.

I'm using TensorRT through the C++ API, with the network converted from UFF.

Inference times:
Batch size 1: 12.7ms
Batch size 2: 25.2ms
Batch size 3: 37.5ms
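To make the pattern explicit, here is a quick calculation (plain Python, just using the numbers above) showing that per-sample latency stays essentially flat around 12.5–12.7 ms, i.e. batching gives almost no throughput benefit in my case:

```python
# Measured inference times from above: batch size -> total latency in ms.
timings = {1: 12.7, 2: 25.2, 3: 37.5}

for batch, total_ms in timings.items():
    per_sample_ms = total_ms / batch            # latency per sample
    throughput = batch / (total_ms / 1000.0)    # samples per second
    print(f"batch {batch}: {per_sample_ms:.2f} ms/sample, "
          f"{throughput:.1f} samples/s")
```

If batching were amortizing well, per-sample latency would drop noticeably as the batch grows; here it barely moves (78.7 → 80.0 samples/s).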

However, the SDK documentation implies that increasing the batch size should not have a large impact on latency. The documentation states: "Often the time taken to compute results for batch size N=1 is almost identical to batch sizes up to N=16 or N=32."

Is the documentation wrong or am I missing something?

Sorry for posting in the wrong forum. Reposted in the general TRT forum.