Is a TensorRT inference engine slower when built with a larger max_batch_size?

Hi all!

In my experiments, I found that a TensorRT inference engine initialized with a larger max_batch_size is slower than one initialized with a smaller max_batch_size.

For example, I have two engines: engine_a, initialized with max_batch_size 32, and engine_b, initialized with max_batch_size 1. In my experiments, both engines run with batch_size 1. engine_a reaches 113 fps (frames per second), while engine_b reaches 125, which means engine_a is about 10% slower than engine_b even though the two are identical except for the max_batch_size setting.

Is this result normal?

In my application the batch_size is not known in advance, so I would normally set max_batch_size to something large (e.g. 32). But my experiments show that this makes the engine slower, so it does not seem like a good idea.

I found something similar. I’m running benchmarks on high-end GPUs such as the P100, P40, and V100, and I found that a batch size (bs) of 32 performs best (a bit better than 1, 8, or 16), while larger values such as 64 and 128 perform much worse than 32.
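For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It is not tied to TensorRT: `infer` is a hypothetical callable that you would replace with whatever runs your engine for one batch, and `fake_infer` is only a stand-in workload so the snippet runs on its own. Since GPU work is launched asynchronously, a real `infer` has to wait for the GPU to finish before returning, otherwise the numbers will look better than they are.

```python
import time
from typing import Callable, Dict, Iterable


def measure_fps(infer: Callable[[int], None], batch_size: int,
                warmup: int = 10, iters: int = 100) -> float:
    """Return frames per second for `infer` at a fixed batch size."""
    # Warm-up calls are not timed, so one-time setup cost does not skew the result.
    for _ in range(warmup):
        infer(batch_size)

    start = time.perf_counter()
    for _ in range(iters):
        infer(batch_size)
    elapsed = time.perf_counter() - start

    # Every call processes `batch_size` frames.
    return batch_size * iters / elapsed


def sweep(infer: Callable[[int], None], batch_sizes: Iterable[int],
          warmup: int = 10, iters: int = 100) -> Dict[int, float]:
    """Measure fps for each batch size and return {batch_size: fps}."""
    return {bs: measure_fps(infer, bs, warmup, iters) for bs in batch_sizes}


def fake_infer(batch_size: int) -> None:
    # Placeholder workload (an assumption for this example, NOT a TensorRT call).
    time.sleep(0.0002 * batch_size)
```

Calling `sweep(fake_infer, [1, 8, 16, 32, 64, 128])` returns a dict mapping each batch size to its measured fps; with a real engine behind `infer`, the same sweep should show where throughput peaks on your GPU.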


The max_batch_size parameter sets the upper limit on the batch size at runtime, and TensorRT optimizes the engine for that batch size. It does not mean that this batch size will actually be used at runtime. In your experiments both engines run with a batch size of 1, so engine_b performs better because TensorRT optimized it for a batch size of 1.
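If the runtime batch size varies, one possible workaround that follows from this (a sketch, not something the TensorRT API provides) is to build a few engines with different max_batch_size values and, for each request, use the smallest one that still fits the batch. The `EnginePool` class and the string placeholders below are hypothetical; in practice the values would be your own engine or execution-context objects. Keeping several engines loaded costs extra build time and GPU memory, so whether this pays off depends on your model.

```python
import bisect
from typing import Any, Dict, List, Tuple


class EnginePool:
    """Holds engines keyed by the max_batch_size each one was built with."""

    def __init__(self, engines: Dict[int, Any]) -> None:
        if not engines:
            raise ValueError("at least one engine is required")
        self._engines = dict(engines)
        # Sorted list of the available max_batch_size values.
        self._sizes: List[int] = sorted(self._engines)

    def pick(self, batch_size: int) -> Tuple[int, Any]:
        """Return (max_batch_size, engine) for the smallest engine that fits."""
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        # Index of the first max_batch_size that is >= batch_size.
        i = bisect.bisect_left(self._sizes, batch_size)
        if i == len(self._sizes):
            raise ValueError(
                f"batch_size {batch_size} exceeds the largest "
                f"max_batch_size {self._sizes[-1]}"
            )
        size = self._sizes[i]
        return size, self._engines[size]


# Placeholder strings stand in for real engine objects.
pool = EnginePool({1: "engine_b", 8: "engine_8", 32: "engine_a"})
size, engine = pool.pick(1)   # selects the engine built with max_batch_size 1
```

With this pool, a batch of 1 goes to the engine built for 1, a batch of 5 goes to the engine built for 8, and anything above 32 raises an error instead of silently overrunning the limit.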