Number of batches during inference.

Hello everyone. In TensorRT there is a max batch size parameter, which, to my understanding, controls how many datapoints are inferred at once during the inference stage, right? But when I look at the logs, I notice there is also a different parameter, namely the number of batches.

[TensorRT] INFO: Detecting input data format
[TensorRT] INFO: Dectected data format LCHW
[TensorRT] INFO: Verifying data format is uniform accross all input layers
[TensorRT] INFO: Verifying batches are the expected data type
[TensorRT] INFO: Executing inference
[TensorRT] INFO: Number of Batches: 1
[TensorRT] INFO: Execution batch size: 100

Is there any way to set this parameter to a value other than 1? What if I wanted to run 10 batches of size 10 instead of 1 batch of size 100? The obvious approach is a for loop, but that doesn't seem very TensorRT-ish to me. Also, when I use a for loop with 10 iterations versus 100 iterations, I get different results (I take the mean time per iteration, not the total); with the latter, throughput is about 10% better in frames per second. I was hoping that setting this number-of-batches parameter directly in the TRT inference engine would settle the issue, but I can't find any information about it. Is there any way?
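For what it's worth, a plain loop over chunks of the input is the usual way to do this. A minimal sketch, where `infer` is a hypothetical stand-in for whatever call executes the TensorRT engine on one batch (not a TensorRT API):

```python
import numpy as np

def run_in_batches(data, batch_size, infer):
    """Split `data` along axis 0 into chunks of `batch_size`
    and run `infer` (a placeholder for real engine execution)
    on each chunk, concatenating the per-batch outputs."""
    outputs = []
    for start in range(0, len(data), batch_size):
        outputs.append(infer(data[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)

# Example: 100 datapoints as 10 batches of 10 instead of 1 batch of 100.
data = np.random.rand(100, 3, 224, 224).astype(np.float32)
identity = lambda batch: batch  # stand-in for real inference
result = run_in_batches(data, batch_size=10, infer=identity)
assert result.shape == (100, 3, 224, 224)
```

With a real engine you would replace `identity` with the call that executes your engine on one batch; the loop structure stays the same.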

The first inference takes much more time than steady-state inference, so exclude it when comparing.
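A minimal sketch of that measurement approach: run a few warm-up iterations first (which absorb one-time costs such as CUDA context setup, memory allocation, and kernel selection), then average only the stabilized runs. Here `infer` is again a hypothetical stand-in for executing the engine on one batch:

```python
import time

def benchmark(infer, batch, iters=100, warmup=10):
    """Mean per-iteration latency of `infer(batch)` over `iters`
    timed runs, after discarding `warmup` untimed runs so lazy
    initialization does not skew the result."""
    for _ in range(warmup):
        infer(batch)          # warm-up: not timed
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer(batch)
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

# Toy usage with a trivial callable in place of real inference.
mean_s = benchmark(lambda b: sum(b), list(range(10)), iters=20, warmup=5)
print(f"mean latency: {mean_s * 1e6:.1f} us")
```

Comparing FPS computed this way for the 10-iteration and 100-iteration runs should remove most of the discrepancy described above.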

Yeah, I will try that, thanks. Any idea why it is slower at the beginning? For the record, I'm timing just the inference, not the engine building or the data loading. Also, as I said, it's not only the first iteration that is slower; the results only stabilize after 5 or 10 iterations. By the way, what do you mean by "actual inference"? Isn't the first iteration "actual"? Thank you for your answer.

A simple Google search would have gotten you this: https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/PDIBnp1ftxk

Actual Inference times = Time that is reported for inference

Thank you very much.