How to deal with dynamic batch size at runtime and explicit batch size


Imagine the task of calculating embeddings for the faces found in video frames.
I have a CNN that calculates these embeddings, and I want to run it in batches whose size varies with the number of people in the frame.

If I create the network with kEXPLICIT_BATCH, then I need to call setBindingDimensions on the execution context every time the number of faces changes, BUT this leads to a significant slowdown in the next call to enqueueV2.
OR I can always call enqueueV2 with the maximum batch size, BUT that is also extremely suboptimal.

Could you give me some advice?



TensorRT Version: 7.
GPU Type: 1070
Nvidia Driver Version: 430.64
CUDA Version: 10.0
CUDNN Version: 7.6.3
Operating System + Version: Linux Manjaro

You have to sort your batches into fixed-size buckets.
Pick your maximum batch size, then add every power of 2 below it.
So if you will never need to support more than 40 faces, you have a context/profile for each of the batch sizes 1, 2, 4, 8, 16, 32, and 40.
Then if you have 30 faces in an image, you can either pad the batch to 32 and run that context/profile, or launch the contexts/profiles that support 16, 8, 4, and 2 (which add up to 30).
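
To make the two options concrete, here is a minimal Python sketch of the bucket arithmetic only. It makes no TensorRT calls; the function names (make_buckets, pad_to_bucket, split_into_buckets) are hypothetical helpers invented for illustration, and mapping each bucket size to an actual context/profile is left to the caller.

```python
from bisect import bisect_left


def make_buckets(max_batch):
    """Return every power of 2 below max_batch, followed by max_batch itself."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    buckets = []
    size = 1
    while size < max_batch:
        buckets.append(size)
        size *= 2
    buckets.append(max_batch)
    return buckets


def pad_to_bucket(n, buckets):
    """Option 1: the smallest bucket that holds n items (pad the rest)."""
    if n < 1 or n > buckets[-1]:
        raise ValueError("n must be between 1 and the largest bucket")
    return buckets[bisect_left(buckets, n)]


def split_into_buckets(n, buckets):
    """Option 2: greedily cover n items with buckets, largest first, no padding."""
    if n < 1:
        raise ValueError("n must be at least 1")
    remaining = n
    chosen = []
    for bucket in reversed(buckets):
        while remaining >= bucket:
            chosen.append(bucket)
            remaining -= bucket
    return chosen


buckets = make_buckets(40)
print(buckets)                          # [1, 2, 4, 8, 16, 32, 40]
print(pad_to_bucket(30, buckets))       # 32
print(split_into_buckets(30, buckets))  # [16, 8, 4, 2]
```

Padding costs one launch with two wasted slots for 30 faces, while splitting wastes nothing but needs four launches; which is faster depends on the network and GPU, so it is worth measuring both.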


Do I understand correctly that you are proposing to create multiple instances of IExecutionContext for different batch sizes (1, 2, 4 …) and to choose the best-fitting one before inference?

That looks like a solution, but each context will cost some additional memory.

Anyway thanks for your help!