Batching inference on two different streams


I have two separate CUDA streams for two video streams. Is it possible to batch their inference using the same object detection model? If so, how?


It’s 100% possible.

You’ll need an IExecutionContext per stream, and you’ll also have to set a distinct optimization profile for each one.

From there, you’ll need to make sure you set up your bindings properly, i.e. the buffer sequence TensorRT expects to read from/write out to.

Then all you really have to do is call enqueueV2 from each execution context, supplying the appropriate cudaStream_t.
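The steps above can be sketched roughly as follows. This is a minimal sketch, not a complete program: it assumes `engine` is an already-deserialized `nvinfer1::ICudaEngine` built with two optimization profiles, and that `bindingsA`/`bindingsB` are pre-allocated device buffer arrays in the engine's binding order (those names are mine, not from the thread).

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: run the same engine concurrently on two CUDA streams.
void runTwoStreams(nvinfer1::ICudaEngine* engine,
                   void* bindingsA[], void* bindingsB[])
{
    // One execution context per stream.
    nvinfer1::IExecutionContext* ctxA = engine->createExecutionContext();
    nvinfer1::IExecutionContext* ctxB = engine->createExecutionContext();

    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    // Each context in flight at the same time needs its own optimization
    // profile (indices 0 and 1 here, assuming the engine was built with two).
    ctxA->setOptimizationProfileAsync(0, streamA);
    ctxB->setOptimizationProfileAsync(1, streamB);

    // Kick off both inferences; they can overlap on the GPU.
    ctxA->enqueueV2(bindingsA, streamA, nullptr);
    ctxB->enqueueV2(bindingsB, streamB, nullptr);

    // Wait for both streams before reading outputs.
    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    delete ctxA;
    delete ctxB;
}
```

In a real pipeline you would keep the contexts and streams alive across frames rather than creating and destroying them per call.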

Hi Christian,

Thank you for the reply. It indeed helps.

I have one more question: Is it possible to change the scheduling policy of TensorRT?

I understand FIFO is the default policy. But is it possible to customize to my own scheduling policy?


I don’t believe so. Instead, you’d have to handle this at the application level, i.e. you enqueue work precisely in the order that you want.