Is multi-threaded execution possible with TensorRT?

For multi-stream video analytics, is it possible to create threads and thread blocks and run inference for each stream on a different thread with the TensorRT inference engine in C++?


Following the TensorRT documentation and the slides on CUDA streams, I created multiple streams and multiple execution contexts and checked the output in the Visual Profiler.
This is the code inside my inference function, where I use BufferManager from the buffers.h header that is provided with the TensorRT samples.


    // Execute the inference work: one context, buffer set, and stream per image
    context->enqueue(mParams.batchSize, buffers.getDeviceBindings().data(), stream1, nullptr);
    context2->enqueue(mParams.batchSize, buffers2.getDeviceBindings().data(), stream2, nullptr);
    // Copy data from device output buffers to host output buffers

As you can see, I have a separate stream, a separate buffer, and a separate execution context for each of the 2 images. But when I checked the Visual Profiler results, the two inferences did not execute simultaneously.

Can you please explain what went wrong here?


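Unless otherwise specified, all calls are placed into a default stream, often referred to as “Stream 0”. It has special synchronization rules: with the legacy default-stream behavior, work issued to the default stream does not overlap with work in other blocking streams, so any call that lands in it (for example, a copy issued without an explicit stream) can serialize the two inferences.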

It might also be that the GPU does not have enough free compute resources to run multiple streams concurrently. Please check the memory consumption of each stream.
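On the original question of one thread per video stream: the usual pattern is to share the engine and give each worker thread its own execution context, CUDA stream, and buffers, since a context should not be used from two threads at once. Below is a minimal sketch of that ownership pattern in plain C++ only. `StubContext` and `runPerStreamWorkers` are hypothetical names invented for illustration, and the stub stands in for a real `nvinfer1::IExecutionContext`; no TensorRT or CUDA calls are made here.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for nvinfer1::IExecutionContext (an assumption for
// illustration). In real code each worker would also own a CUDA stream and
// its own device buffers.
struct StubContext {
    int id;
    // Pretend inference: derives a value from the frame and the context id.
    int infer(int frame) const { return frame * 2 + id; }
};

// Starts one worker thread per video stream. Each worker creates and uses
// its own context, and writes only to its own slot in `results`.
std::vector<std::vector<int>> runPerStreamWorkers(
    const std::vector<std::vector<int>>& streams) {
    std::vector<std::vector<int>> results(streams.size());
    std::vector<std::thread> workers;
    workers.reserve(streams.size());

    for (std::size_t i = 0; i < streams.size(); ++i) {
        workers.emplace_back([&streams, &results, i] {
            StubContext context{static_cast<int>(i)};  // never shared between threads
            for (int frame : streams[i]) {
                results[i].push_back(context.infer(frame));
            }
        });
    }

    // Wait for every worker before reading the results.
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

In a real application the stub would presumably be replaced by a context obtained from `engine->createExecutionContext()` inside each worker, with a per-thread stream passed to `enqueue`. Whether the resulting kernels actually overlap on the GPU still depends on the default-stream and resource points above.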