Is multi-threaded execution possible with TensorRT?

For multi-stream video analytics, is it possible to create our own threads and thread blocks, and run inference for every video stream on a different thread, using the TensorRT inference engine in C++?

Hi,

Please refer to the link below:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#optimize-performance

Here is a tutorial for your reference on executing with multiple CUDA streams.

DeepStream SDK:

DeepStream reference apps:
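As a rough sketch of the pattern those documents describe (not taken from your code; `engine`, `perStreamBindings`, and `numStreams` are hypothetical names here, and the per-stream device buffers are assumed to be allocated already), each worker thread can own its own CUDA stream and its own IExecutionContext, while all threads share a single deserialized engine:

#include <thread>
#include <vector>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

// One worker per video stream: each thread owns its own CUDA stream and its
// own IExecutionContext; all threads share one deserialized ICudaEngine.
void inferWorker(nvinfer1::ICudaEngine* engine, void** bindings, int batchSize)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Creating a context is cheap compared to building the engine,
    // and a context must not be used by two threads at once.
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... async H2D copy of this stream's input into `bindings` goes here ...

    context->enqueue(batchSize, bindings, stream, nullptr);

    // ... async D2H copy of this stream's output goes here ...

    cudaStreamSynchronize(stream);  // wait only for this stream's work

    context->destroy();
    cudaStreamDestroy(stream);
}

// Launching one thread per video stream:
// std::vector<std::thread> workers;
// for (int i = 0; i < numStreams; ++i)
//     workers.emplace_back(inferWorker, engine, perStreamBindings[i], 1);
// for (auto& t : workers) t.join();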

Thanks

First of all, thanks a lot for your response.
As per the TensorRT documentation and the CUDA streams slides, I created multiple streams and multiple execution contexts and checked the output in the Visual Profiler.
This is the code within my inference function, where I have used the BufferManager class from the buffers.h header provided with the TensorRT samples.

cudaProfilerStart();

// Copy each input from host to device asynchronously on its own stream
buffers.copyInputToDeviceAsync(stream1);
buffers2.copyInputToDeviceAsync(stream2);

// Execute the inference work, one execution context per stream
context->enqueue(mParams.batchSize, buffers.getDeviceBindings().data(), stream1, nullptr);
context2->enqueue(mParams.batchSize, buffers2.getDeviceBindings().data(), stream2, nullptr);

// Copy data from device output buffers to host output buffers
buffers.copyOutputToHostAsync(stream1);
buffers2.copyOutputToHostAsync(stream2);

cudaProfilerStop();

As you can see, I have a separate stream, a separate buffer, and a separate execution context to run inference on two images. But when I checked the Visual Profiler results, the two executions (inferences) did not overlap.

Can you please explain what went wrong here?

Hi,

Unless otherwise specified, all calls are placed into the default stream, often referred to as "Stream 0". It has special synchronization rules: under the legacy default-stream semantics, work issued to the default stream does not overlap with work in any other stream, so any accidental stream-0 call between your enqueues can serialize the two streams.
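One way to rule the default stream out (a minimal sketch, reusing the stream1/stream2 variables from your snippet above): create the streams with the non-blocking flag so they never synchronize with legacy Stream 0. Alternatively, compile with nvcc's --default-stream per-thread option.

// In the setup code, instead of cudaStreamCreate(&stream1):
// non-blocking streams do not synchronize with the legacy default stream,
// so a stray stream-0 launch elsewhere will not serialize them.
cudaStreamCreateWithFlags(&stream1, cudaStreamNonBlocking);
cudaStreamCreateWithFlags(&stream2, cudaStreamNonBlocking);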

It might also be that the GPU does not have enough free compute resources or memory to run multiple streams concurrently. Please check the memory consumption of each stream.
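For the memory side, here is a quick check with the CUDA runtime (a sketch; per-stream compute occupancy is easier to judge from the profiler timeline than from code):

#include <cstdio>
#include <cuda_runtime_api.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);  // free/total device memory in bytes
    std::printf("GPU memory: %zu MiB free of %zu MiB total\n",
                freeBytes >> 20, totalBytes >> 20);
    return 0;
}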

Thanks