For multi-stream video analytics, is it possible to create threads and thread blocks and run inference for each stream on a different thread, using the TensorRT inference engine in C++, as we wish?
Please refer to the link below:
Here is a tutorial for your reference on executing with multiple CUDA streams.
Deepstream reference apps:
First of all, thanks a lot for your response.
Following the TensorRT documentation and the CUDA streams slides, I created multiple streams and multiple execution contexts and checked the output in Visual Profiler.
This is the code within my inference function, where I have used the BufferManager class from the buffers header provided with the TensorRT samples.
// Execute the inference work
context->enqueue(mParams.batchSize, buffers.getDeviceBindings().data(), stream1, nullptr);
context2->enqueue(mParams.batchSize, buffers2.getDeviceBindings().data(), stream2, nullptr);
// Copy data from device output buffers to host output buffers
buffers.copyOutputToHostAsync(stream1);
buffers2.copyOutputToHostAsync(stream2);
cudaProfilerStop();
As you can see, I have a separate stream, a separate buffer, and a separate execution context to run inference on 2 images. But when I checked the Visual Profiler results, the two executions (inferences) did not happen simultaneously.
Can you please explain what went wrong here?
Unless otherwise specified, all calls are placed into the default stream, often referred to as "Stream 0". It has special synchronization rules:
- Synchronous with all streams
- Operations in stream 0 cannot overlap other streams
To avoid this, you need to create your streams with the non-blocking flag (cudaStreamNonBlocking) set.
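As a minimal sketch (assuming the CUDA runtime API; stream names match the code posted above), creating the streams with the non-blocking flag could look like this:

```cpp
#include <cuda_runtime.h>

void setupStreams()
{
    cudaStream_t stream1, stream2;

    // Streams created with cudaStreamNonBlocking do not synchronize
    // with the legacy default stream (stream 0), so work enqueued on
    // them is allowed to overlap.
    cudaStreamCreateWithFlags(&stream1, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&stream2, cudaStreamNonBlocking);

    // ... enqueue inference and async copies on stream1/stream2 here ...

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
}
```

Note that this only removes the implicit synchronization with stream 0; the kernels will still serialize if the GPU lacks free resources to run them concurrently.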
Please refer to this link:
It might also be that the GPU does not have enough free compute resources or memory to run multiple streams concurrently. Please check the memory consumption of each stream.
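One hedged way to check this (again assuming the CUDA runtime API; the helper name is illustrative) is to query free vs. total device memory before and after creating each execution context:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Print free vs. total device memory. Call this before and after
// creating each TensorRT execution context to see how much device
// memory each stream's context consumes.
void printDeviceMemory(const char* label)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    std::printf("%s: %zu MiB free of %zu MiB total\n",
                label, freeBytes >> 20, totalBytes >> 20);
}
```

If the second context leaves little free memory, the two inferences may be serialized by resource pressure rather than by the default-stream rules.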