For multi-stream video analytics, is it possible to create multiple threads and run inference for every stream on a different thread using the TensorRT inference engine in C++, as we wish?
Hi,
Please refer to the link below:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#optimize-performance
Here is a tutorial for your reference on executing with multiple CUDA streams.
Deepstream SDK:
Deepstream reference apps:
Thanks
First of all thanks a lot for your response.
As per the TensorRT documentation and the CUDA streams slides, I created multiple streams and multiple execution contexts and checked the output in the Visual Profiler.
This is the code within my inference function, where I used the BufferManager class from the buffers.h header provided with the TensorRT samples.
cudaProfilerStart();
// Copy data from host input buffers to device input buffers
buffers.copyInputToDeviceAsync(stream1);
buffers2.copyInputToDeviceAsync(stream2);
// Execute the inference work
context->enqueue(mParams.batchSize, buffers.getDeviceBindings().data(), stream1, nullptr);
context2->enqueue(mParams.batchSize, buffers2.getDeviceBindings().data(), stream2, nullptr);
// Copy data from device output buffers to host output buffers
buffers.copyOutputToHostAsync(stream1);
buffers2.copyOutputToHostAsync(stream2);
cudaProfilerStop();
As you can see, I have a separate stream, separate buffers, and a separate execution context to run inference on 2 images. But when I checked the Visual Profiler results, the two executions (inferences) did not happen simultaneously.
Can you please explain what went wrong here?
Hi,
Unless otherwise specified, all calls are placed into a default stream, often referred to as "Stream 0". It has special synchronization rules:
- Synchronous with all streams
- Operations in stream 0 cannot overlap with operations in other streams
To avoid this, you need to create your streams with the non-blocking flag set:
- cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking)
Please refer to this link:
http://on-demand.gputechconf.com/gtc/2014/presentations/S4158-cuda-streams-best-practices-common-pitfalls.pdf
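Applied to the snippet above, the fix is to create stream1 and stream2 with that flag before issuing the async copies and enqueues. A minimal sketch using only the CUDA runtime API (no TensorRT; assumes a CUDA-capable device is present):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Streams created with cudaStreamNonBlocking do not synchronize with the
    // legacy default stream, so work issued on them is allowed to overlap.
    cudaStream_t stream1 = nullptr, stream2 = nullptr;
    cudaError_t err1 = cudaStreamCreateWithFlags(&stream1, cudaStreamNonBlocking);
    cudaError_t err2 = cudaStreamCreateWithFlags(&stream2, cudaStreamNonBlocking);
    if (err1 != cudaSuccess || err2 != cudaSuccess) {
        std::printf("stream creation failed\n");
        return 1;
    }

    // ... issue copyInputToDeviceAsync / enqueue / copyOutputToHostAsync
    //     on stream1 and stream2 here, then synchronize before reading ...

    cudaStreamDestroy(stream1);
    cudaStreamDestroy(stream2);
    std::printf("created two non-blocking streams\n");
    return 0;
}
```

With both streams non-blocking, any work accidentally landing on the default stream no longer serializes them, which is a common reason the two inferences appear back-to-back in the Visual Profiler timeline.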
It might also be that the GPU does not have enough free compute resources to run multiple streams concurrently. Please check the memory consumption of each stream.
Thanks
