Is multi-threaded execution possible with TensorRT?

lavinan26 · April 4, 2020, 6:56pm

For multi-stream video analytics, is it possible to create threads (and thread blocks) and run inference for every video stream on a different thread using the TensorRT inference engine in C++, in whatever way we choose?

SunilJB · April 6, 2020, 6:08am

Hi,

Please refer to the link below:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#optimize-performance

Here is a tutorial for your reference on executing with multiple CUDA streams.

DeepStream SDK:

DeepStream reference apps:
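For the original question (one inference per video stream, each on its own host thread), TensorRT's expected concurrency model is that different threads operate on different execution contexts: an IExecutionContext holds the network's state during execution and must not be used by several threads at once. This is a minimal sketch of only the host-side threading structure, in plain C++. `InferenceContext`, `runStream` and `runAllStreams` are hypothetical names, and `infer()` is a placeholder for the real copy-in, enqueue, copy-out and stream-synchronize calls.

```cpp
#include <cassert>
#include <functional>  // std::ref
#include <thread>
#include <vector>

// HYPOTHETICAL stand-in for the per-thread TensorRT state. In real code this
// would hold this thread's own IExecutionContext, a CUDA stream created with
// cudaStreamNonBlocking, and this thread's own BufferManager.
class InferenceContext {
public:
    explicit InferenceContext(int streamId) : streamId_(streamId) {}

    // Placeholder for: copy input to device, enqueue, copy output to host,
    // synchronize the stream. Returns a fake result derived from the ids.
    int infer(int frameIndex) const { return streamId_ * 1000 + frameIndex; }

private:
    int streamId_;
};

// Worker body: the context is owned by this thread and never shared,
// so no locking is needed around inference.
void runStream(int streamId, int numFrames, std::vector<int>& results) {
    InferenceContext ctx(streamId);
    for (int i = 0; i < numFrames; ++i) {
        results.push_back(ctx.infer(i));
    }
}

// Launch one host thread per video stream and wait for all of them.
// Each thread writes only to its own pre-allocated results[s] slot.
std::vector<std::vector<int>> runAllStreams(int numStreams, int framesPerStream) {
    std::vector<std::vector<int>> results(numStreams);
    std::vector<std::thread> workers;
    workers.reserve(numStreams);
    for (int s = 0; s < numStreams; ++s) {
        workers.emplace_back(runStream, s, framesPerStream, std::ref(results[s]));
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return results;
}
```

Giving every thread its own context, stream and buffers avoids any shared mutable state in the hot path; whether the GPU work from those threads actually overlaps still depends on the stream flags and free GPU resources.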

Thanks

lavinan26 · April 6, 2020, 12:27pm

First of all, thanks a lot for your response.
Following the TensorRT documentation and the CUDA streams slides, I created multiple streams and multiple execution contexts and checked the output in the Visual Profiler.
This is the code inside my inference function. I use the BufferManager class from buffers.h, which is provided with the TensorRT samples.

    cudaProfilerStart();

    // Copy inputs from host to device, each on its own stream
    buffers.copyInputToDeviceAsync(stream1);
    buffers2.copyInputToDeviceAsync(stream2);

    // Execute the inference work (enqueue only queues it on the stream)
    context->enqueue(mParams.batchSize, buffers.getDeviceBindings().data(), stream1, nullptr);
    context2->enqueue(mParams.batchSize, buffers2.getDeviceBindings().data(), stream2, nullptr);

    // Copy data from device output buffers to host output buffers
    buffers.copyOutputToHostAsync(stream1);
    buffers2.copyOutputToHostAsync(stream2);

    // Wait for both streams to finish before reading the host outputs
    cudaStreamSynchronize(stream1);
    cudaStreamSynchronize(stream2);
    cudaProfilerStop();

As you can see, I use a separate stream, a separate buffer and a separate execution context to run inference on two images. But when I checked the Visual Profiler results, the two inferences did not run simultaneously.
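To confirm overlap (or its absence) numerically rather than by eye, one option is to compare the start and end timestamps of the two inference kernels as reported by the profiler. This is a minimal sketch; the `Interval` struct and its nanosecond fields are hypothetical and assume the timestamps have already been read from the profiler's exported trace.

```cpp
#include <algorithm>
#include <cassert>

// One kernel (or memcpy) as read off a profiler timeline.
// HYPOTHETICAL layout; times are in nanoseconds, end is exclusive.
struct Interval {
    long long startNs;
    long long endNs;
};

// Two intervals ran concurrently iff the later start precedes the earlier end.
bool overlaps(const Interval& a, const Interval& b) {
    return std::max(a.startNs, b.startNs) < std::min(a.endNs, b.endNs);
}

// Length of the concurrent portion; 0 means the two were fully serialized.
long long overlapNs(const Interval& a, const Interval& b) {
    long long lo = std::max(a.startNs, b.startNs);
    long long hi = std::min(a.endNs, b.endNs);
    return hi > lo ? hi - lo : 0;
}
```

A result of zero for every kernel pair from the two streams matches what the timeline shows here: the inferences were serialized.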

Can you please explain what went wrong here?

SunilJB · April 13, 2020, 7:46am

Hi,

Unless otherwise specified, all calls are placed into the default stream, often referred to as “Stream 0”. It has special synchronization rules:

It is synchronous with all other (blocking) streams.
Operations in stream 0 cannot overlap operations in other streams.
To avoid this, create your streams with the non-blocking flag set:
cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking)
Please refer to this link:
http://on-demand.gputechconf.com/gtc/2014/presentations/S4158-cuda-streams-best-practices-common-pitfalls.pdf

It might also be that the GPU does not have enough free compute resources to run both streams at the same time. Please check the resource and memory consumption of each stream.
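A rough way to reason about the resource explanation: kernels on different streams can only run side by side if the GPU has room to keep thread blocks from both resident at once. If a single inference already fills the GPU, a second stream has nothing left to run on. This sketch uses a deliberately simplified model, an assumption rather than how the hardware scheduler is specified: capacity is the SM count (for example `cudaDeviceProp::multiProcessorCount`) times a per-SM resident-block limit, with register, shared-memory and warp limits folded into that one number. The function names are hypothetical.

```cpp
#include <cassert>

// Simplified capacity model (ASSUMPTION): the GPU can hold at most
// numSMs * residentBlocksPerSM thread blocks at the same time.
long long residentCapacity(int numSMs, int residentBlocksPerSM) {
    return static_cast<long long>(numSMs) * residentBlocksPerSM;
}

// True if the blocks of two kernels launched on different streams could all
// be resident simultaneously under this model. If false, the kernel that
// starts second mostly waits for the first kernel's blocks to retire.
bool fitsConcurrently(long long blocksA, long long blocksB,
                      int numSMs, int residentBlocksPerSM) {
    return blocksA + blocksB <= residentCapacity(numSMs, residentBlocksPerSM);
}
```

For example, on a hypothetical GPU with 40 SMs and 16 resident blocks per SM, two kernels of 200 and 300 blocks fit side by side, whereas a 640-block kernel alone fills the device.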

Thanks
