My inference code is concurrent and uses a different CUDA stream for each inference execution.
Only a single inference per stream is guaranteed to be in flight at any moment.
Is it correct to use just one execution context for multi-stream concurrent inference?
Multiple CUDA streams can run in parallel, so for async inference each thread should have its own stream that it queues work on and synchronizes with.
If you shared a single stream, you would be pipelining all of your threads and you wouldn't get as much of the parallel performance gain.
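For illustration, a minimal sketch of that per-thread-stream pattern, assuming the TensorRT 8.x Python API (`execute_async_v2`) and NVIDIA's cuda-python bindings; engine deserialization, buffer allocation, and error checking are elided, and `bindings_for()` is a hypothetical helper returning each thread's device-pointer list:

```python
import threading

from cuda import cudart  # NVIDIA's cuda-python bindings

def worker(engine, bindings):
    # One execution context and one stream per thread, so no thread
    # ever enqueues into another thread's stream.
    context = engine.create_execution_context()
    err, stream = cudart.cudaStreamCreate()          # error checking omitted
    context.execute_async_v2(bindings, int(stream))  # enqueue; returns immediately
    cudart.cudaStreamSynchronize(stream)             # wait on this stream only
    cudart.cudaStreamDestroy(stream)

# `engine` is a deserialized TensorRT engine; bindings_for(i) is a
# hypothetical helper returning the device-pointer list for thread i.
threads = [threading.Thread(target=worker, args=(engine, bindings_for(i)))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```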
My current implementation uses a single execution context, several Python threads, and a dedicated pool of streams held in a queue. Each inference runs on the first available (free) stream, roughly as sketched below.
I haven’t noticed any issues with thread safety so far.
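Trimmed down, the arrangement looks something like this (same cuda-python assumptions as above; `context` is the single shared execution context, and `bindings` are the device pointers for the request, both set up elsewhere):

```python
import queue

from cuda import cudart

POOL_SIZE = 4
stream_pool = queue.Queue()
for _ in range(POOL_SIZE):
    err, stream = cudart.cudaStreamCreate()  # error checking omitted
    stream_pool.put(stream)

def infer(context, bindings):
    # Called concurrently from several Python threads.
    stream = stream_pool.get()       # blocks until a stream is free
    try:
        # All threads share the one execution context; whether this
        # enqueue is safe without a lock is exactly my question.
        context.execute_async_v2(bindings, int(stream))
        cudart.cudaStreamSynchronize(stream)
    finally:
        stream_pool.put(stream)      # return the stream to the pool
```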
An extra question: can I use a single execution context in a single async inference thread that runs inference concurrently on different streams? Only a single inference per stream is in flight at any moment.
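To make that concrete, a sketch of what the single thread would do (shared context and cuda-python as above; stream and buffer setup elided):

```python
from cuda import cudart

def infer_round(context, jobs):
    # jobs: list of (bindings, stream) pairs, at most one inference
    # per stream. Enqueue everything first; execute_async_v2 returns
    # without blocking the thread.
    for bindings, stream in jobs:
        context.execute_async_v2(bindings, int(stream))
    # Only then wait for each stream's result. Whether one context may
    # have inferences in flight on several streams at once is the
    # question asked above.
    for _, stream in jobs:
        cudart.cudaStreamSynchronize(stream)
```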