How to run inference in multiple threads (allocating host and device buffers only once for all execution contexts)

Hi,

I have several TensorRT engines, including YOLO for detection and Inception for classification. Currently I run these engines one after another, which takes a lot of time. What I want is to run them in parallel using multi-threading in Python. I know the TensorRT runtime can be used by multiple threads simultaneously as long as each thread uses its own execution context. However, each execution context allocates its own host and device buffers, so the total allocated memory becomes quite large. Is there a way to run inference in multiple threads while allocating the buffers only once and sharing them across all execution contexts? Thanks!
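One common pattern for this is to pre-allocate a fixed pool of buffers once and have each worker thread borrow a buffer for the duration of a single inference, instead of giving every execution context its own permanent set. Below is a minimal sketch of that pattern in pure Python; the `BufferPool` class and `infer` function are illustrative names, and the placeholder compute stands in for the actual TensorRT host-to-device copy and `execute_async_v2` call, which are not shown here.

```python
import threading
import queue

class BufferPool:
    """Pre-allocate a fixed number of buffers once; threads borrow and return them."""
    def __init__(self, num_buffers, buffer_size):
        self._free = queue.Queue()
        for _ in range(num_buffers):
            # stands in for a pinned-host + device buffer pair
            self._free.put(bytearray(buffer_size))

    def acquire(self):
        return self._free.get()   # blocks until a buffer is free

    def release(self, buf):
        self._free.put(buf)

def infer(context_id, pool, batch, results):
    buf = pool.acquire()
    try:
        # With real TensorRT you would copy `batch` into the borrowed host buffer,
        # transfer it to the device, and run this thread's own IExecutionContext
        # on its own CUDA stream. Here a placeholder computation is used instead.
        results[context_id] = sum(batch)
    finally:
        pool.release(buf)

pool = BufferPool(num_buffers=2, buffer_size=1024)
results = {}
threads = [threading.Thread(target=infer, args=(i, pool, [i, i + 1], results))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # four results, computed by four threads sharing two buffers
```

Because the pool holds only two buffers, at most two inferences are in flight at once, which caps total buffer memory regardless of how many threads (or execution contexts) exist.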

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query in the DeepStream or Triton forum.

Thanks!