[TensorRT] Speed of concurrently executing multiple TensorRT models on one GPU

summerxi · May 24, 2020, 3:46pm

Description

When I run one Yolo3 model, each inference takes about 10 ms.

When I run two Yolo3 models concurrently on a 2080 GPU, in two threads with multiple CUDA streams and a loop of 10,000 iterations, each inference takes about 20 ms.

Each Yolo3 model uses about 2 GB of GPU memory, the 2080 has 8 GB, and I run with batch size 1.

How can I execute multiple models concurrently, in multiple threads with multiple streams, so that the average time per inference stays at 10 ms?
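The reported timings can be checked with quick arithmetic. This sketch assumes (a simplification, not something measured in this thread) that each latency is the steady-state time per batch-1 inference seen by each thread. Converting latency to throughput shows that two models at 20 ms each deliver the same total as one model at 10 ms, which is consistent with a single Yolo3 inference already keeping the 2080 busy. The function names are illustrative only.

```python
def throughput_per_second(latency_ms: float, concurrent_models: int) -> float:
    """Aggregate inferences per second when `concurrent_models` threads each
    finish one batch-1 inference every `latency_ms` milliseconds."""
    return concurrent_models * 1000.0 / latency_ms


def saturated_latency_ms(single_latency_ms: float, concurrent_models: int) -> float:
    """Per-thread latency under a simplified model where one inference already
    saturates the GPU, so concurrent work is time-sliced with no overlap gain."""
    return single_latency_ms * concurrent_models


single = throughput_per_second(latency_ms=10.0, concurrent_models=1)
dual = throughput_per_second(latency_ms=20.0, concurrent_models=2)
print(f"one model:  {single:.0f} inferences/s")
print(f"two models: {dual:.0f} inferences/s")
print(saturated_latency_ms(10.0, 2))
```

Under this simple model, keeping 10 ms per inference with two models running at once would need roughly twice the GPU throughput; extra threads and streams can only help when a single inference leaves part of the GPU idle.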

Environment

TensorRT Version: TensorRT 7 and TensorRT 5
GPU Type: 2080 (TensorRT 7), Titan XP (TensorRT 5)
Nvidia Driver Version: 10.0 (TensorRT 7), 9.0 (TensorRT 5)
CUDA Version: 10.2 (TensorRT 7), 9.0 (TensorRT 5)
CUDNN Version: 7.6.5 (TensorRT 7)
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): NA
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): NA

SunilJB · May 24, 2020, 5:55pm

To run multiple models with TensorRT, I recommend using either NVIDIA DeepStream or NVIDIA Triton Inference Server.
Please refer to the link below for more details:

If you want to use multithreading with TensorRT, please refer to the link below for best practices:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-700/tensorrt-best-practices/index.html#thread-safety
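A common pattern consistent with that thread-safety guidance is to deserialize one engine, share it across threads, and give each thread its own execution context (and, in real code, its own CUDA stream), never sharing a context between threads. The sketch below shows only that ownership structure with hypothetical stand-in classes (`StandInEngine`, `StandInContext`); the method name `create_execution_context` mirrors the TensorRT Python engine method, but nothing here calls TensorRT.

```python
import threading


class StandInContext:
    """Hypothetical stand-in for an execution context: holds mutable per-run
    state, which is why it must not be shared between threads."""

    def __init__(self, context_id: int):
        self.context_id = context_id
        self.runs = 0

    def execute(self, x: int) -> int:
        self.runs += 1  # per-context mutable state
        return x * 2    # placeholder for real inference


class StandInEngine:
    """Hypothetical stand-in for one deserialized engine shared by all threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0

    def create_execution_context(self) -> StandInContext:
        with self._lock:
            ctx = StandInContext(self._next_id)
            self._next_id += 1
            return ctx


def worker(engine: StandInEngine, inputs, results: list, slot: int) -> None:
    # One context per thread; a real version would also create its own CUDA stream here.
    ctx = engine.create_execution_context()
    results[slot] = (ctx.context_id, [ctx.execute(x) for x in inputs])


engine = StandInEngine()
results = [None, None]
threads = [
    threading.Thread(target=worker, args=(engine, range(5), results, i))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

This structure makes threads safe to run side by side; it does not by itself make them faster if the GPU is already saturated.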

You can also try batch inference in a single IExecutionContext. Batching might give higher throughput than multiple execution contexts.
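Assuming both threads run the same engine, the batching suggestion amounts to collecting pending inputs and submitting them in one call instead of one call per input. The sketch below shows only that grouping logic: `StandInBatchEngine` and `infer_batched` are hypothetical names, and `run` stands in for one inference on a single execution context whose engine was built to accept batches of up to `max_batch`.

```python
from typing import List, Sequence


class StandInBatchEngine:
    """Hypothetical stand-in for one execution context that accepts up to
    `max_batch` samples per launch."""

    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.calls = 0  # number of launches actually issued

    def run(self, batch: Sequence[Sequence[float]]) -> List[float]:
        if len(batch) > self.max_batch:
            raise ValueError("batch exceeds max_batch")
        self.calls += 1
        return [sum(sample) for sample in batch]  # placeholder per-sample output


def infer_batched(engine: StandInBatchEngine,
                  requests: Sequence[Sequence[float]]) -> List[float]:
    """Split requests into batches of at most engine.max_batch, run each batch
    once, and return outputs in the original request order."""
    outputs: List[float] = []
    for start in range(0, len(requests), engine.max_batch):
        outputs.extend(engine.run(requests[start:start + engine.max_batch]))
    return outputs


engine = StandInBatchEngine(max_batch=2)
requests = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(infer_batched(engine, requests))  # [3.0, 7.0, 11.0]
print(engine.calls)                     # 2
```

Whether one batch-2 launch actually beats two batch-1 contexts depends on the model and the GPU, so it is worth timing both.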

Thanks
