How to run multiple TensorRT model instances on a single GPU efficiently?

I am running 4 TensorRT model instances for inference on a single GPU. GPU utilization sits around 80%, and the latency of each model increases as instances are added. Can TensorRT isolate each model on the GPU to avoid resource contention? What else can I do to improve their inference speed?
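For context, here is a simplified sketch of roughly how I set things up (TensorRT 8.x Python API with pycuda; `model.plan` and the static binding shapes are placeholders for my actual models): each instance gets its own execution context and CUDA stream from a shared engine.

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the default GPU
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(plan_path):
    """Deserialize a prebuilt TensorRT engine from disk."""
    with open(plan_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

# One engine can back several execution contexts; each context gets
# its own CUDA stream so the instances can be enqueued concurrently.
engine = load_engine("model.plan")  # placeholder path
num_instances = 4
contexts = [engine.create_execution_context() for _ in range(num_instances)]
streams = [cuda.Stream() for _ in range(num_instances)]

# Separate device buffers per instance (bindings must not be shared
# across contexts running at the same time). Assumes static shapes.
buffers = []
for _ in range(num_instances):
    bindings = []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i))
        dtype = np.dtype(trt.nptype(engine.get_binding_dtype(i)))
        bindings.append(int(cuda.mem_alloc(size * dtype.itemsize)))
    buffers.append(bindings)

# Enqueue all instances without blocking between them.
for ctx, stream, bindings in zip(contexts, streams, buffers):
    ctx.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

for stream in streams:
    stream.synchronize()
```

Even with separate streams like this, the four instances still share the same SMs and memory bandwidth, which is where I suspect the latency increase comes from.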
