I'm running 4 TensorRT model instances for inference on a single GPU. GPU utilization is around 80%, and the latency of each model increases when they run concurrently. Can TensorRT isolate each model on the GPU to avoid contention between them? What else can I do to improve their inference speed?