I am running 4 TensorRT model instances for inference on a single GPU. GPU utilization is around 80%, and the latency of each model keeps increasing. Can TensorRT isolate each model on the GPU to avoid contention between them? What else can I do to improve their inference speed?