Is there any other way besides TensorRT to increase GPU utilization during inference?

We've used TensorRT to increase GPU utilization during inference. It worked, but utilization is still only around 30%, which is not high enough. Is there any other way? It would be great if it reached 70%.
We also tried MPS (Multi-Process Service) to run several models on the same GPU, but it didn't seem to help. Our GPU is a P40.

Hi,

Can you clarify what you mean by using TensorRT to increase the inference GPU utilization?

Also, TensorRT Inference Server (https://github.com/NVIDIA/tensorrt-inference-server and https://ngc.nvidia.com/catalog/containers/nvidia:tensorrtserver) can load multiple instances of multiple models on each GPU based on a configuration file.
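As a sketch of what that configuration looks like: each model served by TensorRT Inference Server has a `config.pbtxt`, and the `instance_group` setting controls how many copies of the model run concurrently on a GPU, which is the main lever for raising utilization. The model name, batch size, and instance count below are illustrative assumptions, not values from this thread:

```
# config.pbtxt for a hypothetical TensorRT engine ("my_model" is a placeholder)
name: "my_model"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  {
    # Run two instances of this model on each available GPU,
    # so independent requests can overlap and keep the GPU busier.
    count: 2
    kind: KIND_GPU
  }
]
```

Increasing `count` (and enabling batching via `max_batch_size`) lets the server overlap work from concurrent requests on one GPU, which is the same goal MPS targets but managed inside a single server process.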

Have you looked into this?

Thanks,
NVIDIA Enterprise Support