Is there any other way besides TensorRT to increase GPU utilization during inference?

We've used TensorRT to increase GPU utilization during inference. It worked, but utilization is still only around 30%, which is not high enough. Is there any other way? It would be great if it reached 70%.
We also tried MPS (Multi-Process Service) to run several models on the same GPU, but it didn't seem to help. Our GPU is a P40.

Hi,

Can you clarify what you mean by using TensorRT to increase the inference GPU utilization?

Also, TensorRT Inference Server (https://github.com/NVIDIA/tensorrt-inference-server and https://ngc.nvidia.com/catalog/containers/nvidia:tensorrtserver) can load multiple instances of multiple models on each GPU based on a configuration file.
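As a sketch of what that configuration looks like: each model served by TensorRT Inference Server has a `config.pbtxt`, and the `instance_group` setting controls how many copies of the model run concurrently on a GPU, which is the main lever for raising utilization. The model name, batch size, and instance count below are illustrative assumptions, not values from this thread:

```
# config.pbtxt for a hypothetical TensorRT engine ("my_model" is a placeholder)
name: "my_model"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  {
    # Run two instances of this model on each available GPU,
    # so independent requests can overlap and keep the GPU busier.
    count: 2
    kind: KIND_GPU
  }
]
```

Increasing `count` (and enabling batching via `max_batch_size`) lets the server overlap work from concurrent requests on one GPU, which is the same goal MPS targets but managed inside a single server process.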

Have you looked into this?

Thanks,
NVIDIA Enterprise Support