Real-Time Inference with Multiple GPUs and Multiple Models

Hi everyone,

I have multiple GPUs and several different models. I converted the models to TensorRT for high performance, and I need the highest FPS possible. Which approach is more suitable for this problem: using the NVIDIA Inference Server, or allocating the TensorRT models to specific GPUs myself? I will be sending real-time sensor data over ROS, and since the inference server was designed for data centers, this has me a little confused.

Thanks.

You can either use TRTIS to serve your TensorRT models or write a custom application that uses the TensorRT APIs to serve them. The benefit of TRTIS is that it is easy to use and gives you many options such as multi-instance execution, multi-GPU support, and the dynamic batcher. You could of course build those features into your own custom application, but that is probably time and effort you don't want to spend.
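As a rough illustration of what those options look like, here is a sketch of a TRTIS model configuration (config.pbtxt) that enables two execution instances of a model plus the dynamic batcher. The model name, tensor names, shapes, and batch sizes below are placeholders; substitute the values that match your own TensorRT engine.

```
# config.pbtxt for a TensorRT engine served by TRTIS (illustrative values)
name: "my_detector"            # placeholder model name
platform: "tensorrt_plan"      # serialized TensorRT engine
max_batch_size: 8

input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 300, 300 ] }
]
output [
  { name: "scores", data_type: TYPE_FP32, dims: [ 100 ] }
]

# Run two copies of the model so requests can execute concurrently.
instance_group [
  { count: 2, kind: KIND_GPU }
]

# Let the server batch individual requests that arrive close together.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```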

Note that TRTIS also allows you to map different models to different GPUs if that is what you want. Look at the model configuration protobuf and the documentation for the instance_group setting.
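For example, to pin two different models to different GPUs, each model's config.pbtxt can carry an instance_group entry like the following; the model directory names and GPU indices here are just placeholders:

```
# models/model_a/config.pbtxt -- run this model only on GPU 0
instance_group [
  { count: 1, kind: KIND_GPU, gpus: [ 0 ] }
]

# models/model_b/config.pbtxt -- run this model only on GPU 1
instance_group [
  { count: 1, kind: KIND_GPU, gpus: [ 1 ] }
]
```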