I’ve been using the TensorRT Inference Server for a while now and I love it!
I’m trying to use the “model control” mode so I can schedule the loading and unloading of models myself. My models are quite large, and I don’t have enough GPU memory to hold them all at once (even combining the memory of all my GPUs); I only have room for 2 or 3 models at a time. However, I can’t know in advance which models a given task will need, so I’d like to load the models dynamically and, if possible, choose model/GPU pairings.
Is there a way to get the list of available models even if they are not loaded?
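For context, here is roughly what I had in mind, sketched with just the Python standard library. I’m assuming the HTTP/REST repository endpoints (`/v2/repository/index` and `/v2/repository/models/<name>/load`) and a server listening on `localhost:8000`; please correct me if those aren’t the right endpoints for model control mode.

```python
# Minimal sketch of driving the server's model-control endpoints.
# Assumptions: the HTTP/REST v2 repository API, server at localhost:8000.
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed HTTP endpoint of the server

def model_control_path(action: str, name: str) -> str:
    # Build the load/unload path for a given model name.
    return f"/v2/repository/models/{name}/{action}"

def _post(path: str):
    # POST with an empty body; return the parsed JSON response, if any.
    req = urllib.request.Request(BASE_URL + path, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None

def repository_index():
    # /v2/repository/index should list every model in the repository,
    # loaded or not, each entry carrying a "name" and a "state" field,
    # which is exactly the listing I'm after.
    return _post("/v2/repository/index")

def load_model(name: str):
    _post(model_control_path("load", name))

def unload_model(name: str):
    _post(model_control_path("unload", name))
```

With something like this I could call `repository_index()` to see what exists, then `load_model(...)`/`unload_model(...)` to juggle the 2–3 models that fit in memory.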
Is there a way to dynamically assign a GPU to a model at the moment we request to load it?
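On the GPU question: the only mechanism I’ve found so far is the static `instance_group` section of the model’s `config.pbtxt`, which pins a model’s instances to specific GPUs, but it is fixed before the load rather than chosen per load request. A fragment like the following is my current workaround (GPU index 1 is just an example):

```
# Pin this model's instances to GPU 1 via the model configuration.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
```

What I’d really like is to pick the GPU(s) dynamically when issuing the load, instead of editing the configuration beforehand.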