I need a little advice on deploying Triton Inference Server with explicit model control. From the looks of it, this mode gives the user the most control over which models go live. But the problem I'm not able to solve is how to load models once the server is in production and a new instance spins up.
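For context, the part I do know: even in explicit mode, Triton can be told which models to load at startup via repeatable `--load-model` flags, so a fresh instance at least comes up with an initial set. A minimal launch sketch (model names and repository path are placeholders):

```shell
# Start Triton in explicit control mode and preload two hypothetical models.
# /models is an assumed repository path; model_a / model_b are placeholders.
tritonserver \
  --model-repository=/models \
  --model-control-mode=explicit \
  --load-model=model_a \
  --load-model=model_b
```

That covers the initial spin-up, but not models loaded or swapped after the instance started, which is where my problem begins.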
The only solution I can think of is to have a service poll the server at regular intervals, constantly checking whether my live models are actually live and, if not, loading them. In all honesty, I'm not a big fan of this solution.
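For concreteness, here's roughly what I mean by the polling service, sketched with only the Python standard library against Triton's standard HTTP/REST repository API. The base URL, model names, and 30-second interval are placeholder assumptions:

```python
# Sketch of the polling approach: check each model's readiness endpoint
# and ask Triton to (re)load any model that isn't ready.
# BASE_URL and MODELS are hypothetical values for illustration.
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"      # assumed Triton HTTP endpoint
MODELS = ["model_a", "model_b"]         # placeholder model names


def ready_url(base: str, model: str) -> str:
    # GET on this path returns 200 only when the model is loaded and ready.
    return f"{base}/v2/models/{model}/ready"


def load_url(base: str, model: str) -> str:
    # POST on this path asks the server to load the model from its repository.
    return f"{base}/v2/repository/models/{model}/load"


def is_model_ready(base: str, model: str) -> bool:
    try:
        with urllib.request.urlopen(ready_url(base, model), timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False


def load_model(base: str, model: str) -> None:
    req = urllib.request.Request(load_url(base, model), data=b"", method="POST")
    urllib.request.urlopen(req, timeout=30)


def watch(base: str, models: list[str], interval: float = 30.0) -> None:
    # Loop forever: any model that isn't ready gets a load request.
    while True:
        for model in models:
            if not is_model_ready(base, model):
                load_model(base, model)
        time.sleep(interval)
```

It works, but it feels like reinventing a health check, and the poll interval means a window where a new instance serves nothing.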
How have you gone about solving this problem? I'd really appreciate some ideas.
Thanks in advance