Sorry, I will try to explain it better:
An example: say I open a DeepStream app 4 times. Each time I close the app and reopen it, I have to wait for the model to load (perhaps a few minutes), so that is a few minutes of waiting on each of the 4 runs. I want to use DeepStream because it gives the most optimised pipeline for maximum speed (is this true?). However, when developing/debugging my own app based on the DeepStream examples, the long startup / model load time is a hindrance to fast debug cycles.
Therefore, perhaps naively, my thought is that if DeepStream could instead use the standalone Triton server, the model would be loaded once and stay loaded in the separate Triton process. The separate DeepStream app would then start quickly, since it is not loading the model itself, and my debug cycles would be shorter.
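To make the idea concrete, here is a rough sketch of the check I imagine running before each debug launch. It assumes a standalone Triton server is already running and serving its HTTP endpoint at `http://localhost:8000` (the default HTTP port, as far as I know), and it uses the per-model readiness route `/v2/models/<name>/ready` from Triton's HTTP API. The model name `my_model` and the helper `model_is_ready` are made up for illustration. This only shows that the model stays loaded in the separate process; it does not show how DeepStream would be pointed at that server.

```python
import urllib.error
import urllib.request


def model_is_ready(base_url, model_name, opener=urllib.request.urlopen, timeout=2.0):
    """Return True if an already-running Triton server reports the model as ready.

    base_url and model_name are assumptions for illustration,
    e.g. "http://localhost:8000" and "my_model".
    """
    url = f"{base_url}/v2/models/{model_name}/ready"
    try:
        # HTTP 200 means the model is loaded and ready for inference.
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not running, not reachable, or model not ready (non-2xx reply).
        return False


if __name__ == "__main__":
    if model_is_ready("http://localhost:8000", "my_model"):
        print("Model already loaded in Triton; the app should not need to load it.")
    else:
        print("Triton is not running or the model is not ready yet.")
```

If this prints the first message on every run after the first, then the load cost is being paid once by the server rather than on each app start, which is the behaviour I am hoping for.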
I hope that’s clearer. Basically, I am looking for the most efficient way to debug apps.
Edit: Perhaps I am misunderstanding due to my weak knowledge. Do the DeepStream Triton examples spawn a Triton server that remains running once the DeepStream app is closed? I have only recently come to appreciate (and more so with your help) that there are separate examples for Triton, and I may not have run one to see how it works.