DeepStream - Use standalone Triton server?

Is there any way to use the standalone Triton server with DeepStream? e.g. this: GitHub - triton-inference-server/server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.

It seems that the DeepStream examples use a different version of the Triton server? This means the app has to load the model every time I run it. With the standalone Triton server, the model would be loaded once and kept loaded, making app startup quicker, I assume?
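For reference, this is roughly the standalone usage I have in mind; the repository path and model name below are placeholders, not from any DeepStream sample:

```
# Hypothetical model repository layout (names are placeholders):
#   /models/
#     my_model/
#       config.pbtxt
#       1/
#         model.plan
#
# The standalone server loads everything in the repository once at startup
# and keeps the models resident for as long as it runs:
tritonserver --model-repository=/models
```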

Or am I missing some important aspect of quickly debugging a new app without loading the model each time?

Thanks!

What’s your platform?

Please provide the setup info as other topics do.

I had originally posted this in the Jetson Xavier NX forum but it must have been moved. I had hoped the answer would be platform agnostic, but the details are:

• Hardware Platform (Jetson / GPU)

Jetson Xavier NX

• DeepStream Version

Any

• JetPack Version (valid for Jetson only)

Any

• TensorRT Version

Any

• Issue Type (questions, new requirements, bugs)

Question

• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)

N/A

• Requirement details (This is for new requirements. Include the module name, i.e. for which plugin or for which sample application, and the function description.)

N/A

  1. Triton support on the Jetson and dGPU platforms is different.
  2. Is there any way to use the standalone Triton server with DeepStream? ===> It's not supported.

For Triton on Jetson, please see the info in the Frameworks Support Matrix :: NVIDIA Deep Learning Frameworks Documentation.


Thanks for the reply. I'm not sure which part of your link is most relevant - are you saying that it's not supported at all, or only for the Xavier NX? I see the "Triton for Jetson" section doesn't mention the Xavier NX; however, the release section for the standalone Triton app doesn't list compatibility at that level of detail, it only mentions "Jetson": Releases · triton-inference-server/server · GitHub

In any case, I am able to run the standalone server on the Xavier NX.

If you could please clarify:

  1. Does the deepstream app launch its own version of the Triton server? Is it invoking a ‘special’ deepstream version or is it launching the standalone one, albeit in the same process? Or does it not use the Triton server at all?

  2. Is it possible (on any platform) to have the deepstream app use the standalone server?

  3. If 2) is false, I would hope the nvinfer plugin uses the C-API interface to the Triton server and could therefore be modified slightly to not launch its own version and instead use an already running one?

Thanks!

Xavier and NX are both Jetson.

Does the deepstream app launch its own version of the Triton server? ===> DS 5.1 is based on the Triton 2.5 (20.11) interface. Since it is wrapped as a GST plugin, it can only receive GST buffers from other GST plugins.

Is it possible (on any platform) to have the deepstream app use the standalone server? ==> No

nvinfer plugin would use the C-API interface to the Triton server ===> nvinfer is based on TensorRT; nvinferserver is based on Triton.

would use the C-API interface to the Triton server and could therefore be modified slightly to not launch its own version and to use an already running version? ===> nvinferserver is not open source, so you can't modify it.

If you want to use standalone Triton, as in the link and the screenshot I shared above, you can install the Jetson Triton release on Xavier NX.
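As an illustration of the GST-plugin wrapping: nvinferserver can only sit inside a GStreamer pipeline and take buffers from upstream elements, roughly like the following sketch (element names as in the DeepStream samples; the file paths and sizes are placeholders):

```
gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! \
  m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
  nvinferserver config-file-path=config_infer_triton.txt ! fakesink
```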


OK, thanks for the clear answers.

To summarise, as I understand it: DeepStream cannot use the standalone Triton server, so I cannot take advantage of a constantly running server with the model already loaded, and I also cannot edit the plugin to allow this. This is the case for any platform.

I guess in a perfect world there would be a DeepStream plugin allowing use of the standalone Triton server, using CUDA shared memory or some other fast, low-latency, zero-copy method of inference. If you have a system for keeping note of such requests, I would appreciate it if you added this one.

One last clarification: is there another way of speeding up the DeepStream startup time? Or must I always wait for the model to load each time I start an app?

Sorry! I’m not clear about what this means. For DS nvinferserver (Triton), it only needs to load the model once at the initialization stage; after that, it can use the loaded model for inference.

For example, with the Triton server inferencing a TF saved model, doesn't it need to load the saved model on every boot-up?

Sorry, I will try to explain it better:

An example: I will open a DeepStream app 4 times. Each time I close the app and then reopen it, I must wait for the model to be loaded (perhaps a few minutes), so each of the 4 times I wait those few minutes again. I want to use DeepStream since it gives the most optimised pipeline for maximum speed (is this true?); however, when developing/debugging my own app based on the DeepStream examples, the long startup/model-load time is a hindrance to fast debug cycles.

Therefore, perhaps naively, my thought is that if DeepStream could instead use the standalone Triton server, the model would be loaded once and stay loaded in the separate Triton process. The DeepStream app would then start quickly since it is not loading the model itself, and my debug cycles would be shorter.
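To put rough numbers on the difference, a toy sketch of the cost (the load time is a made-up placeholder, not a measurement):

```python
# Toy cost model for the debug cycle described above.
MODEL_LOAD_S = 120  # assumed per-start model load time (placeholder)
RESTARTS = 4        # debug iterations in the example

# In-process loading (DS-Triton): every restart pays the load cost.
in_process_wait = RESTARTS * MODEL_LOAD_S

# A standalone server (if DeepStream could use one) would load once and
# keep the model resident across app restarts.
standalone_wait = MODEL_LOAD_S

print(in_process_wait, standalone_wait)  # 480 120
```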

I hope that’s clearer - basically I am looking for the most efficient way to debug apps.

Edit: Perhaps I am misunderstanding due to my weak knowledge - do the deepstream triton examples spawn a Triton server that remains running once the deepstream app is closed? I have only recently come to appreciate (and more so with your help) that there are separate examples for triton and perhaps I haven’t run one to see how it works.

Understood now.

DS-Triton does not spawn a new standalone server, so when the DeepStream application is closed, DS-Triton is closed along with it.
