It seems that the DeepStream examples use a different version of the Triton server? This means the model has to be loaded every time I run an app. With the standalone Triton server, it would load the model once and keep it loaded, making app startup quicker, I assume?
Or am I missing some important aspect of quickly debugging a new app without loading the model each time?
I had originally posted this in the Jetson Xavier NX forum but it must have been moved. I had hoped the answer would be platform agnostic, but the details are:
• Hardware Platform (Jetson / GPU)
Jetson Xavier NX
• DeepStream Version
Any
• JetPack Version (valid for Jetson only)
Any
• TensorRT Version
Any
• Issue Type (questions, new requirements, bugs)
Question
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
N/A
• Requirement details (This is for new requirements. Include the module name - for which plugin or which sample application - and the function description.)
Thanks for the reply. I'm not sure what part of your link is most relevant - are you saying that it's not supported at all, or only on the Xavier NX? I see the "Triton for Jetson" section doesn't mention the Xavier NX; however, the release section for the standalone Triton app doesn't list compatibility at that level, it only mentions "Jetson": Releases · triton-inference-server/server · GitHub
In any case, I am able to run the standalone server on the Xavier NX.
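For reference, here is roughly how I am verifying that the standalone server is up and has a model resident - just a minimal sketch using the Python tritonclient package, assuming the default HTTP endpoint on port 8000, with "my_model" as a placeholder for whatever is in my model repository:

```python
# Minimal check that the standalone Triton server (already running on the
# Xavier NX) is live and has the model loaded. Assumes the default HTTP
# endpoint on localhost:8000; "my_model" is a placeholder model name.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("my_model"))
```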
If you could please clarify:
1) Does the DeepStream app launch its own version of the Triton server? Is it invoking a "special" DeepStream version, or is it launching the standalone one, albeit in the same process? Or does it not use the Triton server at all?
2) Is it possible (on any platform) to have the DeepStream app use the standalone server?
3) If the answer to 2) is no, I would hope that the nvinfer plugin uses the C-API interface to the Triton server and could therefore be modified slightly to not launch its own instance and instead use an already running one?
Does the DeepStream app launch its own version of the Triton server? ==> DS 5.1 is based on the Triton 2.5 (20.11) interface. Since it's wrapped as a GST plugin, it can only receive GST buffers from other GST plugins.
Is it possible (on any platform) to have the DeepStream app use the standalone server? ==> No.
the nvinfer plugin uses the C-API interface to the Triton server ==> nvinfer is based on TensorRT; nvinferserver is based on Triton.
could therefore be modified slightly to not launch its own instance and instead use an already running one? ==> nvinferserver is not open source, so you can't modify it.
If you want to use standalone Triton, as in the link and the screenshot I shared above, you can install the Jetson Triton build on the Xavier NX.
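As a rough illustration only (this is not one of the DeepStream samples): once the standalone Jetson Triton server is running as its own process, any client can send requests to the already-loaded model over HTTP or gRPC, for example with the Python tritonclient package. The model name, tensor names, shape and dtype below are placeholders for your own model:

```python
# Sketch of a client-side request to a standalone Triton server; the server
# keeps the model loaded between client runs. Model name, tensor names,
# shape and dtype are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy input matching the placeholder model's expected shape/dtype.
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
inp = httpclient.InferInput("input_0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("output_0").shape)
```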
To summarise as I understand it: DeepStream cannot use the standalone Triton server, so I cannot take advantage of a server running constantly with the model already loaded, and I also cannot edit the plugin to allow this. This is the case on any platform.
I guess in a perfect world there would be a DeepStream plugin allowing use of the standalone Triton server, using CUDA shared memory or some other fast, low-latency, zero-copy path for inference - if you have a system for tracking such requests, I would appreciate it if you added this one.
One last clarification - is there another way of speeding up DeepStream startup time, or must I always wait in some way for the model to load each time I start an app?
Sorry! I'm not clear about what this means. For DS nvinferserver (Triton), it only needs to load the model once at the initialization stage; after that, it can use the loaded model for inference.
For example, with the Triton server running inference on a TF saved model, doesn't it also need to load the saved model on every boot-up?
An example: say I open a DeepStream app 4 times. Each time I close the app and reopen it, I must wait for the model to be loaded (perhaps a few minutes), so each of the 4 runs costs a few minutes of waiting. I want to use DeepStream since it gives the most optimised pipeline for maximum speed (is this true?); however, when developing/debugging my own app based on the DeepStream examples, the long startup / model load time is a hindrance to fast debug cycles.
Therefore, perhaps naively, my thought is that if DeepStream could instead use the standalone Triton server, the model would be loaded once and stay loaded in the separate Triton process, meaning the DeepStream app itself would start quickly since it is not loading the model, and my debug cycles would be shorter.
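To make that concrete, this is the kind of throwaway check I have in mind (a sketch only, with a placeholder model name), where each run of the script only pays for the request itself because the model stays loaded in the separate Triton process:

```python
# Sketch: each invocation talks to the long-running Triton process, so no
# per-run model load is paid. "my_model" is a placeholder model name.
import time
import tritonclient.http as httpclient

start = time.perf_counter()
client = httpclient.InferenceServerClient(url="localhost:8000")
ready = client.is_model_ready("my_model")
print(f"model ready={ready} after {time.perf_counter() - start:.3f}s")
```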
I hope that's clearer - basically I am looking for the most efficient way to debug apps.
Edit: Perhaps I am misunderstanding due to my weak knowledge - do the DeepStream Triton examples spawn a Triton server that remains running once the DeepStream app is closed? I have only recently come to appreciate (and more so with your help) that there are separate examples for Triton, and perhaps I haven't run one yet to see how it works.