Triton-server model load balancing

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) T4, RTX4000
• DeepStream Version 6.1.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.1.2
• NVIDIA GPU Driver Version (valid for GPU only) 515.65.01
• Issue Type (questions, new requirements, bugs) questions
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the content of the configuration files, the command line used, and other details needed to reproduce the issue.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or sample application it concerns, and a description of the function.)

Hello,

I am using Triton Inference Server on a single-GPU server and I cannot fully utilise my GPU because one of my models is slow.

Is there a way to scale an individual model in Triton Server?

one of my models is slow ==> do you run it with TensorRT? How did you find that it is slow?

scale an individual model in Triton Server ==> sorry, what do you mean by ‘scale’?

It is a Python pre-processing model. I want to run multiple instances of the same model. If that is supported, does Triton Server manage the number of requests sent to each instance?
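For reference, the mechanism Triton provides for this is the instance_group setting in the model's config.pbtxt: it launches several copies of the same model, and the Triton scheduler distributes queued requests across them. A minimal sketch, assuming a hypothetical Python pre-processing model with one variable-length input and output (names and sizes are placeholders):

```
name: "preprocess"          # hypothetical model name
backend: "python"
max_batch_size: 8

input [
  {
    name: "INPUT0"          # placeholder tensor name
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT0"         # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

# Start 4 instances of this model; Triton's scheduler dispatches each
# incoming request to whichever instance is free.
instance_group [
  {
    count: 4
    kind: KIND_CPU          # KIND_GPU with a gpus list also works
  }
]
```

Each Python-backend instance runs in its own stub process, so a slow, CPU-bound pre-processing step can be parallelised this way.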

Hi @dilip.patel

does Triton Server manage the number of requests sent to each instance?

Yes, it’s supported. For this, you can also use batch inference mode.
What backend will you use for Triton inference?

I will use the Python backend.
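For context, a Python-backend model is a model.py implementing the TritonPythonModel interface. Below is a minimal sketch of such a pre-processing model, where the tensor names and the normalisation step are assumptions rather than the actual model:

```python
import json

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] holds the model's config.pbtxt serialised as JSON.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # Triton may pass several requests per call; produce one response each.
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            data = in_tensor.as_numpy()

            # Hypothetical pre-processing: scale uint8 pixel values to [0, 1].
            out = data.astype(np.float32) / 255.0

            out_tensor = pb_utils.Tensor("OUTPUT0", out)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses

    def finalize(self):
        pass
```

With instance_group count greater than 1, Triton starts one such Python process per instance and feeds each its own share of the request queue.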

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

The PyTorch backend supports dynamic batching, so you can build the model with a maximum batch size and send requests to the Triton server/PyTorch backend with any batch size up to that maximum.
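As a sketch of what that looks like in the model configuration (model name, tensor shapes, and batch sizes below are placeholders): declare max_batch_size for the largest batch the model was built for, and optionally enable dynamic_batching so Triton can merge smaller client requests into larger batches server-side:

```
name: "model_pt"            # hypothetical TorchScript model
backend: "pytorch"
max_batch_size: 32          # largest batch the server will accept

input [
  {
    name: "INPUT__0"        # PyTorch backend naming convention: INPUT__<index>
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Optional: combine individual requests (each with any batch size up to
# max_batch_size) into larger batches before execution.
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 100
}
```

Clients can then send requests with any batch size from 1 up to max_batch_size, and the scheduler handles the batching.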

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.