In my scenario, many services share a single Triton instance with a defined ensemble model. Is there any way to tweak the performance of my ensemble model? Here is my ensemble config:
name: "pipeline"
platform: "ensemble"
input [
{
name: "RAW_IMAGE_RGB"
data_type: TYPE_UINT8
dims: [ -1, -1, 3 ]
}
]
output [
{
name: "target_output"
data_type: TYPE_FP32
dims: [ 6001, 1, 1 ]
}
]
ensemble_scheduling {
step [
{
model_name: "preprocess"
model_version: -1
input_map {
key: "RAW_IMAGE_RGB"
value: "RAW_IMAGE_RGB"
}
output_map {
key: "IMAGE_CHW_640"
value: "preprocess_output"
}
},
{
model_name: "target_model"
model_version: -1
input_map {
key: "data"
value: "preprocess_output"
}
output_map {
key: "target_output"
value: "target_output"
}
}
]
}
After deploying this model with my services, I can observe a performance improvement only on the first service. I am running multiple instances (3 each) of my preprocess model, which is a simple DALI backend model, and of target_model (TensorRT); see the sketch below. I think every service should benefit from ensembling here, because I moved the heavy code into the preprocessing model.
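For context, here is a minimal sketch of how the composing models are configured. The count: 3 matches the setup described above; the backend/platform fields follow the standard DALI and TensorRT backend conventions, and max_batch_size and kind are illustrative placeholders, not my exact values:

# config.pbtxt for the preprocess model (DALI backend)
name: "preprocess"
backend: "dali"
max_batch_size: 16  # placeholder value
instance_group [
  {
    count: 3
    kind: KIND_GPU
  }
]

# config.pbtxt for the target model (TensorRT plan)
name: "target_model"
platform: "tensorrt_plan"
max_batch_size: 16  # placeholder value
instance_group [
  {
    count: 3
    kind: KIND_GPU
  }
]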
GitHub issue: Improve ensemble concurrent performance · Issue #3413 · triton-inference-server/server · GitHub