How to improve ensemble concurrent performance

In my scenario, many services use a single Triton instance that serves an ensemble model.
Is there any way to tune the performance of my ensemble model?

name: "pipeline"
platform: "ensemble"
input [
  {
    name: "RAW_IMAGE_RGB"
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
  }
]
output [
  {
    name: "target_output"
    data_type: TYPE_FP32
    dims: [ 6001, 1, 1 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map {
        key: "RAW_IMAGE_RGB"
        value: "RAW_IMAGE_RGB"
      }
      output_map {
        key: "IMAGE_CHW_640"
        value: "preprocess_output"
      }
    },
    {
      model_name: "target_model"
      model_version: -1
      input_map {
        key: "data"
        value: "preprocess_output"
      }
      output_map {
        key: "target_output"
        value: "target_output"
      }
    }
  ]
}

After deploying this model with my services, I can observe a performance improvement only on the first service.

I run multiple instances (3 each) of my preprocess model, which is a simple DALI backend model, and of target_model (TensorRT).
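
For readers following along, an instance count like this is normally set with instance_group in each composing model's own config.pbtxt (the ensemble itself does not take instances). The fragment below is only a sketch of what that might look like: the count of 3 comes from the post, while KIND_GPU is an assumption, since the post does not say where the instances run.

```
# Hypothetical fragment for preprocess/config.pbtxt and target_model/config.pbtxt.
# count: 3 matches the post; kind: KIND_GPU is assumed, not stated in the post.
instance_group [
  {
    count: 3
    kind: KIND_GPU
  }
]
```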

I think every service should benefit from ensembling here, because I moved the heavy code into the preprocessing model.

GitHub issue: Improve ensemble concurrent performance · Issue #3413 · triton-inference-server/server · GitHub


You will be better served if you post this here: Triton Inference Server · GitHub
This forum is not monitored by the Triton team on a regular basis.