In my scenario, many services share a single Triton instance with a defined ensemble model. Is there any way to tweak the performance of my ensemble model? Here is my ensemble config:
name: "pipeline"
platform: "ensemble"
input [
{
name: "RAW_IMAGE_RGB"
data_type: TYPE_UINT8
dims: [ -1, -1, 3 ]
}
]
output [
{
name: "target_output"
data_type: TYPE_FP32
dims: [ 6001, 1, 1 ]
}
]
ensemble_scheduling {
step [
{
model_name: "preprocess"
model_version: -1
input_map {
key: "RAW_IMAGE_RGB"
value: "RAW_IMAGE_RGB"
}
output_map {
key: "IMAGE_CHW_640"
value: "preprocess_output"
}
},
{
model_name: "target_model"
model_version: -1
input_map {
key: "data"
value: "preprocess_output"
}
output_map {
key: "target_output"
value: "target_output"
}
}
]
}
After deploying this model with my services, I can observe a performance improvement only on the first service. I am running multiple instances (3 each) of my preprocess model, which is a simple DALI backend model, and of target_model (TensorRT); see the sketch below. I think every service should benefit from ensembling here, because I moved the heavy code into the preprocessing model.
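For context, here is a minimal sketch of how the composing models are configured. The count: 3 matches the setup described above; the backend/platform fields follow the standard DALI and TensorRT backend conventions, and max_batch_size and kind are illustrative placeholders, not my exact values:

# config.pbtxt for the preprocess model (DALI backend)
name: "preprocess"
backend: "dali"
max_batch_size: 16  # placeholder value
instance_group [
  {
    count: 3
    kind: KIND_GPU
  }
]

# config.pbtxt for the target model (TensorRT plan)
name: "target_model"
platform: "tensorrt_plan"
max_batch_size: 16  # placeholder value
instance_group [
  {
    count: 3
    kind: KIND_GPU
  }
]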
GitHub issue: Improve ensemble concurrent performance · Issue #3413 · triton-inference-server/server · GitHub