Issues with setting up Dynamic Batching for Triton server

Description

So I am trying to enable dynamic batching for my YOLO ONNX model. I used a third-party library (Ultralytics) to export a YOLOv11 model to ONNX format, then modified the model's inputs and outputs to allow for dynamic batching. The input and output shapes are now:
Input: images Shape: [-1, 3, 640, 640]
Output: output0 Shape: [-1, 7, 8400]

However, whenever I try to start the Triton server, it gives me this error:
UNAVAILABLE: Invalid argument: model ‘yolo_v1’, tensor ‘output0’: the model expects 3 dimensions (shape [1,7,8400]) but the model configuration specifies 3 dimensions (shape [-1,7,8400]).

Is there anything else I need to do in order to allow for dynamic batching?

Thanks.

Environment

Using Triton version 24.09

Relevant Files

Here is the code I used to check the dimensions of the inputs and outputs:

import onnx

model_path = "/home/model_repository/yolo_v1/1/model.onnx"  # Update with your actual model path
model = onnx.load(model_path)

def dims(tensor):
    # A dimension is either a fixed dim_value or a symbolic dim_param;
    # report -1 for anything that is not a fixed positive size.
    return [d.dim_value if d.dim_value > 0 else -1
            for d in tensor.type.tensor_type.shape.dim]

for inp in model.graph.input:
    print(f"Input: {inp.name} Shape: {dims(inp)}")

for out in model.graph.output:
    print(f"Output: {out.name} Shape: {dims(out)}")

Here is what I used to convert my model to allow for dynamic batching:

import onnx

# Load the ONNX model
model_path = "/home/model_repository/yolo_v1/1/model.onnx"
model = onnx.load(model_path)

# Make the batch (first) dimension of every input symbolic. dim_value and
# dim_param live in a protobuf oneof, so assigning dim_param replaces any
# fixed batch size.
for input_tensor in model.graph.input:
    input_tensor.type.tensor_type.shape.dim[0].dim_param = "batch_size"

# Do the same for every output; the remaining dimensions are left untouched.
for output_tensor in model.graph.output:
    output_tensor.type.tensor_type.shape.dim[0].dim_param = "batch_size"

# Save the updated model
updated_model_path = "yolo_v1_dynamic.onnx"
onnx.save(model, updated_model_path)

print(f"Updated model saved to {updated_model_path}")

Here is my config file:

name: "yolo_v1"
platform: "onnxruntime_onnx"
input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ -1, 3, 640, 640 ]
  }
]

output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ -1, 7, 8400 ]  # Adjust based on YOLO's output format
  }
]
instance_group [
  {
    count: 15
    kind: KIND_GPU
  }
]
dynamic_batching { }

Here is the code I use to start the Triton server:

docker run --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.09-py3 tritonserver  --model-repository=/models

You can follow the Ultralytics Triton Inference Server guide. You don't need to provide the input or output shapes in the config.
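Note that the error in the original post comes from listing the batch dimension explicitly: when `max_batch_size` is greater than zero, Triton prepends the batch dimension itself, so any `dims` you do list must exclude it (Triton can also auto-complete the shapes from the ONNX model, which is why the guide omits them). A minimal config along those lines (the `max_batch_size` value of 8 is just an example) might look like:

```
name: "yolo_v1"
platform: "onnxruntime_onnx"
max_batch_size: 8
dynamic_batching { }
```

With this config, Triton reports the input shape as [-1, 3, 640, 640], matching the dynamic ONNX model without any explicit `input`/`output` entries.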