Issues with setting up Dynamic Batching for Triton server

Description

So I am trying to enable dynamic batching for my YOLO ONNX model. I used a third-party library (Ultralytics) to export a YOLOv11 model to ONNX format, then modified the model's inputs and outputs to allow for dynamic batching. The input and output shapes are now:
Input: images Shape: [-1, 3, 640, 640]
Output: output0 Shape: [-1, 7, 8400]

However, whenever I try to start the Triton server, it gives me this error:
UNAVAILABLE: Invalid argument: model ‘yolo_v1’, tensor ‘output0’: the model expects 3 dimensions (shape [1,7,8400]) but the model configuration specifies 3 dimensions (shape [-1,7,8400]).

Is there anything else I need to do in order to allow for dynamic batching?

Thanks.

Environment

Using Triton version 24.09

Relevant Files

Here is the code I used to check the dimensions of the inputs and outputs:

import onnx

model_path = "/home/model_repository/yolo_v1/1/model.onnx"  # Update with your actual model path
model = onnx.load(model_path)

def dims(tensor):
    # A dimension is either a fixed dim_value or a symbolic dim_param;
    # report -1 for anything that is not a fixed positive size.
    return [d.dim_value if d.dim_value > 0 else -1
            for d in tensor.type.tensor_type.shape.dim]

for inp in model.graph.input:
    print(f"Input: {inp.name} Shape: {dims(inp)}")

for out in model.graph.output:
    print(f"Output: {out.name} Shape: {dims(out)}")

Here is what I used to convert my model to allow for dynamic batching:

import onnx

# Load the ONNX model
model_path = "/home/model_repository/yolo_v1/1/model.onnx"
model = onnx.load(model_path)

# Make the batch (first) dimension of every input symbolic. dim_value and
# dim_param live in a protobuf oneof, so assigning dim_param replaces any
# fixed batch size.
for input_tensor in model.graph.input:
    input_tensor.type.tensor_type.shape.dim[0].dim_param = "batch_size"

# Do the same for every output; the remaining dimensions are left untouched.
for output_tensor in model.graph.output:
    output_tensor.type.tensor_type.shape.dim[0].dim_param = "batch_size"

# Save the updated model
updated_model_path = "yolo_v1_dynamic.onnx"
onnx.save(model, updated_model_path)

print(f"Updated model saved to {updated_model_path}")

Here is my config file:

name: "yolo_v1"
platform: "onnxruntime_onnx"
input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ -1, 3, 640, 640 ]
  }
]

output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ -1, 7, 8400 ]  # Adjust based on YOLO's output format
  }
]
instance_group [
  {
    count: 15
    kind: KIND_GPU
  }
]
dynamic_batching { }

Here is the code I use to start the Triton server:

docker run --gpus all -it --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.09-py3 tritonserver  --model-repository=/models

You can follow the Ultralytics Triton Inference Server guide. You don't need to provide the input or output shapes in the config.
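Note that the error in the original post comes from listing the batch dimension explicitly: when `max_batch_size` is greater than zero, Triton prepends the batch dimension itself, so any `dims` you do list must exclude it (Triton can also auto-complete the shapes from the ONNX model, which is why the guide omits them). A minimal config along those lines (the `max_batch_size` value of 8 is just an example) might look like:

```
name: "yolo_v1"
platform: "onnxruntime_onnx"
max_batch_size: 8
dynamic_batching { }
```

With this config, Triton reports the input shape as [-1, 3, 640, 640], matching the dynamic ONNX model without any explicit `input`/`output` entries.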