ONNX batch inference gives no speed-up: latency scales linearly with batch size

Hi, I’m testing the inference speed of my ResNet ONNX model. I found that the inference time scales linearly with the batch size, meaning batching gives no per-sample speed-up at all:

batch size = 1, time cost: 4.902ms
batch size = 4, time cost: 20.88ms
batch size = 8, time cost: 46.344ms
batch size = 16, time cost: 93.003ms

I’ve attached my model; the input shape is (dynamic, 128, 128, 3).
model.zip (1.4 MB)
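
For reference, this is roughly how I’m measuring (a simplified sketch of my benchmark; the file name `model.onnx`, the batch sizes, and the 100-iteration loop are placeholders for my actual script):

```python
import time
import numpy as np
import onnxruntime as ort

# Load the model with the CUDA execution provider
# (falls back to CPU if CUDA is unavailable).
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

for batch_size in (1, 4, 8, 16):
    # Random input matching the (dynamic, 128, 128, 3) input shape.
    x = np.random.rand(batch_size, 128, 128, 3).astype(np.float32)

    # Warm-up run so one-time kernel/initialization cost
    # is not included in the timing.
    session.run(None, {input_name: x})

    n_runs = 100
    start = time.perf_counter()
    for _ in range(n_runs):
        session.run(None, {input_name: x})
    elapsed_ms = (time.perf_counter() - start) / n_runs * 1000
    print(f"batch size = {batch_size}, time cost: {elapsed_ms:.3f}ms")
```

The warm-up run is there so the numbers above should reflect steady-state inference, not session start-up cost.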

Driver Version: 535.171.04
CUDA Version: 12.2
cuDNN Version: 8.9.7.29-1+cuda12.2
GPU: NVIDIA GeForce RTX 4060 Ti
Operating System: Ubuntu 20.04

Could anyone help? Thanks in advance.