Inference scales linearly with batch size

I am trying to deploy a YOLOv7 model on an NVIDIA Xavier AGX, and everything works well with batch size 1. I use a dynamic batch dimension to run the model on batches of more than one image, and the performance is nominal. However, inference time scales almost linearly with batch size, so batching barely improves per-image throughput. This is especially odd because GPU usage does not reach 100% as I increase the batch size.

yolov7_log.txt (1.1 MB)

The attached log is the output of the following command:

trtexec --verbose --onnx=yolov7.onnx --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:8x3x640x640 --shapes=input:1x3x640x640,input:4x3x640x640,input:8x3x640x640
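One possible way to time each batch size separately is sketched below. It assumes that trtexec's `--shapes` takes one shape per input name, so listing `input` three times in a single run does not benchmark three batch sizes. The sketch builds one engine with the same min/opt/max profile and then reruns trtexec once per batch size with `--saveEngine`/`--loadEngine`. The `bench_batches` function, the `TRTEXEC` override and the engine file name are hypothetical helpers, not part of the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build one dynamic-shape engine, then time each batch
# size in its own trtexec run (assumption: --shapes takes one shape per input).
bench_batches() {
  local onnx="$1"; shift
  local trt="${TRTEXEC:-trtexec}"   # override, e.g. TRTEXEC=echo for a dry run
  local engine="${onnx%.onnx}_dynamic.engine"

  # Build once, using the same optimization profile as the original command.
  "$trt" --onnx="$onnx" \
    --minShapes=input:1x3x640x640 \
    --optShapes=input:4x3x640x640 \
    --maxShapes=input:8x3x640x640 \
    --saveEngine="$engine" || return 1

  # Benchmark each requested batch size against the saved engine.
  local b
  for b in "$@"; do
    echo "=== batch $b ==="
    "$trt" --loadEngine="$engine" --shapes=input:"${b}"x3x640x640 || return 1
  done
}
```

Usage on the device: `bench_batches yolov7.onnx 1 2 4 8`. Comparing the latency summary of each run gives per-batch numbers that do not depend on how trtexec resolves a repeated input name.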

Please let me know if I can provide any additional data to help. I really want to fully utilize the GPU for object detection. Thanks for your help.


Have you maximized the device performance to see if it makes any difference?
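On Jetson boards, maximizing device performance usually means selecting the highest power mode and locking the clocks. The sketch below assumes mode 0 is MAXN on the AGX Xavier; confirm with `nvpmodel -q` before relying on it. The `max_perf` name and the `RUN` override (which lets the steps be dry-run with `RUN=echo`) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put the Jetson into its highest power mode.
max_perf() {
  local run="${RUN:-sudo}"          # override, e.g. RUN=echo to only print steps
  "$run" nvpmodel -m 0 || return 1  # assumption: mode 0 = MAXN on AGX Xavier
  "$run" jetson_clocks || return 1  # pin CPU/GPU/EMC clocks to the mode's max
  "$run" nvpmodel -q                # print the power mode now in effect
}
```

Rerunning the same batch sizes after `max_perf` shows whether dynamic frequency scaling was inflating the timings.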

At batch=1, my GPU load is 0-1% and inference takes 27 ms.
At batch=2, my GPU load is 3-4% and inference takes 49 ms.
At batch=5, my GPU load is 100% and inference takes 114 ms.
At batch=8, my GPU load is 100% and inference takes 181 ms.

It seems that the linear increase in inference time is independent of the GPU load. All stats are reported by jtop.
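Dividing the reported timings by batch size makes the scaling explicit. The small script below only restates the four measurements above as per-image latency and throughput; the `per_image` name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Turn "batch total_ms" pairs into per-image latency and images per second.
per_image() {
  awk '{ printf "batch=%d  total=%dms  per_image=%.1fms  throughput=%.1f img/s\n",
         $1, $2, $2 / $1, 1000 * $1 / $2 }'
}

# Timings reported above (batch size, total inference time in ms).
printf '%s\n' "1 27" "2 49" "5 114" "8 181" | per_image
```

Per-image latency only drops from 27.0 ms at batch 1 to about 22.6 ms at batch 8 (roughly 37 to 44 img/s), which is why the total time looks nearly linear in batch size.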


Could you check whether tegrastats reports the same?
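A minimal way to watch only the GPU load from tegrastats while inference runs is sketched below. It assumes tegrastats reports GPU load as a `GR3D_FREQ <n>%` field; the `gpu_load` filter is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Keep only the GPU load field from each tegrastats line.
# Assumption: GPU load appears as "GR3D_FREQ <n>%".
gpu_load() {
  grep --line-buffered -o 'GR3D_FREQ [0-9]*%'
}

# On the device, sampling every 500 ms:
#   sudo tegrastats --interval 500 | gpu_load
```

Running this alongside each batch size gives a second GPU load reading to compare against jtop.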
