Description
Inference time scales linearly with batch size when running a TensorRT engine for Scaled-YOLOv4 object detection.
When I increase the batch size, the per-call inference time grows linearly with it, so batching gives no throughput benefit.
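The timing script itself is not attached; the numbers further down come from a loop of roughly the following shape (a minimal sketch, not the exact code: it assumes an implicit-batch engine driven through the TensorRT Python API with pycuda, with each execute call timed together with its input/output copies; the engine path, binding layout, and batch size are placeholders):

```python
# Sketch of the timing loop (placeholder engine path and binding layout).
import time
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

BATCH = 4                                   # 1, 2 or 4 in the runs below
ENGINE_PATH = "scaled_yolov4.engine"        # placeholder

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding.
# Assumption: binding 0 is the image input, the rest are outputs.
bindings, host_bufs, dev_bufs = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * BATCH
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()
for _ in range(10):
    start = time.time()
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)      # input copy
    context.execute_async(batch_size=BATCH, bindings=bindings,
                          stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):              # output copies
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    print("Inference take: {:.4f} ms.".format((time.time() - start) * 1000))
```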
Environment
TensorRT Version: 7.2.2 and 7.0.0 (tested on both)
GPU Type: Tesla T4
Nvidia Driver Version: 455
CUDA Version: 11.1 (with TensorRT 7.2.2) and 10.2 (with TensorRT 7.0.0)
CUDNN Version: 8 (with TensorRT 7.2.2) and 7 (with TensorRT 7.0.0)
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
nvcr.io/nvidia/tensorrt:20.12-py3 (TensorRT 7.2.2)
nvcr.io/nvidia/tensorrt:20.03-py3 (TensorRT 7.0.0)
FOR BATCH SIZE - 1:
Inference take: 48.5283 ms.
Inference take: 48.518 ms.
Inference take: 40.1897 ms.
Inference take: 40.0713 ms.
Inference take: 38.54 ms.
Inference take: 38.7829 ms.
Inference take: 38.6083 ms.
Inference take: 38.6635 ms.
Inference take: 38.1827 ms.
Inference take: 38.1016 ms
FOR BATCH SIZE - 2:
Inference take: 76.3045 ms.
Inference take: 74.9346 ms.
Inference take: 73.3341 ms.
Inference take: 73.9554 ms.
Inference take: 73.4185 ms.
Inference take: 75.4546 ms.
Inference take: 77.7809 ms.
Inference take: 78.3289 ms.
Inference take: 79.5533 ms.
Inference take: 79.0556 ms.
Inference take: 79.2939 ms.
Inference take: 77.214 ms.
FOR BATCH SIZE - 4:
Inference take: 158.327 ms.
Inference take: 157.001 ms.
Inference take: 157.107 ms.
Inference take: 154.237 ms.
Inference take: 155.899 ms.
Inference take: 157.408 ms.
Inference take: 155.758 ms.
Inference take: 155.906 ms.
I expected inference time to grow sub-linearly with batch size, i.e. for batching to improve throughput. Can anything be done to improve inference time when batching?
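To narrow this down: would a build configuration along the lines of the sketch below be expected to give better-than-linear scaling, i.e. a single explicit-batch engine with an optimization profile covering batch sizes 1-4 and FP16 enabled for the T4? (This is only a sketch of what I am asking about, not necessarily how my current engine was built; the ONNX file name, input tensor name, and input resolution are placeholders.)

```python
# Hypothetical build sketch: one engine covering batch 1-4 via an optimization
# profile, with FP16 enabled for the T4. File/tensor names and the input
# resolution are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("scaled_yolov4.onnx", "rb") as f:          # placeholder model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30                  # 1 GiB
config.set_flag(trt.BuilderFlag.FP16)                # fast FP16 on the T4

profile = builder.create_optimization_profile()
# "input" and 3x896x896 are placeholders for the real tensor name and shape.
profile.set_shape("input",
                  min=(1, 3, 896, 896),
                  opt=(4, 3, 896, 896),
                  max=(4, 3, 896, 896))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)
with open("scaled_yolov4_b4.engine", "wb") as f:
    f.write(engine.serialize())
```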
Thanks in advance.