Inference time scales linearly with batch size when using a TensorRT model

bschandu67 · February 2, 2021, 12:41pm

Description

Inference time is linearly proportional to batch size when using a TensorRT engine for object detection with ScaledYOLOv4.
When I increase the batch size, inference time increases linearly.

Environment

TensorRT Version:
Checked on two versions (7.2.2 and 7.0.0)
GPU Type:
Tesla T4
Nvidia Driver Version:
455
CUDA Version:
cuda-11.1 with trt-7.2.2 and cuda-10.2 with trt-7.0.0
CUDNN Version:
7 with trt-7.0.0 and 8 with trt-7.2.2
Operating System + Version:
ubuntu-18.04
Python Version (if applicable):
3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
nvcr.io/nvidia/tensorrt:20.12-py3 - trt-7.2.2
nvcr.io/nvidia/tensorrt:20.03-py3 - trt-7.0.0

FOR BATCH SIZE - 1:

Inference take: 48.5283 ms.
Inference take: 48.518 ms.
Inference take: 40.1897 ms.
Inference take: 40.0713 ms.
Inference take: 38.54 ms.
Inference take: 38.7829 ms.
Inference take: 38.6083 ms.
Inference take: 38.6635 ms.
Inference take: 38.1827 ms.
Inference take: 38.1016 ms

FOR BATCH SIZE - 2:

Inference take: 76.3045 ms.
Inference take: 74.9346 ms.
Inference take: 73.3341 ms.
Inference take: 73.9554 ms.
Inference take: 73.4185 ms.
Inference take: 75.4546 ms.
Inference take: 77.7809 ms.
Inference take: 78.3289 ms.
Inference take: 79.5533 ms.
Inference take: 79.0556 ms.
Inference take: 79.2939 ms.
Inference take: 77.214 ms.

FOR BATCH SIZE - 4:

Inference take: 158.327 ms.
Inference take: 157.001 ms.
Inference take: 157.107 ms.
Inference take: 154.237 ms.
Inference take: 155.899 ms.
Inference take: 157.408 ms.
Inference take: 155.758 ms.
Inference take: 155.906 ms.

I did not expect inference time to be proportional to batch size. Can anything be done to improve inference time using batching?
TIY.
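
For reference, the per-image cost implied by the timings above can be worked out with a short script. This is a minimal sketch: the numbers are copied from the logs in this post, the median is used so that the slower first iterations (likely warm-up) do not skew the result, and the names `timings_ms` and `per_image_ms` are illustrative, not part of any TensorRT API.

```python
from statistics import median

# Latencies in ms, copied from the logs above, keyed by batch size.
timings_ms = {
    1: [48.5283, 48.518, 40.1897, 40.0713, 38.54,
        38.7829, 38.6083, 38.6635, 38.1827, 38.1016],
    2: [76.3045, 74.9346, 73.3341, 73.9554, 73.4185, 75.4546,
        77.7809, 78.3289, 79.5533, 79.0556, 79.2939, 77.214],
    4: [158.327, 157.001, 157.107, 154.237,
        155.899, 157.408, 155.758, 155.906],
}


def per_image_ms(timings):
    """Return {batch_size: median batch latency / batch_size} in ms."""
    return {batch: median(values) / batch for batch, values in timings.items()}


per_image = per_image_ms(timings_ms)
for batch, cost in per_image.items():
    # Throughput is the inverse of the per-image cost.
    print(f"batch {batch}: {cost:.1f} ms/image, {1000.0 / cost:.1f} images/s")
```

The per-image cost stays between roughly 38 and 39 ms for all three batch sizes, which is another way of saying that throughput does not improve with batching in these runs.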

NVES · February 2, 2021, 1:07pm

Hi, could you please share your model and script so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

bschandu67 · February 2, 2021, 1:21pm

ScaledYOLOv4 model
I am using this model for object detection.

Model to ONNX conversion script

I followed this repository to convert my ONNX model to a TensorRT model.

I will try to run my model with trtexec. But ultimately I need to export my model as a Python library, so I can't rely on trtexec for my end goal.
Thanks!

spolisetty · February 3, 2021, 9:31am

Hi @bschandu67,

Could you please check the GPU utilization?

Thank you.

bschandu67 · February 3, 2021, 12:35pm

With the batch size 1 model, volatile GPU utilization was 83% and 1.7 GB of memory was used. With the batch size 4 model, volatile GPU utilization was 100% and 2.7 GB of memory was used.
When running inference from Python with the bare-metal TensorRT engine, volatile GPU utilization was 46% for batch size 1 and 100% for batch size 4.

spolisetty · February 5, 2021, 11:38am

Hi @bschandu67,

Thanks for providing the GPU utilization. It looks fine. Could you please also provide the verbose engine build log and the per-layer inference performance?

build: trtexec --verbose .....
inference: trtexec --dumpProfile --separateProfileRun

Thank you.

Aref · May 5, 2021, 11:33am

Hi @bschandu67,

Have you been able to reduce the runtime with higher batch sizes?

Thanks!

bschandu67 · May 5, 2021, 11:59am

Yes.
But not by a huge margin.
With batching, we were able to save 4 ms per frame.

Aref · May 5, 2021, 2:18pm

Right. According to this thread, when inference already uses the GPU's full capacity at a batch size of 1, increasing the batch size won't help much.

Was it the same case with your problem?

Thanks a lot!
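
One rough way to check the saturation explanation against the timings from the first post is to fit latency = overhead + per_image * batch_size. This is a minimal sketch, not a profiler result: the latencies are approximate medians read off the original logs, there are only three data points, and `fit_line` is an illustrative helper. If the fitted fixed overhead is close to zero, there is little per-call cost for batching to amortize.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


batch_sizes = [1, 2, 4]
# Approximate median latencies in ms from the first post (assumed values).
latency_ms = [38.7, 76.8, 156.5]

overhead_ms, per_image_ms = fit_line(batch_sizes, latency_ms)
print(f"fixed overhead: {overhead_ms:.2f} ms, per image: {per_image_ms:.2f} ms")
```

With these numbers the per-image slope comes out near 39 ms and the fixed overhead is within about a millisecond of zero, which is consistent with the GPU already being the bottleneck at small batch sizes.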
