Batch inference on TensorRT

I’m trying to convert an ONNX model to TensorRT with batch size 64. During inference it takes 390 ms for batch size 64 and 7 ms for batch size 1.
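
As a sanity check on those numbers: 64 × 7 ms is roughly 448 ms, which is in the same range as the 390 ms measured for batch 64, so it looks as if the 64 samples may effectively be processed one at a time. It is worth verifying that the engine is built for, and executed with, a batch-64 input shape. Below is a minimal sketch of such a build with the TensorRT 7 Python API; this is not the actual build_retina_trt.py, and the input name "data" and the 3x640x640 shape are assumptions.

```python
# Minimal sketch only (not the actual build_retina_trt.py): building an
# explicit-batch TensorRT 7 engine from ONNX with an optimization profile
# that covers batch 64. The input name "data" and the 3x640x640 shape are
# assumptions and should be replaced with the model's real input binding.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path, batch_size=64):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB

    # The profile must cover the batch size used at inference time; if it only
    # allows batch 1, a batch of 64 cannot be executed as a single run.
    profile = builder.create_optimization_profile()
    profile.set_shape("data",
                      (1, 3, 640, 640),           # min
                      (batch_size, 3, 640, 640),  # opt
                      (batch_size, 3, 640, 640))  # max
    config.add_optimization_profile(profile)

    return builder.build_engine(network, config)
```

Note that if the ONNX export already has a hard-coded batch dimension of 1, the profile cannot widen it; the MXNet → ONNX export would need a dynamic (or size-64) batch axis.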

Please let me know if I’m missing something.
Attaching a Google Drive link for the TensorRT model generation from ONNX; build_retina_trt.py converts MXNet → ONNX → TensorRT.

BUILD_LINK

## Environment

TensorRT Version: 7.2.2-1+cuda11.1
Nvidia Driver Version: 460.27.04
Operating System + Version: Ubuntu 20.04.1 LTS
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.12-py3

Hi, could you please share your model and script, so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

I’m following this GitHub repository: here

I was able to generate the model successfully; the only issue is that the inference performance is not what I expected. Hence sharing the GitHub repository: here
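
If it helps to narrow things down, here is a minimal timing sketch under the assumption of an explicit-batch engine with one input at binding index 0 and a profile that covers batch 64 (the binding layout of the actual RetinaFace engine may differ). The whole batch goes through a single execute_async_v2 call, and the timer only stops after the stream has been synchronized; feeding the 64 images one by one, or reading the timer before synchronization, would not reflect true batched execution.

```python
# Minimal timing sketch (assumptions: explicit-batch engine, input at binding 0,
# optimization profile covering batch 64). Times one batched execution only.
import time
import numpy as np
import pycuda.autoinit  # creates a default CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def timed_batch_inference(engine, batch):
    # batch: np.float32 array of shape (64, 3, H, W), matching the engine's profile
    host_in = np.ascontiguousarray(batch, dtype=np.float32)
    context = engine.create_execution_context()
    context.set_binding_shape(0, host_in.shape)  # tell TRT the runtime batch size

    stream = cuda.Stream()
    bindings, outputs = [], []
    for i in range(engine.num_bindings):
        shape = context.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        dev_mem = cuda.mem_alloc(trt.volume(shape) * np.dtype(dtype).itemsize)
        bindings.append(int(dev_mem))
        if engine.binding_is_input(i):
            cuda.memcpy_htod_async(dev_mem, host_in, stream)
        else:
            outputs.append((cuda.pagelocked_empty(trt.volume(shape), dtype), dev_mem))

    stream.synchronize()  # make sure the H2D copy is done before timing starts

    start = time.perf_counter()
    # One call for the whole batch; do not loop over the 64 samples here.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    stream.synchronize()  # do not stop the timer before the GPU work has finished
    elapsed = time.perf_counter() - start

    for host, dev in outputs:
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return elapsed, [host for host, _ in outputs]
```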

Hi @jhanvi,

Sorry for the late reply. Could you please check and confirm GPU utilization, and also share the engine build verbose log and the per-layer inference profile?

build: `trtexec --verbose .....`
inference: `trtexec --dumpProfile --separateProfileRun ......`
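
In case trtexec is awkward to run against this particular engine, a per-layer breakdown can also be collected from the Python API by attaching a profiler to the execution context. This is only a sketch, assuming an already deserialized engine with device bindings allocated and binding shapes set:

```python
# Sketch of collecting per-layer timings from the Python API (an alternative
# to trtexec --dumpProfile). Assumes an already deserialized engine and a
# ready-to-use bindings list of device pointers.
import tensorrt as trt

class LayerProfiler(trt.IProfiler):
    def __init__(self):
        trt.IProfiler.__init__(self)
        self.records = []

    def report_layer_time(self, layer_name, ms):
        # Called by TensorRT once per layer after each profiled execution.
        self.records.append((layer_name, ms))

def profile_once(context, bindings):
    profiler = LayerProfiler()
    context.profiler = profiler
    context.execute_v2(bindings)  # synchronous execute so layer times are reported
    # Print the ten slowest layers to see whether one of them dominates.
    for name, ms in sorted(profiler.records, key=lambda r: r[1], reverse=True)[:10]:
        print(f"{ms:8.3f} ms  {name}")
```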

Thank you.