Batch inference on TensorRT

I’m trying to convert an ONNX model to TensorRT with batch size 64.
During inference it takes 390 ms per batch (batch_size = 64) and 7 ms for batch size 1.
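
For context, a minimal timing sketch of the kind I use for the numbers above (not the exact script; the engine path and input shape are placeholders, and it assumes an explicit-batch engine):

```python
import time

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def benchmark(engine_path, batch_shape, n_warmup=10, n_runs=50):
    # Deserialize the engine built from the ONNX model.
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    context.set_binding_shape(0, batch_shape)  # explicit-batch engine assumed

    # Allocate pinned host buffers and device buffers for every binding.
    bindings, host_bufs, dev_bufs = [], [], []
    for i in range(engine.num_bindings):
        shape = context.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = cuda.pagelocked_empty(trt.volume(shape), dtype)
        dev = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        dev_bufs.append(dev)
        bindings.append(int(dev))

    # Random input; output copy back to host is omitted for timing only.
    host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
    stream = cuda.Stream()

    def run_once():
        cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        stream.synchronize()

    for _ in range(n_warmup):  # warm-up runs are not measured
        run_once()

    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    ms = (time.perf_counter() - start) / n_runs * 1e3
    print(f"batch {batch_shape[0]}: {ms:.1f} ms/batch, "
          f"{ms / batch_shape[0]:.2f} ms/image")


if __name__ == "__main__":
    # Placeholder engine path and input shape -- adjust to the actual model.
    benchmark("retinaface_b64.trt", (64, 3, 640, 640))
```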

Please let me know if I’m missing something.
Attaching a Google Drive link for the TensorRT model generation from ONNX; build_retina_trt.py converts MXNet → ONNX → TensorRT.
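
The ONNX → TensorRT step is roughly along the lines of this minimal sketch (an illustration, not the exact build_retina_trt.py; the input shape, workspace size, and file paths are placeholders):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(onnx_path, engine_path, batch_size=64, hw=(640, 640)):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX file exported from MXNet.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB, placeholder value

    # Pin the batch dimension to 64 via an optimization profile
    # (only needed if the ONNX input has a dynamic batch dimension).
    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    shape = (batch_size, 3, *hw)
    profile.set_shape(input_name, shape, shape, shape)
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine
```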

BUILD_LINK

## Environment

TensorRT Version: 7.2.2-1+cuda11.1
Nvidia Driver Version: 460.27.04
Operating System + Version: Ubuntu 20.04.1 LTS
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.12-py3

Hi, could you please share your model and script, so that we can help you better?

Alternatively, you can try running your model with the trtexec command.

Thanks!

I’m following this GitHub repository:
here

I was able to generate the model successfully; the only issue is that the inference performance is not what I expected. Hence I’m sharing the GitHub repository:
here

Hi @jhanvi,

Sorry for the late reply. Could you please check and confirm GPU utilization?
Could you also share the engine-build verbose log and the per-layer inference profile?

build: trtexec --verbose .....
inference: trtexec --dumpProfile --separateProfileRun ......
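
For the GPU-utilization check, a minimal sketch using the pynvml package (an assumption; watching nvidia-smi while the inference loop runs gives the same information):

```python
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust if needed

# Sample utilization a few times while the TensorRT inference runs
# in another process or terminal.
for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU util: {util.gpu}%  mem util: {util.memory}%  "
          f"mem used: {mem.used / 2**20:.0f} MiB")
    time.sleep(0.5)

pynvml.nvmlShutdown()
```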

Thank you.