Batch inference on TensorRT

I’m trying to convert an ONNX model to TensorRT with batch size 64. During inference it takes 390 ms for batch size 64 and 7 ms for batch size 1.
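
As a sanity check on those numbers: 64 × 7 ms is roughly 448 ms, which is in the same range as the 390 ms measured for batch 64, so it looks as if the 64 samples may effectively be processed one at a time. It is worth verifying that the engine is built for, and executed with, a batch-64 input shape. Below is a minimal sketch of such a build with the TensorRT 7 Python API; this is not the actual build_retina_trt.py, and the input name "data" and the 3x640x640 shape are assumptions.

```python
# Minimal sketch only (not the actual build_retina_trt.py): building an
# explicit-batch TensorRT 7 engine from ONNX with an optimization profile
# that covers batch 64. The input name "data" and the 3x640x640 shape are
# assumptions and should be replaced with the model's real input binding.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path, batch_size=64):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB

    # The profile must cover the batch size used at inference time; if it only
    # allows batch 1, a batch of 64 cannot be executed as a single run.
    profile = builder.create_optimization_profile()
    profile.set_shape("data",
                      (1, 3, 640, 640),           # min
                      (batch_size, 3, 640, 640),  # opt
                      (batch_size, 3, 640, 640))  # max
    config.add_optimization_profile(profile)

    return builder.build_engine(network, config)
```

Note that if the ONNX export already has a hard-coded batch dimension of 1, the profile cannot widen it; the MXNet → ONNX export would need a dynamic (or size-64) batch axis.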

Please let me know if I’m missing something.
Attaching a Google Drive link for the TensorRT model generation from ONNX; build_retina_trt.py converts MXNet → ONNX → TensorRT.

BUILD_LINK

## Environment

TensorRT Version: 7.2.2-1+cuda11.1
Nvidia Driver Version: 460.27.04
Operating System + Version: Ubuntu 20.04.1 LTS
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.12-py3

Hi, could you please share your model and script, so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

I’m following this GitHub repository: here

I was able to generate the model successfully; the only issue is that the inference performance is not what I expected. Hence sharing the GitHub repository: here
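
If it helps to narrow things down, here is a minimal timing sketch under the assumption of an explicit-batch engine with one input at binding index 0 and a profile that covers batch 64 (the binding layout of the actual RetinaFace engine may differ). The whole batch goes through a single execute_async_v2 call, and the timer only stops after the stream has been synchronized; feeding the 64 images one by one, or reading the timer before synchronization, would not reflect true batched execution.

```python
# Minimal timing sketch (assumptions: explicit-batch engine, input at binding 0,
# optimization profile covering batch 64). Times one batched execution only.
import time
import numpy as np
import pycuda.autoinit  # creates a default CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def timed_batch_inference(engine, batch):
    # batch: np.float32 array of shape (64, 3, H, W), matching the engine's profile
    host_in = np.ascontiguousarray(batch, dtype=np.float32)
    context = engine.create_execution_context()
    context.set_binding_shape(0, host_in.shape)  # tell TRT the runtime batch size

    stream = cuda.Stream()
    bindings, outputs = [], []
    for i in range(engine.num_bindings):
        shape = context.get_binding_shape(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        dev_mem = cuda.mem_alloc(trt.volume(shape) * np.dtype(dtype).itemsize)
        bindings.append(int(dev_mem))
        if engine.binding_is_input(i):
            cuda.memcpy_htod_async(dev_mem, host_in, stream)
        else:
            outputs.append((cuda.pagelocked_empty(trt.volume(shape), dtype), dev_mem))

    stream.synchronize()  # make sure the H2D copy is done before timing starts

    start = time.perf_counter()
    # One call for the whole batch; do not loop over the 64 samples here.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    stream.synchronize()  # do not stop the timer before the GPU work has finished
    elapsed = time.perf_counter() - start

    for host, dev in outputs:
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return elapsed, [host for host, _ in outputs]
```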

Hi @jhanvi,

Sorry for the late reply. Could you please check and confirm GPU utilization, and also share the engine build verbose log and the per-layer inference profile?

build: `trtexec --verbose .....`
inference: `trtexec --dumpProfile --separateProfileRun ......`
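
In case trtexec is awkward to run against this particular engine, a per-layer breakdown can also be collected from the Python API by attaching a profiler to the execution context. This is only a sketch, assuming an already deserialized engine with device bindings allocated and binding shapes set:

```python
# Sketch of collecting per-layer timings from the Python API (an alternative
# to trtexec --dumpProfile). Assumes an already deserialized engine and a
# ready-to-use bindings list of device pointers.
import tensorrt as trt

class LayerProfiler(trt.IProfiler):
    def __init__(self):
        trt.IProfiler.__init__(self)
        self.records = []

    def report_layer_time(self, layer_name, ms):
        # Called by TensorRT once per layer after each profiled execution.
        self.records.append((layer_name, ms))

def profile_once(context, bindings):
    profiler = LayerProfiler()
    context.profiler = profiler
    context.execute_v2(bindings)  # synchronous execute so layer times are reported
    # Print the ten slowest layers to see whether one of them dominates.
    for name, ms in sorted(profiler.records, key=lambda r: r[1], reverse=True)[:10]:
        print(f"{ms:8.3f} ms  {name}")
```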

Thank you.