Batch inference with TensorRT

I’m trying to convert an ONNX model to TensorRT with batch size 64.
During inference, a batch (batch_size = 64) takes 390 ms,
while batch_size = 1 takes 7 ms.

Please let me know if I’m missing something.
Attaching a Google Drive link for the TensorRT model generation from ONNX, where it converts MXNet → ONNX → TensorRT.
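For context, the reported numbers work out to roughly 6.1 ms per image at batch size 64, i.e. only about 1.15x better throughput than batch size 1, which is why the batched run looks slow. A quick sketch of that arithmetic (all numbers are taken from the post above, nothing here is measured):

```python
# Per-sample latency comparison for the timings reported above:
# 390 ms for batch_size = 64 vs 7 ms for batch_size = 1.

batch_latency_ms = 390.0   # reported latency for batch_size = 64
single_latency_ms = 7.0    # reported latency for batch_size = 1
batch_size = 64

per_sample_ms = batch_latency_ms / batch_size      # latency per image inside the batch
speedup = single_latency_ms / per_sample_ms        # throughput gain vs. batch_size = 1

print(f"per-sample latency in the batch: {per_sample_ms:.2f} ms")
print(f"effective speedup over batch_size=1: {speedup:.2f}x")
```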

BUILD_LINK

## Environment

TensorRT Version: 7.2.2-1+cuda11.1
Nvidia Driver Version: 460.27.04
Operating System + Version: Ubuntu 20.04.1 LTS
Baremetal or Container (if container which image + tag):

Hi, could you please share your model and script so that we can help you better?

Alternatively, you can try running your model with trtexec command.
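A minimal sketch of such a trtexec run, assuming an explicit-batch ONNX model with an input named `data` of shape 64x3x112x112 (the file names, input name, and shape here are placeholders — substitute your model’s):

```shell
# Build an engine from the ONNX model with the input pinned to batch size 64,
# and save it for later profiling. Adjust "data:64x3x112x112" to your model.
trtexec --onnx=model.onnx \
        --explicitBatch \
        --shapes=data:64x3x112x112 \
        --saveEngine=model_b64.engine \
        --verbose

# Then collect per-layer timings on the saved engine.
trtexec --loadEngine=model_b64.engine \
        --shapes=data:64x3x112x112 \
        --dumpProfile --separateProfileRun
```

Comparing the per-layer profile at batch size 64 against batch size 1 should show which layers fail to scale with the batch.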


I’m following this GitHub repository.

I was able to generate the model successfully; the only issue is that the inference performance is not what I expected. Hence I’m sharing the GitHub repository.

Hi @jhanvi,

Sorry for the late reply. Could you please check and confirm GPU utilization,
and also share the engine build verbose log and the per-layer inference profile?

build: trtexec --verbose .....
inference: trtexec --dumpProfile --separateProfileRun ......

Thank you.