Not seeing a performance increase from TensorRT compared to PyTorch

Description

Hi.

I compiled the decoder of MobileSAMv2 from the PyTorch (.pt) format to a TensorRT engine via ONNX, hoping to improve inference performance. However, I'm not seeing any performance difference compared to the .pt model. Could you tell me what the possible causes might be? Thank you.
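For reference, this is roughly how I time both models (a simplified sketch; the checkpoint path and input shapes are placeholders, and I synchronize before reading the timer since CUDA kernel launches are asynchronous):

```python
import time
import torch

# Placeholder: however the decoder module and its inputs are actually loaded.
model = torch.load("decoder.pt").cuda().eval()
inputs = (torch.randn(1, 256, 64, 64, device="cuda"),  # image embeddings (illustrative shape)
          torch.randn(1, 1, 4, device="cuda"))          # bbox prompt (illustrative shape)

with torch.no_grad():
    for _ in range(20):          # warm-up so one-time CUDA/cuDNN init is excluded
        model(*inputs)
    torch.cuda.synchronize()     # wait for queued GPU work before starting the timer
    start = time.perf_counter()
    for _ in range(100):
        model(*inputs)
    torch.cuda.synchronize()     # wait again before stopping the timer
    print((time.perf_counter() - start) / 100 * 1e3, "ms/iter")
```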

Environment

TensorRT Version: 8.5.3
GPU Type: T4
Nvidia Driver Version: 535.183.01
CUDA Version: 12.2
CUDNN Version: 8.5.0
Python Version (if applicable): Python 3.7
PyTorch Version (if applicable): 1.13.1

Relevant Files

The exported ONNX model: export.onnx - Google Drive
The scripts used for exporting (modified from the MobileSAM/MobileSAMv2 repo to use only bboxes instead of points + masks + bboxes):
Exporter
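
The export itself follows the standard torch.onnx.export path; a minimal sketch of the bbox-only variant (the module loading, tensor shapes, and input/output names here are illustrative, not the exact script):

```python
import torch

# Placeholders: the decoder module and example inputs matching the
# bbox-only interface described above.
decoder = torch.load("decoder.pt").cpu().eval()
embeddings = torch.randn(1, 256, 64, 64)  # image embeddings (illustrative shape)
boxes = torch.randn(1, 1, 4)              # bbox prompts (illustrative shape)

torch.onnx.export(
    decoder,
    (embeddings, boxes),
    "export.onnx",
    opset_version=16,
    input_names=["image_embeddings", "boxes"],
    output_names=["masks", "iou_predictions"],
    dynamic_axes={"boxes": {1: "num_boxes"}},  # allow a variable number of boxes
)
```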

Hello,
Would you mind upgrading the TensorRT version and checking the performance again?

Thanks
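
After upgrading, please rebuild the engine from the same ONNX file rather than reusing the old one. A minimal build sketch with the TensorRT Python API (paths are placeholders; FP16 is enabled as an example optimization, which often matters for speedups on T4):

```python
import tensorrt as trt

ONNX_PATH = "export.onnx"  # placeholder path to the ONNX file attached above

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model and surface any parser errors.
with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # build an FP16 engine instead of pure FP32

# Serialize the optimized engine to disk for later deployment.
engine_bytes = builder.build_serialized_network(network, config)
with open("export.engine", "wb") as f:
    f.write(engine_bytes)
```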