Is it ever reasonable to have ONNX Runtime with CUDAExecutionProvider faster than native TensorRT?

Is it ever reasonable for ONNX Runtime with CUDAExecutionProvider to be faster than native TensorRT? I find this counter-intuitive. Do you have any thoughts on this, or is it a bug on my side?
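One common source of counter-intuitive results like this is measurement methodology: both TensorRT and ONNX Runtime's CUDA EP do expensive one-time work on the first calls (engine building, kernel autotuning, memory-pool growth), so including those calls in the measurement can make either runtime look slower. Below is a minimal timing-harness sketch; the `run_once` callable is a placeholder for your actual inference call (e.g. a lambda wrapping `session.run(...)` on an `onnxruntime.InferenceSession`, or a TensorRT execution-context invocation), not part of either library's API:

```python
import time
from statistics import median


def time_inference(run_once, warmup=10, iters=100):
    """Time a single-inference callable, excluding warm-up runs.

    The warm-up loop absorbs one-time costs (lazy initialization,
    autotuning) so the reported latency reflects steady state.
    Returns the median per-call latency in seconds; the median is
    less sensitive to outliers than the mean.
    """
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - t0)
    return median(samples)


# Usage sketch (names are placeholders for your own setup):
#   sess = onnxruntime.InferenceSession(
#       "model.onnx", providers=["CUDAExecutionProvider"])
#   latency = time_inference(lambda: sess.run(None, feeds))
```

If both runtimes are timed this way (same input shapes, same precision, synchronized GPU work) and ONNX Runtime is still faster, that is worth a repro report; otherwise the gap often disappears once warm-up is excluded.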

Hi,

TensorRT is generally expected to be faster than ONNX Runtime with the CUDA execution provider. Could you please share a minimal repro model/script along with the following environment details? Please also try the latest TensorRT version, 8.5.3, for better performance.

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Thank you.