Is it ever reasonable for ONNX Runtime with the CUDAExecutionProvider to be faster than native TensorRT? I find this counterintuitive. Do you have any thoughts on this, or is it a bug on my side?
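One common source of this result is measurement methodology rather than the runtimes themselves: including warm-up iterations (first-run CUDA kernel/JIT and engine initialization) in the timed region, or timing too few runs, can make either framework look arbitrarily slow. A minimal, framework-agnostic timing sketch (the `run_inference` callable is a hypothetical stand-in for a session's `run` call in either ONNX Runtime or TensorRT):

```python
import time
import statistics

def benchmark(run_inference, warmup=10, iters=100):
    """Return the median latency in milliseconds of a zero-argument callable.

    Warm-up runs are executed first and excluded from timing, so one-time
    costs (lazy initialization, kernel autotuning) do not skew the result.
    The median is reported rather than the mean to reduce outlier impact.
    """
    for _ in range(warmup):
        run_inference()
    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times_ms)
```

Usage would be, for example, `benchmark(lambda: session.run(None, inputs))` for an ONNX Runtime `InferenceSession`; timing both backends this way, with identical inputs and batch sizes, makes the comparison fairer.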
Hi,
In general, TensorRT should be faster than ONNX Runtime. Could you please share a minimal repro (model and scripts) along with the following environment details? Please also try the latest TensorRT version, 8.5.3, for better performance.
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
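As a quick sanity check on the TensorRT side, the `trtexec` tool that ships with TensorRT reports per-iteration latency with warm-up handled for you. A sketch, assuming the exported model is at `model.onnx` (hypothetical path):

```shell
# Build an engine from the ONNX model and benchmark it.
# --fp16 enables FP16 kernels where supported; drop it for an FP32 comparison.
trtexec --onnx=model.onnx --fp16
```

Comparing its reported GPU latency against the ONNX Runtime numbers, measured the same way, usually narrows down whether the gap is real or a benchmarking artifact.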
Thank you.