Inference speed of ONNX vs. ONNX + TensorRT



I want to know if there is any benchmark comparing the inference speed of an ONNX model and of the same ONNX model compiled into a TensorRT engine.

We know that ONNX Runtime already applies some inference-speed optimizations, so I am curious how much additional improvement TensorRT can provide.

A follow-up question: which of the following should we expect to give better inference speed?

  1. PyTorch model optimized with Torch-TensorRT
  2. PyTorch model → ONNX model, then optimize the ONNX with TensorRT

We assume both of the above pipelines are tuned with optimal settings.
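For a quick apples-to-apples comparison on your own model, one option is a small timing harness that wraps each backend's inference call behind the same interface. This is only a sketch: the ONNX Runtime and TensorRT invocation lines shown in the docstring (`session`, `feeds`, `context`, `bindings`) are hypothetical placeholders for objects you would create yourself, not working setup code. (TensorRT's bundled `trtexec` tool can also report engine latency directly, if you just want numbers for the engine side.)

```python
import time

def benchmark(run_fn, warmup=10, iters=100):
    """Average single-inference latency of a zero-argument callable, in ms.

    Pass the same callable shape for each backend, e.g. (hypothetical objects):
      - ONNX Runtime: lambda: session.run(None, feeds)
      - TensorRT:     lambda: context.execute_v2(bindings)

    Warmup runs are discarded so one-time costs (lazy allocation,
    kernel selection) do not skew the measurement.
    """
    for _ in range(warmup):
        run_fn()
    start = time.perf_counter()
    for _ in range(iters):
        run_fn()
    return (time.perf_counter() - start) / iters * 1000.0
```

Note that for GPU backends you would also need to synchronize the device (or use identical synchronous calls for both backends) before reading the clock, otherwise asynchronous launches make the numbers meaningless.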


Hi @foreveronehundred,

Which GPU architecture are you interested in? Numbers can vary greatly depending on arch.

Thank you.

I am interested in A100 80GB. Thanks.