Is there any benchmark comparing the inference speed of an ONNX model against ONNX + TensorRT (with a built engine)?

We know that ONNX already applies some optimizations for inference speed, so I am curious how much further improvement TensorRT can provide.

Here is a follow-up question: which of the following would we expect to give better inference speed?

  1. PyTorch model optimized with Torch-TensorRT
  2. PyTorch model → ONNX model, then optimize the ONNX with TensorRT

Assume both of the above setups are tuned with optimal settings.
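
For the comparison to be meaningful, both paths would need to be timed the same way. Below is a minimal sketch of such a timing harness, assuming each backend is wrapped in a zero-argument Python callable. The names `benchmark` and `run_once` are hypothetical and not part of ONNX Runtime, TensorRT, or Torch-TensorRT, and the callable is assumed to block until the GPU work has finished (for example by calling `torch.cuda.synchronize()` inside it).

```python
import statistics
import time


def benchmark(run_once, warmup=10, iters=100):
    """Time a zero-argument callable; return latency statistics in milliseconds.

    Assumption: run_once() only returns after inference has completed,
    including any GPU synchronization the backend needs.
    """
    # Warm-up runs are excluded from timing (lazy initialization, caches, autotuning).
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1e3)

    samples.sort()
    # Nearest-rank style index for the 99th percentile, clamped to the last sample.
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }
```

The same `benchmark` call would be made once per setup, with identical input shapes, batch size, and precision, so that only the backend differs between runs.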

Which GPU architecture are you interested in? Numbers can vary greatly depending on arch.

I am interested in A100 80GB. Thanks.