TensorRT Engine

Hello,

I am having difficulty understanding the difference between running an ONNX model with the TensorRT Execution Provider and converting the ONNX model into a TensorRT engine with trtexec and then loading that .trt file. Can you explain the difference?

Thank you.

1. TensorRT Execution Provider:

  • This is built-in functionality in frameworks like ONNX Runtime.
  • It allows you to leverage TensorRT for optimized inference without creating a separate engine file (.trt).
  • The framework itself handles the conversion of the ONNX model to a TensorRT engine on the fly at runtime (typically at session creation or on the first inference).
  • This approach is simpler to use but may offer less control over the optimization process (see the sketch after this list).
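
Here is a minimal Python sketch of this first approach using the ONNX Runtime API. The model path, provider options, and input shape are illustrative assumptions, not details from the question:

```python
import numpy as np
import onnxruntime as ort

# ONNX Runtime builds the TensorRT engine on the fly; you never handle a
# .trt file yourself. Enabling the engine cache (optional) avoids rebuilding
# the engine on every process start.
session = ort.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=[
        ("TensorrtExecutionProvider", {
            "trt_engine_cache_enable": True,   # optional: cache built engines
            "trt_engine_cache_path": "./trt_cache",
        }),
        "CUDAExecutionProvider",  # fallback for ops TensorRT does not support
        "CPUExecutionProvider",
    ],
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed shape
outputs = session.run(None, {input_name: dummy_input})
```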

2. trtexec tool:

  • This is a separate command-line tool included in the TensorRT installation.
  • It allows you to convert an ONNX model into a serialized TensorRT engine (.trt file).
  • This .trt file is then loaded by your application for inference.
  • This approach offers more control over the optimization process: you can specify options such as precision (e.g., FP16/INT8), workspace size, and optimization profiles for dynamic input shapes (see the sketch after this list).
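
For comparison, here is a sketch of the second approach. The engine would first be built offline, e.g. with `trtexec --onnx=model.onnx --saveEngine=model.trt`, and then deserialized in your application. This sketch uses the TensorRT 8.x bindings API with pycuda and assumes a single FP32 input and output; file names and shapes are placeholders. Note that a serialized engine is specific to the GPU model and TensorRT version it was built with:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine that trtexec produced.
with open("model.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host/device buffers for every binding (TensorRT 8.x-style API).
inputs, outputs, bindings = [], [], []
for i in range(engine.num_bindings):
    size = trt.volume(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host_mem = cuda.pagelocked_empty(size, dtype)
    dev_mem = cuda.mem_alloc(host_mem.nbytes)
    bindings.append(int(dev_mem))
    (inputs if engine.binding_is_input(i) else outputs).append((host_mem, dev_mem))

# Copy input to the GPU, run inference, copy the result back.
host_in, dev_in = inputs[0]
host_in[:] = np.random.rand(host_in.size).astype(host_in.dtype)  # placeholder data
cuda.memcpy_htod(dev_in, host_in)
context.execute_v2(bindings)
host_out, dev_out = outputs[0]
cuda.memcpy_dtoh(host_out, dev_out)
```

Because the engine is built ahead of time, application startup is faster than with the Execution Provider, at the cost of the portability and convenience noted above.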