TensorRT Engine Creation Methods’ Differences


There are two sections on this page that explain the conversion process:

6.2. Converting ONNX to a TensorRT Engine
7. Using the TensorRT Runtime API

I understand that section 7 is presented as one of the recommended methods, and that it shows how to use the TensorRT runtime API directly.

However, I don't see the difference between the two methods: both convert the model to ONNX first, and then build a TensorRT engine from it (via trtexec in 6.2, and via the API in 7).
Since the conversion path through ONNX looks the same, which part actually causes the performance difference?
Also, does "performance" here refer only to inference speed, or to something else as well?
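For context, here is a minimal sketch of what I understand the two sections to describe, using the TensorRT Python API (the file names are placeholders, and this is only my reading of the docs, not a verified implementation):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Path from section 6.2: build an engine from an ONNX file,
# which is what trtexec also does internally.
def build_engine_from_onnx(onnx_path: str) -> bytes:
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    # Precision and other optimization choices happen here, at build time.
    return builder.build_serialized_network(network, config)

# Path from section 7: deserialize an already-built engine
# and run it with the runtime API.
def load_engine(plan_path: str):
    runtime = trt.Runtime(TRT_LOGGER)
    with open(plan_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())
```

If I read this correctly, both paths end with the same kind of serialized engine, which is why I'm unsure where the performance difference comes from.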

I would appreciate any help.

Thank you for reading to the end.



Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered