Differences Between TensorRT Engine Creation Methods


There is a section on this page that explains the conversion process.

6.2. Converting ONNX to a TensorRT Engine
7. Using the TensorRT Runtime API

As I understand it, section 7 is presented as one of the best methods, and as the way to use the TensorRT API.

However, I can't see how the two methods differ, except that the model is converted to ONNX and then converted to trt format in 6.2 and to engine format in 7 via trtexec.
Both seem to go through ONNX in the same way, so I don't know which step causes the performance difference, or what else actually differs between them.
Also, does "performance" here refer only to inference speed, or to something else as well?

I’m asking for your help…

Thank you for reading until the end…

