Description
It appears that there are three different methods for creating a TensorRT engine, depending on the file format you start from:
- .pth → .wts → .engine: Conversion from a PyTorch model file (.pth) to a TensorRT engine via a weight file (.wts).
- .pth → .onnx → .engine: Conversion from a PyTorch model file (.pth) to an ONNX format (.onnx), followed by creating a TensorRT engine from the ONNX file.
- .weights → .engine: Direct conversion from a custom weight file (.weights) to a TensorRT engine.
I’m curious why these three methods exist for achieving the same result, and would like to understand what differentiates them, especially in terms of performance.
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered
These three methods exist because each has its own advantages and disadvantages. In terms of performance, the second method, .pth → .onnx → .engine, is generally the most efficient, because TensorRT has first-class support for parsing ONNX and can apply its full set of graph optimizations to it. The .pth → .wts → .engine method is the simplest, but it can be less efficient and slower than the other two. The .weights → .engine method is typically efficient, but it can be harder to use and less efficient for models with a large number of parameters.
The best method for creating a TensorRT engine depends on the specific model and the desired performance. If your model was trained in PyTorch and you are not particularly concerned about performance, the .pth → .wts → .engine method is the simplest option.
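For context, the .wts file in that path is usually a plain-text dump of the PyTorch state_dict (the format popularized by the tensorrtx project): a header line with the tensor count, then one line per tensor of the form "name count hex hex …". A minimal sketch, assuming PyTorch is installed; the model and file name here are placeholders:

```python
import struct
import torch
import torch.nn as nn

# Stand-in for a model loaded from your .pth file.
model = nn.Linear(4, 2)
state = model.state_dict()

with open("model.wts", "w") as f:
    # Header: number of tensors in the state_dict.
    f.write(f"{len(state)}\n")
    for name, tensor in state.items():
        values = tensor.reshape(-1).cpu().numpy()
        # One line per tensor: name, element count, then each
        # float32 value as big-endian hex.
        f.write(f"{name} {len(values)}")
        for v in values:
            f.write(" " + struct.pack(">f", float(v)).hex())
        f.write("\n")
```

The engine is then built by a hand-written C++ program that reads this file and reconstructs the network layer by layer with the TensorRT network-definition API, which is why this path is the most manual of the three.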
If you are using a model that was trained using another framework or if you are concerned about performance, then the .pth → .onnx → .engine method is a good option.
If you are using a model that was trained using a custom framework, then the .weights → .engine method is the best option.
I hope the above information is helpful to you.
Please refer to the TensorRT documentation for more information.
https://docs.nvidia.com/deeplearning/tensorrt/index.html
Thank you.
Thank you for the helpful answers; I have read them all.
I visited the link you provided and searched for the content you mentioned, but I couldn’t find it.
Can you please provide me with the links to the materials or websites where I can find the information you mentioned in your previous responses? Thank you again…!