There are three components:
- the proprietary TensorRT package (closed source)
- onnx/onnx-tensorrt on GitHub: ONNX-TensorRT, the TensorRT backend for ONNX
- NVIDIA/TensorRT on GitHub: the open-source components of TensorRT, "a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators"
If I have an ONNX model and want to optimize it with TensorRT, am I supposed to run `onnx2trt model.onnx -o model.plan` and then run inference on that `model.plan`?
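For context, this is the workflow I have in mind. It assumes the `onnx2trt` CLI built from the onnx/onnx-tensorrt repo and the `trtexec` tool that ships with the TensorRT package; `model.onnx` and `model.plan` are placeholder file names:

```shell
# Convert the ONNX model into a serialized TensorRT engine ("plan").
# onnx2trt is built from the onnx/onnx-tensorrt repository.
onnx2trt model.onnx -o model.plan

# Load the serialized engine and run inference/benchmarking with
# trtexec, which ships with the TensorRT package.
trtexec --loadEngine=model.plan
```

(Both steps require an NVIDIA GPU and a matching TensorRT installation, so the commands are a sketch of the intended flow rather than something portable.)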
I am a little confused: what is the purpose of the OSS part of TensorRT, then? Where is it used?