Description
I am trying to understand the differences between the various ways to compile/export a PyTorch model to a TensorRT engine. I’m using PyTorch 2.2.
Background: My end goal is to export my detectron2-trained PyTorch model to a TensorRT .engine file so that I can use it in NVIDIA DeepStream afterwards.
This got me reading about TorchScript, torch.fx, torch.export, torch.compile/TorchDynamo with its different backends (e.g. the torch_tensorrt backend, whose output apparently cannot be serialized?), as well as the standalone torch_tensorrt project.
Since the model (Mask2Former with a Swin Transformer backbone) and the surrounding codebase include complex Python constructs and dynamic control flow, I’ve ruled out torch.fx and all tracing-based methods (please correct me if my thinking is wrong).
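To illustrate why I ruled out tracing (a toy example, not my actual model): with data-dependent control flow, torch.jit.trace only records the branch taken for the example input, while torch.jit.script keeps the if/else:

```python
import torch
import torch.nn as nn

class ToyBranch(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Data-dependent control flow: which branch runs depends on the values in x.
        if x.sum() > 0:
            return x * 2
        return x - 1

m = ToyBranch()
traced = torch.jit.trace(m, torch.ones(3))   # records only the "x * 2" branch
scripted = torch.jit.script(m)               # preserves the if/else

x_neg = -torch.ones(3)
print(traced(x_neg))    # wrong: still multiplies by 2
print(scripted(x_neg))  # correct: subtracts 1
```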
I’m now left with these questions:
- Should I first convert the model to TorchScript using torch.jit.script? Is scripting the only “easy” option, given graph breaks and the need to run outside a Python runtime?
- Is torch.compile (TorchDynamo), applied directly to the PyTorch model, suitable for my goal (eventually serializing to a TensorRT engine file for use in DeepStream), or should I first convert the model to TorchScript?
- After compiling the model with any of the above methods, my understanding is that I still need torch_tensorrt to actually serialize a TensorRT engine. Is there another way? (See the sketch after this list for what I have in mind.)
- I’ve also stumbled upon the torch2trt project, but I’m not sure whether it’s a better option.
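For reference, here is a minimal sketch of the flow I have in mind, based on my reading of the torch_tensorrt docs; the stand-in model, the 1x3x800x800 input shape and the output file names are placeholders, not my real setup:

```python
import torch
import torch.nn as nn
import torch_tensorrt

# Stand-in model; in my case this would be the detectron2 Mask2Former model in eval mode.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval().cuda()

# Placeholder input spec, not my real preprocessing size.
inputs = [torch_tensorrt.Input((1, 3, 800, 800), dtype=torch.float32)]

# Script first (to preserve control flow), then hand the result to torch_tensorrt.
scripted = torch.jit.script(model)

# Option A: a TorchScript module with embedded TensorRT engines.
# This runs through the torch_tensorrt runtime; it is not a standalone .engine file.
trt_ts = torch_tensorrt.compile(
    scripted, ir="ts", inputs=inputs, enabled_precisions={torch.float16})
torch.jit.save(trt_ts, "model_trt.ts")

# Option B (what I believe DeepStream needs): a standalone serialized TensorRT engine.
engine_bytes = torch_tensorrt.ts.convert_method_to_trt_engine(
    scripted, "forward", inputs=inputs, enabled_precisions={torch.float16})
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```

In particular, is Option B the right way to get a plain .engine file that DeepStream’s nvinfer can deserialize, or does an engine produced this way still depend on the torch_tensorrt runtime?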
Sorry for the long post, and I appreciate any help!
Environment
TensorRT Version: 8.4.1.6
GPU Type: RTX3090
Nvidia Driver Version: 550.54.15
CUDA Version: 12.1
CUDNN Version: 8.9.2
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): -
PyTorch Version (if applicable): 2.2
Baremetal or Container (if container which image + tag): -