About Detectron2 on TensorRT

Description

Hi all,

I wonder: has anyone successfully converted a Detectron2 model to a TensorRT engine? Has anyone tried running the Detectron2 model on TensorRT?

I was trying the Detectron2 model on TensorRT, and I ran into two significant problems while converting the model in two different ways.


First Issue

I used the official script, caffe2_export.py, from the Detectron2 GitHub repository to export the ONNX model. Please check that part of the code. When I tested it, I got the error below:

RuntimeError: No such operator caffe2::AliasWithName

The command is quite similar to the instructions from here.

The test environment was Windows with PyTorch v1.3.1.

Environment setting

GPU Type: GTX 1080 TI
Nvidia Driver Version: 431.86
CUDA Version: 10.0
CUDNN Version: 7.6.3
Operating System + Version: Win10
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 1.4
PyTorch Version (if applicable): 1.3.1


Second Issue

I also tried to use the torch.onnx.export function to export the model.
However, I ran into a significant problem: several parts of the model, such as ROI Align and the post-processing, are implemented as Python classes in Detectron2, and ONNX export does not seem to be able to handle them. I think this issue is quite serious for anyone who wants to use the model with TensorRT. Currently, I can successfully export an ONNX model that only includes the backbone and FPN (I will show the detailed steps to reproduce it below). Is there any way to export an entire Detectron2 model to ONNX?
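
For reference, a minimal sketch of the kind of wrapper that exports just the backbone + FPN (assumptions: a model-zoo Faster R-CNN FPN config chosen only for illustration; the BackboneFPN wrapper and the dummy input size are mine, not from Detectron2 or test_detect.py):

# Export only backbone + FPN to ONNX (sketch, not the full model).
import torch
from detectron2 import model_zoo
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.modeling import build_model

class BackboneFPN(torch.nn.Module):
    """Wrap model.backbone so the output is a plain tuple instead of a dict."""
    def __init__(self, model):
        super().__init__()
        self.backbone = model.backbone

    def forward(self, image):
        features = self.backbone(image)        # dict: {"p2": ..., "p3": ..., ...}
        return tuple(features.values())        # ONNX-friendly flat outputs

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.DEVICE = "cpu"

model = build_model(cfg).eval()
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)   # load pretrained weights
dummy = torch.randn(1, 3, 800, 800)                    # NCHW dummy image
torch.onnx.export(BackboneFPN(model), dummy, "backbone_fpn.onnx", opset_version=11)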

Environment setting

GPU Type: GTX1080
Nvidia Driver Version: 445.74
CUDA Version: 10.0.130
CUDNN Version: 7.6.0
Operating System + Version: Windows 10 10.0.18363
Python Version (if applicable): 3.6.5
PyTorch Version (if applicable): 1.3.1


Reproduce

Reproduce my first issue:

  1. git clone the detectron2 repository
  2. Run tools/deploy/caffe2_converter.py to export the ONNX model
    (Instructions: Deployment — detectron2 0.6 documentation)

Reproduce my second issue:
Here I will show the successful part of what I have tried, which only includes the Backbone + FPN.

  1. Do Part A below (install detectron2 and the additional requirements)
  2. Go to the detectron2/tools folder
  3. Download test_detect.py (the document is here) and put it in that folder
  4. Open a command line/terminal in that folder and run python3 test_detect.py

NOTE:
Check this: For detectron2 issue · GitHub

Please check lines 165 and 166, shown below:

dummy_convert(cfg, only_backbone = True) # only backbone + FPN
dummy_convert(cfg, only_backbone = False) # all

If only_backbone = True, the conversion succeeds, but it only includes the backbone + FPN.
However, if only_backbone = False, which means converting the whole model, it fails.

Part A

Requirements:

  • See DETECTRON2 requirements here
  • Additional requirements that have been tested:
    • onnxruntime-gpu 1.1.2
    • onnx-simplifier 0.2.9
    • pytorch 1.3.1
    • Python 3.6.5
Log Success

Log of the test conversion:


Log of the successful ONNX simplification:

Above I showed the successful case, which converts to an ONNX model without the Python-class parts of Detectron2.
For Detectron2, I think the second issue is more serious and important.

What I Expect

My goal is to convert the Detectron2 model to an ONNX model, and then convert that ONNX model to a TensorRT engine!

1 Like

Any progress on converting the detectron2 model to TensorRT?

Unfortunately, no.
We are still looking forward to it.

Hi,
Is this still an open issue?
Thanks.

I always chuckle at how many engineer-months are wasted these days on trying to convert model framework X into an optimized deployment target by way of a Big Third Party Representation (ONNX).

Do you know a little C++? This can be done in 1-2 work days in the following manner. It requires modifying the source of Detectron2 (or any other computer vision training framework) a bit, but it isn't too hard. The key insight is that CNNs are mostly dead-simple repetitive structures, so they are easy to parse and recreate with TRT's API.

First, read the TRT Developer Guide. Set up a simple C++ project and create a simple conv net with some random weights. You will see how the API works; it is very simple. The main idea here is just to get a feel for the API and to have a C++ project environment set up where you can compile code.

Now, augment the source code of your framework so that you can get the graph structure of the network out after the model is built. You may want to just forget about trying to parse TorchScript or TensorFlow graph representations; those are at a far lower level than you really need. On the other hand, existing projects like TorchTRT take this approach, so you could look at their source code. Basically you just want a representation like "ConvLayer1 feeds into ReLU1" and so on, because this is the level of the TensorRT API. For Detectron2, I did this by subclassing all the basic PyTorch components and adding some hooks so that I could build the representation and get out all the tensor sizes and information required. Give each layer a unique name. Basically you just want the graph representation. Be aware of details in some layers, like reuse of layers in the CNN "heads", and take this into account when you build your graph representation.
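
For what it's worth, a minimal sketch of that hook idea in plain PyTorch (the GraphRecorder name and the id-based producer tracking are my own scratch approach, not Detectron2 or TensorRT API):

import torch
import torch.nn as nn

class GraphRecorder:
    """Record (producer -> consumer) edges plus tensor shapes via forward hooks."""
    def __init__(self, model):
        self.nodes, self.edges = {}, []
        self._producer = {}                        # id(tensor) -> layer name
        for name, module in model.named_modules():
            if len(list(module.children())) == 0:  # leaf layers only
                module.register_forward_hook(self._make_hook(name))

    def _make_hook(self, name):
        def record(module, inputs, output):
            self.nodes[name] = {
                "type": type(module).__name__,
                "out_shape": tuple(output.shape) if torch.is_tensor(output) else None,
            }
            for t in inputs:
                if torch.is_tensor(t) and id(t) in self._producer:
                    self.edges.append((self._producer[id(t)], name))
            if torch.is_tensor(output):
                self._producer[id(output)] = name
        return record

# Usage: run one forward pass, then read recorder.nodes / recorder.edges.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3))
recorder = GraphRecorder(net)
net(torch.randn(1, 3, 64, 64))
# recorder.edges -> [('0', '1'), ('1', '2')], i.e. "Conv feeds ReLU feeds Conv"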

Obviously you should serialize the weights in a simple manner. Maybe just write them as a contiguous array of floats to a single file for each weight, or use something like Protobuf; it doesn't have to be fancy. I used FlatBuffers.
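
As a concrete example of the "keep it simple" route (my own file layout: one raw float32 .bin per parameter plus a small JSON index; FlatBuffers, as mentioned above, is a more structured alternative):

# Dump every floating-point parameter/buffer of a PyTorch model as flat float32 arrays.
import json
import os
import numpy as np
import torch

def dump_weights(model, out_dir="weights"):
    os.makedirs(out_dir, exist_ok=True)
    index = {}
    for name, tensor in model.state_dict().items():
        if not tensor.is_floating_point():               # skip e.g. num_batches_tracked
            continue
        arr = tensor.detach().cpu().numpy().astype(np.float32)
        arr.tofile(os.path.join(out_dir, name + ".bin"))  # raw contiguous floats
        index[name] = list(arr.shape)                     # remember shapes for the TRT side
    with open(os.path.join(out_dir, "index.json"), "w") as f:
        json.dump(index, f, indent=2)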

Now just write a short program that takes this graph representation and generates the C++ or TRT Python code required to build the network in TensorRT. Make sure you build into this program some simple testing that checks numerical accuracy layer by layer.
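
A sketch of what the generated builder code can look like, using the TensorRT Python API (assumptions: TRT 7/8 explicit-batch API, and the weight files and shapes from the sketches above):

# Recreate a tiny Conv + ReLU network in TensorRT from dumped weights.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

inp = network.add_input("image", trt.float32, (1, 3, 64, 64))

# One Conv + ReLU pair; weights come from the .bin files dumped above.
w = np.fromfile("weights/0.weight.bin", dtype=np.float32).reshape(16, 3, 3, 3)
b = np.fromfile("weights/0.bias.bin", dtype=np.float32)
conv = network.add_convolution_nd(inp, 16, (3, 3), trt.Weights(w), trt.Weights(b))
conv.padding_nd = (1, 1)
relu = network.add_activation(conv.get_output(0), trt.ActivationType.RELU)

network.mark_output(relu.get_output(0))
config = builder.create_builder_config()
config.max_workspace_size = 1 << 28             # deprecated in newer TRT, fine on 7/8
engine = builder.build_engine(network, config)  # then compare outputs to PyTorch layer by layer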

Then you’re done, and you’ll know a lot more.

Be aware of:

  • You will need to convert BatchNorm into a TRT "Scale" operation. The formulas for both Torch and TensorFlow are around, but you can also just work it out yourself (see the sketch below).
  • Be aware of the memory layouts in which weights are represented.
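
The BatchNorm fold in the first bullet is just y = gamma * (x - mean) / sqrt(var + eps) + beta rearranged into a per-channel scale and shift; a small sketch (the function name is mine):

import numpy as np

def bn_to_scale(gamma, beta, running_mean, running_var, eps=1e-5):
    # y = gamma * (x - mean) / sqrt(var + eps) + beta  ==  x * scale + shift
    scale = gamma / np.sqrt(running_var + eps)
    shift = beta - running_mean * scale
    power = np.ones_like(scale)        # TRT's scale layer computes (x * scale + shift) ** power
    return scale.astype(np.float32), shift.astype(np.float32), power.astype(np.float32)

# Then, per channel:
# network.add_scale(prev_output, trt.ScaleMode.CHANNEL,
#                   trt.Weights(shift), trt.Weights(scale), trt.Weights(power))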

Depending on your ability, I would try this with a simple ConvNet trained in PyTorch before moving on to the full D2 models. This gives you a way to play around with FP16 and Int8 performance/accuracy, as well as a test project.
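
For the FP16/Int8 experiments, the relevant switches in the TRT 7/8 Python builder config look roughly like this (INT8 additionally needs a calibrator, which is out of scope here):

import tensorrt as trt

# `builder` as created in the previous snippet.
config = builder.create_builder_config()
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)
# if builder.platform_has_fast_int8:
#     config.set_flag(trt.BuilderFlag.INT8)   # also requires config.int8_calibrator = ...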

Final caveat:
Depending on your TRT version, post-processing will be the only real major hurdle. The outputs of my TRT models are arrays of logits and class indexes for the dense anchor sets. After that you can feed them into an NMS kernel (I suggest looking at torchvision's NMS implementation) or try to use TRT's open-source plugin. I would suggest avoiding writing plugins if possible.
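
To make the post-processing step concrete: the engine outputs dense scores/boxes, and the NMS referenced above can be as simple as torchvision.ops.nms on the decoded boxes (the shapes below are illustrative, not from a real model):

import torch
import torchvision

# Fake decoded outputs: 1000 boxes as (x1, y1, x2, y2) plus per-box scores.
xy = torch.rand(1000, 2) * 700
wh = torch.rand(1000, 2) * 100 + 1
boxes = torch.cat([xy, xy + wh], dim=1)
scores = torch.rand(1000)

keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
final_boxes, final_scores = boxes[keep], scores[keep]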

4 Likes

Dear @ChrisB ,

I really appreciate your reply and the detailed description.

I totally understand your suggestions. Indeed, I have been diving into TensorRT for a long time, so I know fairly well which parts are tough for someone who is not a TensorRT expert. Based on your description, it is not a low-threshold approach: for instance, you have to know the exact mechanisms of those operations in that framework. I believe that is not friendly to most researchers or engineers who want to take this model to other applications. As we know, D2's performance is really awesome; however, there has been no relevant information about D2 TensorRT development until now. Of course, if we want to make it work by any means, that is the only method to consider. In other words, do it from scratch.

For the D2 TensorRT case, we have already made significant progress by using graphsurgeon to resolve the problems.

Thanks for your information anyway, and may this community keep getting stronger. :)

Yeah totally agree that the threshold is not necessarily low. But I definitely think what I described above is the realistic way to go. I was trying to be honest in saying that if your goal is to just get your model into TRT, then it’s probably easier to roll your own converter.

My experience, however, is based on TRT 5 and TRT 6, while working to convert models that use the latest PyTorch (1.8.1) for C++ applications.

I haven’t looked at TorchTRT’s TorchScript parser closely, but I suspect since most models in Detectron2 now support TorchScript, that TorchTRT would probably be a better path vs. ONNX if you’re going to go with a third party tool.

Also I wanted to respond to something you said that highlights the crux of this whole problem: “You have to know the exact mechanisms of those operations in that framework.”

This is true regardless of whether you are rolling your own converter or using a third-party one! This is why all the various ONNX or other third-party format conversion tools don't work or go out of date very soon after they are created. When any of the dependent software (Torch, TensorFlow, the framework project like D2, etc.) makes a change they think is relatively small, it can be a breaking change to your conversion pipeline because the operational change is somehow not supported. So unless the teams on the dependent projects are directly supporting your conversion path, you're going to keep repeating this process of trial-and-error, trying to figure out which third-party software combination works for exporting the model. It gets more complicated if plugins or layers with various implementations in the deep-learning community (like Deformable Conv) are used.

On the other hand, if you roll your own conversion tool, then it's relatively easy to troubleshoot. Detectron2 overrides the Torch Conv2d layer here, so you could put a couple of lines of code in this block to capture the majority of the graph and use that information to generate the TRT C++ or Python code.
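
A hedged sketch of that "couple of lines" idea: wrap the forward of detectron2.layers.Conv2d so every call records its configuration and shapes (the CAPTURED list is my own scratch structure, not Detectron2 API):

import functools
from detectron2.layers import Conv2d

CAPTURED = []
_original_forward = Conv2d.forward

@functools.wraps(_original_forward)
def logging_forward(self, x):
    out = _original_forward(self, x)
    CAPTURED.append({
        "in_channels": self.in_channels,
        "out_channels": self.out_channels,
        "kernel_size": self.kernel_size,
        "stride": self.stride,
        "padding": self.padding,
        "in_shape": tuple(x.shape),
        "out_shape": tuple(out.shape),
    })
    return out

Conv2d.forward = logging_forward
# Run one inference pass; CAPTURED then holds one entry per Conv2d call, in execution order.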

1 Like

I know I am late, but if someone still wants to know how to export a Detectron2 model to ONNX format, use export_model.py, which ships with Detectron2.
export_model.py is in the tools/deploy/ folder of detectron2.
At least I was able to convert the model to ONNX format using this file.
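
For reference, an invocation along these lines is what the script expects; the exact flags can differ across detectron2 versions, so treat this as an assumption and check tools/deploy/README.md (the config path and weights below are placeholders):

cd tools/deploy
python export_model.py --config-file ../../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
    --output ./output --export-method tracing --format onnx \
    --sample-image input.jpg \
    MODEL.WEIGHTS /path/to/model_final.pkl MODEL.DEVICE cpu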

As for TensorRT, I am going to try that now, but you can probably refer to this link:
TensorRT/samples/python/detectron2 at main · NVIDIA/TensorRT · GitHub

1 Like

Hi @rajupadhyay59

Thank you for your information!!

2 Likes