TensorRT wrongly merges two different layers?

Hello!

I’m porting CenterNet (https://github.com/xingyizhou/CenterNet) to TensorRT. I’m using the C++ API and implementing a plugin for the deformable convolution layer.

On the last layer I get:

../builder/cudnnBuilderGraph.cpp (660) - Assertion Error in checkSanity: 0 (tensors.size() == g.tensors.size())

I thought it might be a shape mismatch, but the assertion does not fire when defining the network, only when building the engine.

This error shows up on conv_offset_mask. TensorRT merges two layers that have the same shapes but different weights. In PyTorch it corresponds to this layer: https://github.com/xingyizhou/CenterNet/blob/master/src/lib/models/networks/pose_dla_dcn.py#L477

Can I send you the project so you can check what is wrong? (I can’t publish the code publicly.)

Graphics cards on which this was tested:

  • GeForce 1050Ti (Ubuntu 16.04, CUDA 10.1, TensorRT 6.0.1.5)
  • Tesla K80 (CentOS7, CUDA 10.1, TensorRT 6.0.1.5)
  • Tesla P100 (Ubuntu 16.04, CUDA 10.0, TensorRT 6.0.1.5)

UPD:
Here is the log: https://gist.github.com/blacksailer/ba795610cedca5747271da6698b7b994

This line shows that two layers are merged even though they have different parameters: https://gist.github.com/blacksailer/ba795610cedca5747271da6698b7b994#file-tensorrt-log-L102

Hi,

Can you please share the script & model file to reproduce the issue?

Thanks

I’m hitting the same error. In my project, if I use only the plugin for the deformable convolution layer, I get the correct result, but when I run CenterNet I get the same error.
I can share my ONNX file and plugin code with you.

@uestchanyan Hi! Can you post your code so I can also look into this error?

https://zhuanlan.zhihu.com/p/84125533 (in Chinese) says it is caused by the Slice layer in TensorRT. I have not confirmed that yet.
It is very strange: when I build a dummy model containing only a few conv layers with a modulated deform conv inserted in the middle, I can build the model and run inference. But when I do the same with CenterNet, it fails with the same assertion mentioned above.
BTW, have you tried a simple model with modulated deform conv before?

No, never tried.

Because of this strange TensorRT behavior, I currently split CenterNet into three engines to speed up inference, and I get 30 fps. But I want to have a single engine.

Hi,
When I use a small model to debug my modulated deform conv plugin, it works fine. But when I apply the layer to CenterNet, it shows the same error message as stated above. How can I get my code and CenterNet ONNX model to you so you can reproduce the error?
Thanks.
P.S. @cheivan I would like to share it with you as well for discussion.

Can you post it on github and share link here?

Here it is:


@SunilJB @cheivan

@1051323399 Could you please provide your updates to the CMakeLists.txt files as a reference, so that we can compile and run your plugin with the PyTorch dependencies?

Sure, the repo has been updated. Thanks. @ersheng

@xmpeng A macro named CHECK_PLUGIN_STATUS is referenced multiple times, but its definition is missing from the plugin source code.
Please provide its definition so that we can continue.
Thank you.

@ersheng I renamed the CHECK macro defined in plugin.h (TensorRT/plugin/common) due to its conflict with the one defined in libtorch.

#define CHECK_PLUGIN_STATUS(status) \
    do                              \
    {                               \
        if (status != 0)            \
            abort();                \
    } while (0)

Thanks for your attention.

Hello, @xmpeng

Sorry for the late response. We have verified your plugin with both TensorRT 6.0 and TensorRT 7.0, and here are our conclusions:

  1. The problem does not seem to be related to your plugin (ModulatedDeformConv).
  2. It is a graph-building error: there are structures, such as parallel conv layers followed by split operations, that TensorRT 6.0 cannot handle properly.
  3. TensorRT 7.0 handles this kind of structure without errors.

So we recommend upgrading to TensorRT 7.0 if possible.

Thanks a lot for your clear explanations.
Hopefully TensorRT 7 will support CUDA 10.1 soon.
Regards.

TensorRT 7.1 should support CUDA 10.1 in June.

Hi @ersheng, I’m running into the same problem. Could you please elaborate on why the current plugin does not work with TensorRT 6? Is it due to the Slice layer, or to the parallel convolutions running before it? Would there be any other workaround (for example, writing a custom Split plugin)?

Due to our deployment environment, only TensorRT <= 6 can be used. Your help is much appreciated.