Crash when importing Caffe model with plugin layers

cuda_new_bird · October 26, 2018, 8:27am

Here is the log (below). I am using TensorRT 4.

The situation is as follows.

To run this model I wrote two plugins: a slice layer (A) and an l2normalization layer (B).

From the logs I can see that both A and B are constructed and that A's getOutputDimensions is called,
but B::getOutputDimensions is never called.

Both A and B come before the concat layer.

I wonder why the l2normalization plugin is not called; the parser jumps straight to creating the concat layer and crashes.

Did I describe it clearly?

Hoping for replies…

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000021 in ?? ()
(gdb) bt
#0 0x0000000000000021 in ?? ()
#1 0x00007ffff3305056 in nvinfer1::PluginLayer::getOutputForm(int, std::vector<nvinfer1::TensorForm, std::allocator<nvinfer1::TensorForm> > const&) const () from /usr/local/tensor_rt/lib/libnvinfer.so.4
#2 0x00007ffff331d557 in nvinfer1::Network::updateTensor(nvinfer1::NetworkTensor const*) const ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#3 0x00007ffff331d8e7 in nvinfer1::NetworkTensor::getDimensions() const ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#4 0x00007ffff330bc63 in nvinfer1::ConcatenationLayer::ConcatenationLayer(nvinfer1::Network*, std::string const&, nvinfer1::ITensor* const*, int, nvinfer1::ITensor*) ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#5 0x00007ffff331f717 in nvinfer1::Network::addConcatenation(nvinfer1::ITensor* const*, int) ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#6 0x00007ffff29bce05 in parseConcat(nvinfer1::INetworkDefinition&, ditcaffe::LayerParameter const&, CaffeWeightFactory&, BlobNameToTensor&) () from /usr/local/tensor_rt/lib/libnvparsers.so.4.1.2
#7 0x00007ffff29bf7af in CaffeParser::parse(char const*, char const*, nvinfer1::INetworkDefinition&, nvinfer1::DataType) () from /usr/local/tensor_rt/lib/libnvparsers.so.4.1.2
#8 0x0000000000404f25 in caffeToTRTModel (
deployFile=0x45b2b0 "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/deploy.prototxt",
modelFile=0x45b2f8 "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/arcface.caffemodel",
outputs=std::vector of length 1, capacity 1 = {…}, maxBatchSize=1, pluginFactory=0x7fffffffcbf8,
trtModelStream=@0x7fffffffcb98: 0x0) at samplePlugin.cpp:429
#9 0x0000000000405508 in main (argc=1, argv=0x7fffffffdce8) at samplePlugin.cpp:496

cuda_new_bird · October 26, 2018, 8:47am

@NVES

NVES · October 26, 2018, 4:34pm

Hello,

It’d help us debug if you can provide a small repro package containing your source, dataset, model, and steps to recreate the symptoms you are seeing.

regards
NVIDIA Enterprise Support.

cuda_new_bird · October 29, 2018, 7:12am

Hello, I have uploaded my project to Google Drive; here is the link.

I am running it on Ubuntu 14.04 64-bit with a GTX 1080 Ti.
NVIDIA-SMI 384.111 Driver Version: 384.111

Please download the files and edit tensor_rt/samples/samplePlugin/samplePlugin.cpp to change the prototxt path and the model path.
That is, modify these two lines:

const char* proto_file = "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/deploy.prototxt";
const char* model_file = "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/arcface.caffemodel";

Then build and run the binary.

the model file:

the project folder:

cuda_new_bird · October 29, 2018, 10:08am

Two more questions:

1. Does TensorRT support multiple plugins in one PluginFactory? I wrote two plugins: normalize and slice_layer.

2. Once a plugin is registered, is it guaranteed that both getNbOutputs and getOutputDimensions will be called?

In the normalize plugin class, only getNbOutputs is called (3 times); getOutputDimensions is never called.
Is that normal?

Hoping for your reply!

NVES · October 30, 2018, 12:43am

Hello,

TensorRT does support networks with multiple plugin layers.

To help us debug, can we get a simple repro that depicts this situation?

The given network seems quite complex, with hundreds of layers.
Also, in the prototxt I don't see the layer corresponding to "slicechannel0/1/2/3/4/5":

layer {
  name: "data_slice1"
  type: "Slice"
  bottom: "fea_flat"
  top: "slicechannel0"
  top: "slicechannel1"
  top: "slicechannel2"
  top: "slicechannel3"
  top: "slicechannel4"
  top: "slicechannel5"
  slice_param {
    axis: 1
    slice_point: 3
    slice_point: 6
    slice_point: 9
    slice_point: 12
    slice_point: 15
  }
}

cuda_new_bird · October 30, 2018, 2:36am

Hello, I have uploaded simple.txt as an attachment.
Please overwrite the deploy.prototxt file with it. The new prototxt still works with the .caffemodel file and crashes with the same error I described above.

I wrote two plugins, the slice_layer and the l2normalization layer; you can see their implementations in the cpp file.

layer {
name: "data_slice1"
type: "Slice"

The layer you mentioned corresponds to the slice_layer plugin.

In case you can't see the attachment, I have pasted its content below.
It's a very simple architecture: it slices the 18-channel data into 6 pieces, normalizes each piece, and then concatenates them.

name: "arcface"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape: {
      dim: 1
      dim: 18
      dim: 108
      dim: 108
    }
  }
}

layer {
  name: "data_slice0"
  type: "Slice"
  bottom: "data"
  top: "data_slice0"
  top: "data_slice1"
  top: "data_slice2"
  top: "data_slice3"
  top: "data_slice4"
  top: "data_slice5"
  slice_param {
    axis: 1
    slice_point: 3
    slice_point: 6
    slice_point: 9
    slice_point: 12
    slice_point: 15
  }
}

layer {
  bottom: "data_slice0"
  top: "l2normalization0"
  name: "l2normalization0"
  type: "Normalize"
}

layer {
  bottom: "data_slice1"
  top: "l2normalization1"
  name: "l2normalization1"
  type: "Normalize"
}

layer {
  bottom: "data_slice2"
  top: "l2normalization2"
  name: "l2normalization2"
  type: "Normalize"
}

layer {
  bottom: "data_slice3"
  top: "l2normalization3"
  name: "l2normalization3"
  type: "Normalize"
}

layer {
  bottom: "data_slice4"
  top: "l2normalization4"
  name: "l2normalization4"
  type: "Normalize"
}

layer {
  bottom: "data_slice5"
  top: "l2normalization5"
  name: "l2normalization5"
  type: "Normalize"
}

layer {
  name: "concat0"
  type: "Concat"
  bottom: "l2normalization0"
  bottom: "l2normalization1"
  bottom: "l2normalization2"
  bottom: "l2normalization3"
  bottom: "l2normalization4"
  bottom: "l2normalization5"
  top: "concat0"
  concat_param {
    axis: 1
  }
}

simple.txt (1.38 KB)
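
For reference, the shapes this small network implies can be checked by hand. The sketch below is only an illustration of the arithmetic, not code from the project: the helper names sliceChannelCounts and concatChannels are hypothetical, and it assumes the usual Caffe Slice semantics (N slice points along the axis give N+1 outputs) and that the Normalize layers keep the shape of their input.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: channel count of each Slice output, given the input
// channel count and the slice_point values from the prototxt (axis: 1).
std::vector<int> sliceChannelCounts(int inputChannels,
                                    const std::vector<int>& slicePoints) {
    std::vector<int> counts;
    int previous = 0;
    for (int point : slicePoints) {
        // Slice points must be strictly increasing and inside the axis.
        assert(point > previous && point < inputChannels);
        counts.push_back(point - previous);
        previous = point;
    }
    // The last output takes whatever remains after the final slice point.
    counts.push_back(inputChannels - previous);
    return counts;
}

// Hypothetical helper: channel count after concatenating along axis 1,
// assuming each normalized branch kept its channel count.
int concatChannels(const std::vector<int>& branchChannels) {
    return std::accumulate(branchChannels.begin(), branchChannels.end(), 0);
}
```

With 18 input channels and slice points 3, 6, 9, 12, 15, this gives six branches of 3 channels each, and the concat restores 18 channels. These are the values a slice plugin's getOutputDimensions would be expected to report per output index.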

cuda_new_bird · October 30, 2018, 8:42am

Hello!

I have found something new.

Here is my question:

If a model has parallel branches, will TensorRT use multiple threads to parse the model?

It seems that the crash happens whenever I use a plugin in more than one branch.

Does TensorRT support multi-threaded parsing, or multi-branch models like the one in the simple.txt I uploaded?

When writing plugins for multi-branch models, is there anything I need to take care of, such as thread safety?

Hoping for your reply!

NVES · October 30, 2018, 4:42pm

Hello,

It looks like your network creates multiple plugins of type l2normalization, but in the code they are all assigned to the same unique_ptr (normal_ptr).

Each instance of the l2normalization layer is a new plugin object; it cannot simply overwrite the previous pointer, because that destroys the previous plugin and leaves the network holding an invalid pointer.

Each plugin object has to exist until it is explicitly destroyed after the engine is built. Creating an array of unique_ptr to store all the l2normalization plugins should solve the segfault.
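
A minimal sketch of that fix, under stated assumptions: PluginBase, L2NormPlugin, and this simplified PluginFactory are hypothetical stand-ins so the example compiles without TensorRT. The real classes would derive from the TensorRT plugin and Caffe-parser plugin-factory interfaces, and the real createPlugin also receives the layer weights.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the TensorRT plugin interface.
struct PluginBase {
    virtual ~PluginBase() = default;
    virtual int getNbOutputs() const = 0;
};

// Hypothetical stand-in for the l2normalization plugin from the thread.
struct L2NormPlugin : PluginBase {
    explicit L2NormPlugin(std::string layerName) : name(std::move(layerName)) {}
    int getNbOutputs() const override { return 1; }
    std::string name;
};

// Simplified factory; the parser calls createPlugin once per plugin layer.
class PluginFactory {
public:
    PluginBase* createPlugin(const std::string& layerName) {
        // Buggy pattern (single member unique_ptr, as described above):
        //   normal_ptr.reset(new L2NormPlugin(layerName));  // frees the previous plugin
        //   return normal_ptr.get();
        // Fixed pattern: keep every instance alive in a container.
        mNormalizers.push_back(
            std::unique_ptr<L2NormPlugin>(new L2NormPlugin(layerName)));
        return mNormalizers.back().get();
    }

    std::size_t pluginCount() const { return mNormalizers.size(); }

    // Release the plugins only after the engine has been built.
    void destroyPlugins() { mNormalizers.clear(); }

private:
    // One entry per l2normalization layer. Growing the vector moves the
    // unique_ptrs, not the plugin objects, so returned raw pointers stay valid.
    std::vector<std::unique_ptr<L2NormPlugin>> mNormalizers;
};
```

With the six Normalize layers in simple.txt, createPlugin would be called six times and all six returned pointers would remain valid until destroyPlugins is called.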

cuda_new_bird · October 31, 2018, 1:46am

OMG…such a naive error!

thank you !!!

It really helps me a lot!
