Crash when importing a Caffe model with plugin layers

I am using TensorRT 4; the log is below.

The situation is as follows.

To run this model I wrote two plugins: a slice layer (A) and an l2normalization layer (B).

From the logs I can see that both A and B are constructed and that A::getOutputDimensions is called,
but B::getOutputDimensions is never called.

Both the A and B layers come before the concat layer.

I wonder why getOutputDimensions of the l2normalization plugin is not called; the parser just moves on to creating the concat layer and crashes there.

Did I describe it clearly?

Hoping for replies.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000021 in ?? ()
(gdb) bt
#0 0x0000000000000021 in ?? ()
#1 0x00007ffff3305056 in nvinfer1::PluginLayer::getOutputForm(int, std::vector<nvinfer1::TensorForm, std::allocator<nvinfer1::TensorForm> > const&) const () from /usr/local/tensor_rt/lib/libnvinfer.so.4
#2 0x00007ffff331d557 in nvinfer1::Network::updateTensor(nvinfer1::NetworkTensor const*) const ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#3 0x00007ffff331d8e7 in nvinfer1::NetworkTensor::getDimensions() const ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#4 0x00007ffff330bc63 in nvinfer1::ConcatenationLayer::ConcatenationLayer(nvinfer1::Network*, std::string const&, nvinfer1::ITensor* const*, int, nvinfer1::ITensor*) ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#5 0x00007ffff331f717 in nvinfer1::Network::addConcatenation(nvinfer1::ITensor* const*, int) ()
from /usr/local/tensor_rt/lib/libnvinfer.so.4
#6 0x00007ffff29bce05 in parseConcat(nvinfer1::INetworkDefinition&, ditcaffe::LayerParameter const&, CaffeWeightFactory&, BlobNameToTensor&) () from /usr/local/tensor_rt/lib/libnvparsers.so.4.1.2
#7 0x00007ffff29bf7af in CaffeParser::parse(char const*, char const*, nvinfer1::INetworkDefinition&, nvinfer1::DataType) () from /usr/local/tensor_rt/lib/libnvparsers.so.4.1.2
#8 0x0000000000404f25 in caffeToTRTModel (
deployFile=0x45b2b0 "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/deploy.prototxt",
modelFile=0x45b2f8 "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/arcface.caffemodel",
outputs=std::vector of length 1, capacity 1 = {…}, maxBatchSize=1, pluginFactory=0x7fffffffcbf8,
trtModelStream=@0x7fffffffcb98: 0x0) at samplePlugin.cpp:429
#9 0x0000000000405508 in main (argc=1, argv=0x7fffffffdce8) at samplePlugin.cpp:496

@NVES

Hello,

It’d help us debug if you can provide a small repro package containing your source, dataset, model, and steps to recreate the symptoms you are seeing.

regards
NVIDIA Enterprise Support.

Hello, I have uploaded my project to Google Drive; the links are below.

I am running it on 64-bit Ubuntu 14.04 with a GTX 1080 Ti.
NVIDIA-SMI 384.111 Driver Version: 384.111

Please download them and edit tensor_rt/samples/samplePlugin/samplePlugin.cpp so that the prototxt path and the model path point at your copies.
That is, change these two lines:

const char* proto_file = "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/deploy.prototxt";
const char* model_file = "/home/user/codes/tensor_rt/mx2caffe/arcface-0.1.0/arcface.caffemodel";

Then build and run the binary.

the model file:
https://drive.google.com/open?id=1G9aGeY7N34Yg3gVjXAV4HS3x3cp2_9tf
the project folder:
https://drive.google.com/open?id=1LPM6xrRuh1z_aTJNGx2q50x7UgeVmbzG

Two more questions:

1. Does TensorRT support multiple plugins in one PluginFactory? I wrote two plugins: normalize and slice_layer.

2. Once a plugin is registered, is it guaranteed that both getNbOutputs and getOutputDimensions will be called?

In the normalize plugin class I see getNbOutputs called 3 times, but getOutputDimensions is never called.
Is that normal?

Hoping for your reply!

Hello,

TensorRT does support networks with multiple plugin layers.

To help us debug, can we get a simple repro that depicts this situation?

The given network seems quite complex, with hundreds of layers.
Also, in the prototxt I don’t see the layer corresponding to “slicechannel0/1/2/3/4/5”

layer {
  name: "data_slice1"
  type: "Slice"
  bottom: "fea_flat"
  top: "slicechannel0"
  top: "slicechannel1"
  top: "slicechannel2"
  top: "slicechannel3"
  top: "slicechannel4"
  top: "slicechannel5"
  slice_param {
    axis: 1
    slice_point: 3
    slice_point: 6
    slice_point: 9
    slice_point: 12
    slice_point: 15
  }
}

Hello, I have uploaded simple.txt as an attachment.
Please overwrite the deploy.prototxt file with it. This new prototxt also works with the same .caffemodel file and crashes with the same error I described above.

I wrote 2 plugins: the slice_layer and the l2normalization layer. You can see their implementation in the cpp file.

layer {
  name: "data_slice1"
  type: "Slice"

The layer you mentioned corresponds to the slice_layer plugin.

In case you can’t see the attachment, I have pasted its content below.
It’s a very simple architecture that just slices the 18-channel data into 6 pieces, normalizes each piece, and then concatenates them.

name: "arcface"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape: {
      dim: 1
      dim: 18
      dim: 108
      dim: 108
    }
  }
}

layer {
  name: "data_slice0"
  type: "Slice"
  bottom: "data"
  top: "data_slice0"
  top: "data_slice1"
  top: "data_slice2"
  top: "data_slice3"
  top: "data_slice4"
  top: "data_slice5"
  slice_param {
    axis: 1
    slice_point: 3
    slice_point: 6
    slice_point: 9
    slice_point: 12
    slice_point: 15
  }
}

layer {
  bottom: "data_slice0"
  top: "l2normalization0"
  name: "l2normalization0"
  type: "Normalize"
}

layer {
  bottom: "data_slice1"
  top: "l2normalization1"
  name: "l2normalization1"
  type: "Normalize"
}

layer {
  bottom: "data_slice2"
  top: "l2normalization2"
  name: "l2normalization2"
  type: "Normalize"
}

layer {
  bottom: "data_slice3"
  top: "l2normalization3"
  name: "l2normalization3"
  type: "Normalize"
}

layer {
  bottom: "data_slice4"
  top: "l2normalization4"
  name: "l2normalization4"
  type: "Normalize"
}

layer {
  bottom: "data_slice5"
  top: "l2normalization5"
  name: "l2normalization5"
  type: "Normalize"
}

layer {
  name: "concat0"
  type: "Concat"
  bottom: "l2normalization0"
  bottom: "l2normalization1"
  bottom: "l2normalization2"
  bottom: "l2normalization3"
  bottom: "l2normalization4"
  bottom: "l2normalization5"
  top: "concat0"
  concat_param {
    axis: 1
  }
}
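
For reference, the channel arithmetic that a slice plugin has to reproduce for this prototxt can be checked with a small standalone C++ sketch. The helper name sliceChannelCounts is hypothetical; it is not part of TensorRT or of the attached project, and it only assumes the usual Caffe meaning of slice_point on axis 1 (each point marks where the next output starts).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: channel count of each Slice output, given the input
// channel count and the slice_point list from the prototxt.
// Assumes the points are strictly increasing and lie inside (0, inputChannels).
std::vector<int> sliceChannelCounts(int inputChannels,
                                    const std::vector<int>& slicePoints) {
    std::vector<int> counts;
    int previous = 0;
    for (int point : slicePoints) {
        assert(point > previous && point < inputChannels);
        counts.push_back(point - previous);  // channels in [previous, point)
        previous = point;
    }
    counts.push_back(inputChannels - previous);  // the last output takes the rest
    return counts;
}
```

With 18 input channels and slice points 3, 6, 9, 12, 15, this gives six outputs of 3 channels each, so every Normalize layer would see a 3 x 108 x 108 input and the concat on axis 1 restores 18 channels.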

simple.txt (1.38 KB)

Hello!

I have found something new.

Here is the question:

If a model has parallel branches, will TensorRT use multiple threads to parse the model?

It seems that the crash happens whenever I use a plugin in more than one branch.

Does TensorRT support multi-threaded parsing or multi-branch models like the one in the simple.txt I uploaded?

When writing plugins for multi-branch models, is there anything I need to take care of, such as thread safety?

Hoping for your reply!

Hello,

It looks like your network is creating multiple plugins of the type l2normalization, but in the code these are all being assigned to the same unique_ptr (normal_ptr).

Each instance of the l2normalization layer is a new plugin object, and it cannot simply overwrite the previous pointer: assigning to the unique_ptr destroys the previous plugin, which makes the pointer already handed to the network invalid.

Each plugin object has to exist until it is explicitly destroyed after the engine is built. Creating an array of unique_ptr to store all l2normalization plugins should solve the segfault issue.
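
A minimal sketch of the difference, written without the TensorRT headers so that it compiles on its own. FakePlugin, SinglePointerFactory and VectorFactory are hypothetical stand-ins, not names from the attached project or from the TensorRT API; in the real factory the stored type would be your l2normalization plugin class and createPlugin would follow the signature your plugin factory interface requires.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a plugin object. It only counts live instances
// so that the lifetime behaviour is observable.
struct FakePlugin {
    static int liveCount;
    std::string layerName;
    explicit FakePlugin(std::string name) : layerName(std::move(name)) { ++liveCount; }
    ~FakePlugin() { --liveCount; }
};
int FakePlugin::liveCount = 0;

// Problematic pattern: one unique_ptr shared by every Normalize layer.
// Each reset() destroys the previously created plugin, so the raw pointer
// returned for the earlier layer is left dangling.
class SinglePointerFactory {
public:
    FakePlugin* createPlugin(const char* layerName) {
        assert(layerName != nullptr);
        mPlugin.reset(new FakePlugin(layerName));
        return mPlugin.get();
    }
private:
    std::unique_ptr<FakePlugin> mPlugin;
};

// Suggested pattern: one owning pointer per layer instance, all kept alive
// until destroyPlugins() is called after the engine has been built.
class VectorFactory {
public:
    FakePlugin* createPlugin(const char* layerName) {
        assert(layerName != nullptr);
        mPlugins.push_back(std::unique_ptr<FakePlugin>(new FakePlugin(layerName)));
        return mPlugins.back().get();
    }
    std::size_t size() const { return mPlugins.size(); }
    void destroyPlugins() { mPlugins.clear(); }
private:
    std::vector<std::unique_ptr<FakePlugin>> mPlugins;
};
```

Calling createPlugin six times on SinglePointerFactory leaves only one object alive, while VectorFactory keeps all six until destroyPlugins() is called. Growing the vector moves the unique_ptr handles but not the plugin objects they own, so raw pointers returned earlier stay valid.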

OMG… such a naive error!

Thank you!!!

It really helps me a lot!