caffeToGIEModel() segmentation fault

I am using TensorRT to run YOLO v2 inference on a TX1. To support a custom network layer, I split the network into two engines and run the custom layer between them:

IExecutionContext *contextA =
IExecutionContext *contextB =
contextA->enqueue(batchSize, buffersA, stream, nullptr);
myLayer(outputFromA, inputToB, stream);
contextB->enqueue(batchSize, buffersB, stream, nullptr);

I then split the prototxt file into three files but left the caffemodel unchanged. The program hits a segmentation fault when it reaches the Caffe parser's parser->parse() call:

const char* OUTPUT_BLOB_NAME1 = "layer13";
	const char* OUTPUT_BLOB_NAME2 = "layer20";
	const char* OUTPUT_BLOB_NAME3 = "conv22";
	caffeTotensorrtModel("1yolo.prototxt", "z1_yolo.caffemodel", vector<string>{OUTPUT_BLOB_NAME1}, 
					BATCH_SIZE, 1<<29, gieModelStream1, 0);
	caffeTotensorrtModel("2yolo.prototxt", "z1_yolo.caffemodel", vector<string>{OUTPUT_BLOB_NAME2},
					BATCH_SIZE, 1<<22, gieModelStream2, 0);
	caffeTotensorrtModel("3yolo.prototxt", "z1_yolo.caffemodel", vector<string>{OUTPUT_BLOB_NAME3}, 
					BATCH_SIZE, 1<<23, gieModelStream3, 0);

void caffeTotensorrtModel(const string& deployFile, const string& modelFile,
                          const vector<string>& outputs, unsigned int batchsize,
                          unsigned int workSpaceSize, IHostMemory *&gieModelStream,
                          int force_use_fp16)

	 // create API root class - must span the lifetime of the engine usage
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    // parse the caffe model to populate the network, then set the outputs
    ICaffeParser* parser = createCaffeParser();

    bool useFp16 = builder->platformHasFastFp16();

    if (! force_use_fp16)
        useFp16 = 0;
	cout<<"useFp16: "<<useFp16<<std::endl;

	nvinfer1::DataType modelDataType = useFp16 ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT;
	cout<<" caffe model exchange start"<<endl;
	cout<<deployFile.c_str()<<"  "<<modelFile.c_str()<<"  "<<endl;
	const IBlobNameToTensor *blobNameToTensor =
              parser->parse(deployFile.c_str(), modelFile.c_str(),*network,modelDataType);
	cout<<"caffe parser done"<<endl;
	 assert(blobNameToTensor != nullptr);
    // the caffe file has no notion of outputs, so we need to manually say
    // which tensors the engine should generate
    for (auto& s : outputs)

    // Build the engine

    // Eliminate the side-effect from the delay of GPU frequency boost

    // set up the network for paired-fp16 format, only on DriveCX

	ICudaEngine* engine = builder->buildCudaEngine(*network);

    // we don't need the network any more, and we can destroy the parser

    // serialize the engine, then close everything down


TensorRT now supports a plugin API.
You can implement the custom layer with the plugin API rather than creating two TensorRT engines.

This sample can give you some hints about writing a custom layer:


Thank you for your reply. I have another question. When I change the type of the custom network layer in the prototxt file to IPlugin, do I need to retrain the caffemodel?


You don’t need to retrain the model.

A plugin layer is considered a replacement for an unsupported layer.
Although the layer type changes, the functionality of the network should stay the same.


Thank you very much for your reply. I will learn how to use IPlugin in follow-up work. My current project uses the multiple-engine approach, with the prototxt file split into three files. Is it not possible to convert these split files directly with the Caffe parser? Alternatively, can I use the TensorRT API to build the network layer by layer?


Yes, TensorRT has an API for adding layers directly.
Please check our sampleMNISTAPI sample, which is located at ‘/usr/src/tensorrt/samples/’.


Thank you very much for your reply. I implemented the custom layer with the plugin API, but the test results are wrong. I used the Caffe parser to convert the model. How can I locate which layer is causing the problem?


You can print the input/output data of a plugin to check whether its behavior matches your expectations.

Please remember to copy the memory back to the CPU first if you created the buffer with cudaMalloc().
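
As a rough illustration of the check described above, here is a minimal sketch of host-side helpers for inspecting a plugin's data once it has been copied back from the GPU. The names BufferStats, summarize, maxAbsDiff and printStats are hypothetical and not part of TensorRT. The sketch assumes the buffer holds 32-bit floats and that a reference output (for example, from running the same layer in Caffe) is available for comparison.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper type: summary of a host-side float buffer.
struct BufferStats {
    float minVal;
    float maxVal;
    double mean;           // mean over the non-NaN values
    std::size_t nanCount;  // number of NaN entries
};

// Compute min, max, mean and NaN count of a buffer that already lives on the CPU.
BufferStats summarize(const std::vector<float>& host) {
    BufferStats s{0.0f, 0.0f, 0.0, 0};
    bool seen = false;
    double sum = 0.0;
    std::size_t valid = 0;
    for (float v : host) {
        if (std::isnan(v)) {
            ++s.nanCount;
            continue;
        }
        if (!seen) {
            s.minVal = v;
            s.maxVal = v;
            seen = true;
        }
        s.minVal = std::min(s.minVal, v);
        s.maxVal = std::max(s.maxVal, v);
        sum += v;
        ++valid;
    }
    if (valid > 0) {
        s.mean = sum / static_cast<double>(valid);
    }
    return s;
}

// Largest absolute element-wise difference between two buffers,
// or -1.0 if their sizes differ.
double maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) {
        return -1.0;
    }
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = std::fabs(static_cast<double>(a[i]) - static_cast<double>(b[i]));
        worst = std::max(worst, d);
    }
    return worst;
}

// Print a one-line summary so the layers of a network can be compared by eye.
void printStats(const char* label, const std::vector<float>& host) {
    BufferStats s = summarize(host);
    std::cout << label << ": n=" << host.size()
              << " min=" << s.minVal << " max=" << s.maxVal
              << " mean=" << s.mean << " nan=" << s.nanCount << std::endl;
}
```

Inside the plugin, the device buffer would first be copied into a std::vector<float> with cudaMemcpy(..., cudaMemcpyDeviceToHost), after synchronizing the stream, and then passed to printStats or maxAbsDiff. Comparing each plugin's input and output against the reference in this way should narrow down the first layer whose values diverge.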