Segmentation fault when building an ICudaEngine in TensorRT3

Hi

Why do I get a segmentation fault when building an ICudaEngine with the following code?

	// create the builder
	IBuilder* builder = createInferBuilder(gLogger);

	// parse the caffe model to populate the network, then set the outputs
	INetworkDefinition* network = builder->createNetwork();
	ICaffeParser* parser = createCaffeParser();
	parser->setPluginFactory(pluginFactory);

	std::cout << "Begin parsing model..." << std::endl;
	const IBlobNameToTensor* blobNameToTensor = parser->parse(locateFile(deployFile).c_str(),
		locateFile(modelFile).c_str(),
		*network,
		DataType::kFLOAT);
	std::cout << "End parsing model..." << std::endl;
	// specify which tensors are outputs
	for (auto& s : outputs)
		network->markOutput(*blobNameToTensor->find(s.c_str()));

	// Build the engine
	builder->setMaxBatchSize(maxBatchSize);
	builder->setMaxWorkspaceSize(10 << 20);	// we need about 6MB of scratch space for the plugin layer for batch size 5

	std::cout << "Begin building engine..." << std::endl;
	ICudaEngine* engine = builder->buildCudaEngine(*network);
	assert(engine);
	std::cout << "End building engine..." << std::endl;

	// we don't need the network any more, and we can destroy the parser
	network->destroy();
	parser->destroy();

	// serialize the engine, then close everything down
	(*gieModelStream) = engine->serialize();

	engine->destroy();
	builder->destroy();
	shutdownProtobufLibrary();

The program prints the following and then crashes:
Begin building engine...

Thread 1 "sample_fasterRC" received signal SIGSEGV, Segmentation fault.
0x00007fffe7994ba7 in nvinfer1::Network::validate(nvinfer1::cudnn::HardwareContext const&, bool, bool, int) const ()
   from /home/TensorRT-3/TensorRT-3.0.1/lib/libnvinfer.so.4

I can correctly build a cuda engine when running the jetson-inference samples. Does this mean that there is something wrong with the network I implemented?

Hi,

It looks like you hit a similar error to the one in this topic:
https://devtalk.nvidia.com/default/topic/1027521/why-received-signal-sigsegv-when-import-deploy-prototxt-with-tensor-rt-3-0/

Please check it first.
Thanks.

Hi.

Thank you for providing the link. It helped me find out that something was wrong with the outputs of one of my layers.
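
In case it helps others: a misnamed output makes IBlobNameToTensor::find() return a null pointer, which markOutput(*...) then dereferences. The following self-contained sketch shows the kind of defensive check that would have surfaced the bad name before the crash; validateOutputs and the Tensor struct are illustrative stand-ins of my own, not TensorRT API:

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Minimal stand-in for an output tensor. In the real code this would be
// nvinfer1::ITensor, looked up through IBlobNameToTensor::find().
struct Tensor { std::string name; };

// Returns true only if every requested output name resolves to a tensor,
// printing the offending name otherwise. Without such a check,
// markOutput(*find(s)) dereferences a null pointer for any name the
// parser did not register.
bool validateOutputs(const std::map<std::string, Tensor*>& blobNameToTensor,
                     const std::vector<std::string>& outputs) {
    for (const auto& s : outputs) {
        auto it = blobNameToTensor.find(s);
        if (it == blobNameToTensor.end() || it->second == nullptr) {
            std::printf("could not find output blob: %s\n", s.c_str());
            return false;
        }
    }
    return true;
}
```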

By the way, does the Concat layer in TensorRT support concatenation along axis 2? The layer parameters look like this:

layer {
  name: "mbox_priorbox"
  type: "Concat"
  bottom: "conv4_3_norm_mbox_priorbox"
  bottom: "fc7_conv_mbox_priorbox"
  bottom: "conv6_2_mbox_priorbox"
  bottom: "conv7_2_mbox_priorbox"
  bottom: "conv8_2_mbox_priorbox"
  bottom: "conv9_2_mbox_priorbox"
  top: "mbox_priorbox"
  concat_param {
    axis: 2
  }
}

And when I try to parse the definition I get the following error:

Parameter check failed at: Network.cpp::addConcatenation::152, condition: first->getDimensions().d[j] == dims.d[j] && "All non-channel dimensions must match across tensors."
error parsing layer type Concat index 96

The error disappears when I remove the layer from the network, and Concat works for the other Concat layers that use axis: 1. Do you think I need to implement my own Concat plugin that supports axis: 2, or have I done something wrong in my code?

Hi,

TensorRT only supports channel-axis concatenation (axis: 1).
You can find this information in our documentation:

1.1. TensorRT Layers

Concatenation
The Concatenation layer links together multiple tensors of the same height and width
across the channel dimension.
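
The parameter check you hit can be sketched with a small self-contained model of the rule (channelConcatDims and Dims3 are illustrative names, not the actual Network.cpp code): concatenation only happens along the channel axis, so the channel counts add up while every other dimension must match exactly.

```cpp
#include <cassert>
#include <vector>

// CHW dimensions of a tensor as TensorRT sees them (the batch axis is implicit).
struct Dims3 { int c, h, w; };

// Mirrors the condition behind "All non-channel dimensions must match
// across tensors": H and W must agree for every input, and the output
// channel count is the sum of the input channel counts.
// Returns -1 if the inputs cannot be concatenated.
int channelConcatDims(const std::vector<Dims3>& inputs) {
    if (inputs.empty()) return -1;
    int c = 0;
    for (const auto& d : inputs) {
        if (d.h != inputs[0].h || d.w != inputs[0].w) return -1;  // mismatch
        c += d.c;  // channels accumulate along the concat axis
    }
    return c;
}
```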

Thanks.

Thank you, that cleared up the misunderstanding. Also, AastaLLL, what is the most efficient way to do inference: synchronously with IExecutionContext::execute or asynchronously with IExecutionContext::enqueue? And when calling IExecutionContext::execute, will the buffers be on the GPU or the CPU?

Hi,

The buffers for TensorRT are on the GPU.
enqueue() can give you an advantage, but it depends on the use case.

Check the details in our documentation:
In a typical production case, TensorRT will execute asynchronously. The enqueue() method will add kernels to a cuda stream specified by the application, which may then wait on that stream for completion. The fourth parameter to enqueue() is an optional cudaEvent which will be signaled when the input buffers are no longer in use and can be refilled.

In this sample we simply copy the input buffer to the GPU, run inference, then copy the result back and wait on the stream:

cudaMemcpyAsync(<…>, cudaMemcpyHostToDevice, stream);
context.enqueue(batchSize, buffers, stream, nullptr);
cudaMemcpyAsync(<…>, cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);
Thanks.

Hi again Aasta. Do you know if the Softmax layer in TensorRT supports ‘axis: 2’ as a parameter? Also, should I expect the layer outputs from TensorRT to differ from the layer outputs in Caffe?

Hi,

Currently, TensorRT only supports the cross-channel SoftMax layer (axis: 1).
The outputs of TensorRT and Caffe should be similar.
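
Cross-channel here means that, for each spatial position, the softmax is taken over the C channels of a CHW tensor. A self-contained sketch of that computation (channelSoftmax is an illustrative name, not the TensorRT kernel):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cross-channel SoftMax on a CHW tensor stored as data[ch * h * w + y * w + x]:
// for every spatial position (y, x), the values across the C channels are
// normalized to sum to 1, which is what a SoftMax over axis = 1 computes.
std::vector<float> channelSoftmax(const std::vector<float>& data,
                                  int c, int h, int w) {
    std::vector<float> out(data.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.f;
            for (int ch = 0; ch < c; ++ch)
                sum += std::exp(data[ch * h * w + y * w + x]);
            for (int ch = 0; ch < c; ++ch)
                out[ch * h * w + y * w + x] =
                    std::exp(data[ch * h * w + y * w + x]) / sum;
        }
    }
    return out;
}
```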

Thanks.

Does TensorRT’s concatenation require all dimension sizes to be known before the graph is evaluated? I am also seeing this error when concatenating tensors of shape [None, None, 1, 4]. At evaluation time, the value of the first “None”, i.e. the batch size, is the same across all tensors.

Hi,

You can assign the batch size at runtime, but the dimensions of the other axes must be fixed at build time, since TensorRT does not yet support dynamic input shapes.
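
One consequence of the fixed non-batch dimensions is that device buffers can be sized once at build time from maxBatchSize and the fixed C*H*W volume, and any runtime batch up to maxBatchSize reuses them. A small sketch of that sizing rule (bindingBufferBytes is an illustrative name of my own, not TensorRT API):

```cpp
#include <cassert>
#include <cstddef>

// Because every non-batch dimension is fixed when the engine is built,
// the buffer for a binding can be allocated once for the worst case:
// maxBatchSize * C * H * W elements.
std::size_t bindingBufferBytes(int maxBatchSize, int c, int h, int w,
                               std::size_t elemSize = sizeof(float)) {
    return static_cast<std::size_t>(maxBatchSize) * c * h * w * elemSize;
}
```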

Thanks.