Error creating TensorRT network with DataType::kHALF

I have built an SSD network with TensorRT 3.0 on a TX2, using plugin layers such as reshape, permute, and others. When I use DataType::kHALF to create the TensorRT network, I get the following error:

ERROR: Internal error: could not find any implementation for node fc6 + relu6, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
ERROR: cudnnBuilder2.cpp (452) - OutOfMemory Error in buildSingleLayer
sample_SSD: SSD.cpp:108: void caffeToGIEModel(const string&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, unsigned int, nvcaffeparser1::IPluginFactory*, nvinfer1::IHostMemory**): Assertion `engine' failed.
Aborted (core dumped)

I create the TensorRT net as follows:

void caffeToGIEModel(const std::string& deployFile,					// name for caffe prototxt
					 const std::string& modelFile,					// name for model
					 const std::vector<std::string>& outputs,		// network outputs
					 unsigned int maxBatchSize,						// batch size - NB must be at least as large as the batch we want to run with
					 nvcaffeparser1::IPluginFactory* pluginFactory,	// factory for plugin layers
					 IHostMemory **gieModelStream)					// output stream for the GIE model
{
	// create the builder
	IBuilder* builder = createInferBuilder(gLogger);

	// parse the caffe model to populate the network, then set the outputs
	INetworkDefinition* network = builder->createNetwork();
	ICaffeParser* parser = createCaffeParser();
	parser->setPluginFactory(pluginFactory);

	bool fp16 = builder->platformHasFastFp16();
	
	std::cout << "Begin parsing model..." << std::endl;
	const IBlobNameToTensor* blobNameToTensor = parser->parse(locateFile(deployFile).c_str(),
						locateFile(modelFile).c_str(),
						*network,
						fp16 ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT);
	std::cout << "End parsing model..." << std::endl;
	// specify which tensors are outputs
	for (auto& s : outputs)
		network->markOutput(*blobNameToTensor->find(s.c_str()));

	// Build the engine
	builder->setMaxBatchSize(maxBatchSize);
	builder->setMaxWorkspaceSize(10 << 20);	// we need about 6MB of scratch space for the plugin layer for batch size 5
	builder->setHalf2Mode(fp16);
	ICudaEngine* engine = builder->buildCudaEngine(*network);
	assert(engine);	
	std::cout << "End building engine..." << std::endl;

	// we don't need the network any more, and we can destroy the parser
	network->destroy();
	parser->destroy();

	// serialize the engine, then close everything down
	(*gieModelStream) = engine->serialize();

	engine->destroy();
	builder->destroy();
	shutdownProtobufLibrary();
}

I tried passing a larger value to setMaxWorkspaceSize, such as “16<<20” and even larger ones, but I get the same error.
When I set fp16=false, it runs successfully.
Could someone give me some suggestions? Thank you in advance!
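
For reference, the workspace values mentioned above are byte counts: 10 << 20 is 10,485,760 bytes (10 MiB) and 16 << 20 is 16,777,216 bytes (16 MiB). The sketch below is only an illustration of that arithmetic. mebibytesToBytes is a hypothetical helper, not part of TensorRT, and whether a larger workspace actually fixes the fc6 + relu6 failure is not confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper (not a TensorRT function): converts a size in MiB
// to the byte count that would be passed to setMaxWorkspaceSize().
inline std::size_t mebibytesToBytes(std::size_t mib)
{
    // Reject inputs whose shifted value would not fit in std::size_t.
    assert(mib <= (std::numeric_limits<std::size_t>::max() >> 20));
    return mib << 20;
}
```

With this helper, a call such as builder->setMaxWorkspaceSize(mebibytesToBytes(256)) would request 256 MiB. Doing the shift in std::size_t also avoids a pitfall with plain literals: on platforms with a 32-bit int, an expression like 4 << 30 overflows before it is ever converted to a size.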

Hi,

This board is for DeepStream on Tesla. For TX2 issues, please open a topic here:
https://devtalk.nvidia.com/default/board/188/jetson-tx2/

For this issue, could you pass a smaller value to setMaxBatchSize() and give it a try?
This may be a known issue, but it requires further confirmation.

Thanks.
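
One way to try the smaller-batch suggestion systematically is to halve the batch size until a build succeeds. The sketch below assumes the engine build is wrapped in a callback that returns true on success. largestWorkingBatchSize and the callback are hypothetical names for illustration, not TensorRT API, and the thread does not confirm that a smaller batch size resolves the error.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: tries batch sizes start, start/2, start/4, ..., 1
// and returns the first one for which tryBuild reports success.
// Returns 0 if no attempted batch size works (or if start is 0).
inline unsigned int largestWorkingBatchSize(
    unsigned int start,
    const std::function<bool(unsigned int)>& tryBuild)
{
    // The callback must be callable.
    assert(static_cast<bool>(tryBuild));

    for (unsigned int batch = start; batch >= 1; batch /= 2)
    {
        if (tryBuild(batch))
            return batch;
    }
    return 0;
}
```

In the code from the question, the callback would call builder->setMaxBatchSize(batch), then builder->buildCudaEngine(*network), and return whether the resulting engine pointer is non-null instead of asserting on it.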

Hello! Is anyone able to share the code for this? I have never used CUDA or TensorRT before, so it would be really helpful.

Hi, tianfangzhang

We provide lots of samples for CUDA/TensorRT/DeepStream.
Please check the following paths for the samples you want:

CUDA:

/usr/local/cuda-9.0/bin/cuda-install-samples-9.0.sh .
cd NVIDIA_CUDA-9.0_Samples
make

TensorRT:

cp -r /usr/src/tensorrt/ .
cd tensorrt/samples/
make

DeepStream:

cd deepstream/samples/decPerf
./run.sh

Thanks and Happy New Year : )