TensorRT fails to build FP16 model

Hi, I hit the following cuDNN assertion failure when building a TensorRT FP16 model with Python. Any idea what might be causing it?

python: cudnnLayerUtils.cpp:98: void* nvinfer1::cudnn::getTensorMem(const nvinfer1::cudnn::EngineTensor&, void**, void**): Assertion `start[vectorIndex]%spv == 0' failed.

Below is the Python code for converting the TensorFlow model to UFF and building the TRT engine. The same code works for FP32.

    uff_model = uff.from_tensorflow_frozen_model(
        model_file, output_nodes)
    parser = uffparser.create_uff_parser()
    parser.register_input(uff_input_node, (3, height, width), 0)
    for node in output_nodes:
        parser.register_output(node)

    # create the TensorRT CUDA engine
    G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
    print("Creating TensorRT CUDA Engine")
    self.trt_engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, max_batch_size, 1 << 24, trt.infer.DataType.HALF)

I second this.

I'm not using Python, but I get exactly the same message.
FP32 also builds here without any problem.

camera_server: cudnnLayerUtils.cpp:98: void* nvinfer1::cudnn::getTensorMem(const nvinfer1::cudnn::EngineTensor&, void**, void**): Assertion `start[vectorIndex]%spv == 0' failed.

nvinfer1::IBuilder* builder = createInferBuilder(gLogger);
nvinfer1::INetworkDefinition* network = builder->createNetwork();

builder->setMinFindIterations(3);	// allow time for TX1 GPU to spin up
builder->setMaxWorkspaceSize(1 << 30);

// parse the caffe model to populate the network, then set the outputs
nvcaffeparser1::ICaffeParser* parser = nvcaffeparser1::createCaffeParser();
mEnableFP16 = true;
nvinfer1::DataType modelDataType = mEnableFP16 ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT; // create a 16-bit model if it's natively supported
const nvcaffeparser1::IBlobNameToTensor *blobNameToTensor =
	parser->parse(deployFile.c_str(),		// caffe deploy file
	modelFile.c_str(),		// caffe model file
	*network,					// network definition that the parser will populate
	modelDataType);				// weight precision selected above
// the caffe file has no notion of outputs, so we need to manually say which tensors the engine should generate
const size_t num_outputs = outputs.size();

for( size_t n=0; n < num_outputs; n++ )
{
	nvinfer1::ITensor* tensor = blobNameToTensor->find(outputs[n].c_str());

	if( !tensor )
	{
		printf(LOG_GIE "failed to retrieve tensor for output '%s'\n", outputs[n].c_str());
		continue;	// nothing to mark for this output
	}

	printf(LOG_GIE "retrieved output tensor '%s'\n", tensor->getName());
	network->markOutput(*tensor);	// tell the engine to generate this tensor
}


// Build the engine
printf(LOG_GIE "configuring CUDA engine\n");

//builder->setMaxWorkspaceSize(16 << 20); //from tensorNet.cpp
builder->setMaxWorkspaceSize(1 << 30); //isn't this better?

// set up the network for paired-fp16 format
if( mEnableFP16 )
	builder->setHalf2Mode(true);

printf(LOG_GIE "building CUDA engine\n");
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

buildCudaEngine crashes.
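For comparison, here is what the workspace sizes mentioned in this thread work out to. This is plain Python arithmetic with no TensorRT involved; the dictionary and its keys are only labels for the expressions quoted in the two snippets.

```python
MIB = 1 << 20  # bytes in one mebibyte

# Workspace size expressions that appear in the two code snippets
workspace_sizes = {
    "1 << 24": 1 << 24,    # Python call to uff_to_trt_engine
    "16 << 20": 16 << 20,  # commented-out value from tensorNet.cpp
    "1 << 30": 1 << 30,    # value passed to setMaxWorkspaceSize
}

for expr, size in workspace_sizes.items():
    print(f"{expr:>8} = {size:>10} bytes = {size // MIB} MiB")
```

So the Python snippet and the tensorNet.cpp default both request 16 MiB, while `1 << 30` requests 1 GiB. Whether the workspace size has anything to do with the assertion is not established here.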

Edit: I also want to point out that this fails only with TensorRT 3.0.

No idea what’s going on…
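A guess at what the assertion is checking, in case it helps narrow things down. This assumes `spv` means "scalars per vector" (2 for paired FP16) and that `start` holds a tensor's starting offset along each dimension; neither is confirmed by anything in this thread. Under that reading, the check fails whenever a tensor begins at an offset that is not a multiple of `spv` along the vectorized dimension. The helper names below are made up for illustration and are not part of the TensorRT API.

```python
from itertools import accumulate

def channel_starts(channel_counts):
    """Starting channel offset of each tensor when they are packed back to back."""
    return [0] + list(accumulate(channel_counts))[:-1]

def misaligned(starts, spv=2):
    """Indices whose start offset is not a multiple of spv (hypothetical check)."""
    return [i for i, s in enumerate(starts) if s % spv != 0]

# Hypothetical channel counts: all even versus one odd count in the middle
even = channel_starts([64, 32, 16])
odd = channel_starts([64, 33, 16])

print(even, misaligned(even))
print(odd, misaligned(odd))
```

If that reading is right, a tensor that follows one with an odd channel count would start at an odd offset and trip the check in FP16 but not in FP32. Treat this as speculation until someone who knows the internals confirms it.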