Error with Concatenate Layer in TensorRT2

AndreiStoian · May 15, 2017, 4:46pm

Hi,

I have an network that uses a concatenate layer. Initially it concatenated 3 input blobs but this does not seem to be supported so I changed the layer to two layers concatenating the first and second blob initially and then the intermediary result with the third blob. This finally stopped an error in the IBuilder::validate method called from IBuilder::buildCudaEngine.

Now I have another error, also in buildCudaEngine but in a different sub-function: IBuilder::buildSingleLayer. It occurs when the network optimizer encounters the first concatenate layer. The error is the following:

cudnnBuilder2.cpp:1528: std::unique_ptr<nvinfer1::cudnn::Layer>
    nvinfer1::builder::buildSingleLayer(
         nvinfer1::cudnn::EngineBuildContext&, 
         const nvinfer1::builder::Node&, 
         const EngineTensors&, const EngineTensors&): 

Assertion `0' failed.

The layer proto is reassembles the following:

layer {
  name: "concat2"
  type: "Concat"
  bottom: "conv_1"
  bottom: "conv_2"
  top: "concat_2_output"
  concat_param {
    axis: 1
  }
}

Is the Concatenation layer well supported in TensorRT2? It seems present in the documentation (IConcatenationLayer) but it’s not in the list given here: https://devtalk.nvidia.com/default/topic/997770/tensor-rt-supports-caffe-model-layers-/. Will the concat_param::axis=1 parameters be handled?

Note: I’m running this on a GTX 1070 but I’m targeting the TX1/2 and the P4. Would running directly on those boards make any difference?

Andrei Stoian
R&D Engineer, Thales Services SAS

AastaLLL · May 16, 2017, 2:59am

Hi,

Thanks for your question.
Supported layers can be found in document which is located at ‘/usr/share/doc/gie/’.

Concat layer is not contained in tensorRT-1.0 but already included into tensorRT-3.0, our latest version. (Not available yet)

For your last question, code will be the same but please re-compile your model with aarch64 tensorRT library to make it compatible to TX1 GPU architecture.

Thanks.

AndreiStoian · May 16, 2017, 7:57am

Thank you for your answer.

Is there a release date for TensorRT 3?

Since I’m looking to get this working until the end of summer, I guess that then I could implement it myself in CUDA and use two contexts, one for the part before the concat and one for the part after like you show here: [url]https://devtalk.nvidia.com/default/topic/997770/tensor-rt-supports-caffe-model-layers-/[/url]. Is there any performance penalty when taking this approach ?

AastaLLL · May 17, 2017, 2:45am

Hi,

Sorry for that we can’t disclosure any schedule plan.
Please pay attention to our announcement and update.

For not supported layer, please add your own layer into tensorRT flow.

IExecutionContext *contextA =
engineA->createExecutionContext();
IExecutionContext *contextB =
engineB->createExecutionContext();
<...>
contextA.enqueue(batchSize, buffersA, stream, nullptr);
myLayer(outputFromA, inputToB, stream);
contextB.enqueue(batchSize, buffersB, stream, nullptr);

If you implement myLayer with CUDA, there is no extra penalty. (ex. cpu <-> gpu memory copy)

AndreiStoian · May 17, 2017, 9:43pm

I will try to use this method to split the network. In fp32 mode, as you say, I guess there should be no extra penalty when running a custom CUDA kernel on the output from the first context execution.

However, if I want to use fp16 or int8 mode, would there be a penalty? From what I see, TensorRT automatically converts/dequantizes the output tensors from fp16 or int8 to fp32. After my kernel is run if I want to run the second part of the network in int8/fp16, a new conversion/quantization would be necessary. Is there a way to get the fp16 or int8 raw output data directly from TensorRT?

Additionally, could you elaborate on the int8 to fp32 conversion and on how fp32 values are quantized to int8?

AastaLLL · May 18, 2017, 2:23am

Hi,

Thanks for your feedback.

For simplicity, you can set input/output to be fp32 type and execute tensorRT on fp16 mode.
Conversion will be applied automatically and you can handle myLayer() function with general float format.

For better performance, you can handle myLayer() function with float16 directly.
We follow standard fp16 format: Half-precision floating-point format - Wikipedia
Conversion code from float to FP16 is available in the NVIDIA® CUDA® library for GPU execution.

AndreiStoian · May 18, 2017, 12:17pm

That sounds interesting, I looked at the doc and I found a ITensor::setType function, I’m guessing this is the function to enable/disable conversion on the input/output layers?

Thanks!

AastaLLL · May 19, 2017, 4:36am

Hi,

In compuatational mode=FP16, TensorRT can accept input or output data in either FP32 or FP16 mode.
You can change to use any combinations below for input and output:
• Input FP32, output FP32
• Input FP16, output FP32
• Input FP16, output FP16
• Input FP32, output FP16

setAllNetworkInputsToHalf(network);

static void setAllNetworkInputsToHalf(INetworkDefinition* network){
    for (int i = 0; i < network->getNbInputs(); i++)
        network->getInput(i)->setType(DataType::kHALF);
}

Thanks.

AndreiStoian · May 19, 2017, 8:13am

Excellent! I’ll use that then.

One more question about reduced precision: In the sampleGoogleNet.cpp example, for parsing the caffe model the code is :

DataType modelDataType = useFp16 ? DataType::kHALF : DataType::kFLOAT; // create a 16-bit model if it's natively supported
	const IBlobNameToTensor *blobNameToTensor =
		parser->parse(locateFile(deployFile).c_str(),	// caffe deploy file
		locateFile(modelFile).c_str(),		// caffe model file
		*network,				// network definition that the parser will populate
		modelDataType);

Does modelDataType reflect the data type that the parser expects to find the in the caffemodel file or does it imply that the parser will read fp32 from the caffemodel and convert to fp16 if modelDataType=kHALF ?

Since I’m training with vanilla Caffe in fp32 all my caffemodels are in fp32. Should I thus always be passing DataType::kFLOAT to ICaffeParser::parse?

AastaLLL · May 22, 2017, 8:17am

Hi,

Thanks for your feedback.

If you want to run fp16 mode, it’s required to construct a kHALF tensorRT model.

parser->parse(locateFile(deployFile).c_str(), locateFile(modelFile).c_str(), *network, DataType::kHALF);
builder->setHalf2Mode(true)

But please notice that you still can input/output float or fp16 type. If float type is used, conversion will be called automatically.

AndreiStoian · May 22, 2017, 2:47pm

Ok, thanks for the info!

I have a few more questions about quantization in TensorRT, I’ll post them here unless you think there’s a more adequate forum section.

In the TensorRT guide it says:

I can’t find the ‘accompanying white paper’, where could I get it?

Does the Jetson TX2 support int8 operations in hardware?

AastaLLL · May 23, 2017, 5:50am

Hi,

Could you share which TensorRT guide you read? Then we can find the corresponding white paper.
No, int8 only works on P4/P40/TitanX/…, not for TX1 and TX2.

Thanks.

AndreiStoian · May 23, 2017, 7:14am

It’s in the TensorRT User Guide.html that is installed by the deb package to /usr/share/doc/gie/doc/. It is provided by the libnvinfer-dev ver. 2.0.0-1+cuda8 package.

Thanks for the info on the int8 support!

tolstikh · June 13, 2017, 6:57pm

I am confused, the TensorRT2 user guide states that the build phase performs:

“elision of concatenation layers by directing layer outputs to the correct eventual destination”

Doesn’t this mean that the outputs are copied appropriately into memory, rather than explicitly executing concat code. This still effectively implementing concat layer, right?

Anyways, I get a failed assertion during the build phase when analyzing the first concat layer

main: cudnnBuilder2.cpp:371: void nvinfer1::builder::checkSanity(const nvinfer1::builder::Graph&): Assertion `readRegions.find(t->region.get()) == readRegions.end()' failed.

Any workarounds, other than splitting up the pipeline with custom concat layer?

Thanks!

AastaLLL · June 14, 2017, 9:24am

Hi,

Do you use tensorRT2.0, tensorRT2.0 only supports desktop GPU and can’t use on Jetson.

tolstikh · June 14, 2017, 2:15pm

Hello,

I came across this post from an online search and didn’t notice its in the embedded section of the forum. I am in fact using a desktop GPU. I made a similar thread in the compute libraries forum.

Thanks.

Topic		Replies	Views
Caffe model with Concatenation layer gives wrong results using TensorRT 3/4 on TX2 Jetson TX2	12	1446	October 18, 2021
cudnn6 slow and problematic on TX2, JetPack 3.1 Jetson TX2	22	2527	October 18, 2021
Segmentation fault when building an ICudaEngine in TensorRT3 Jetson TX2	10	3464	October 18, 2021
Tensor RT supports caffe model layers Jetson TX1	28	10467	October 18, 2021
A simple example on the custom layer API (TensorRT 2.1)? Jetson TX2	28	6574	October 18, 2021
Different FP16 inference with tensorrt and pytorch TensorRT	5	4461	October 25, 2021
TensorRT concatenate layer doesn't work TensorRT	12	2500	April 13, 2020
INT8 calibration file not generating, not building in INT8 mode TensorRT tensorrt , ubuntu , python , jetson-nano	15	2409	June 4, 2022
Concat in Caffe parser is wrong when working with int8 calibration TensorRT cuda	8	1320	May 18, 2022
Could not find any supported formats consistent with input/output data types TensorRT	1	836	April 11, 2023

Error with Concatenate Layer in TensorRT2

Related topics