INT8 calibration problem

After calibrating a custom deep network, I got an error.

net:
layer {
  name: "attenstion1/relu1"
  type: "ReLU"
  bottom: "attention1_max_fc1"
  top: "attention1_max_relu1"
}

If I use "attention1_max_fc1" as the output, I can get a correct calibration result, but when using "attention1_max_relu1" as the output, I get the following error:

Tensor attention1_max_relu1 is uniformly zero; network calibration failed.
INT8_CALIB: cudnnBuilder2.cpp:996: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.

br.

A few questions to clarify the issue:

  1. Can you run your network in INT8 mode with our test tool trtexec?

  2. Can you run your network in FP32 or FP16 mode successfully?

  3. INT8 calibration caches are not compatible across different TensorRT versions, so please make sure you are not mixing them.

  4. The error indicates that the output of "attenstion1/relu1" is all zero. Can you dump its output when running your network in FP32 mode and confirm whether this is expected?

  1. I made some custom layers in my original network; the original network works under INT8/FP32/FP16 on my machine.
  2. I added some plugins in TensorRT (version 3.0.2), but the failing layer is not a plugin layer; it is before the plugin layers.
  3. As mentioned above, if I use the bottom blob "attention1_max_fc1" as the output, calibration gives the correct result, but when I use the top blob "attention1_max_relu1" as the output, it is reported as all zero.
  4. I am sure that in FP32 mode (on the BVLC Caffe platform) the top blob values are not zero.

Thanks for your reply.

I think we should first figure out why some middle layer outputs all zeros. Is this observed even in FP32 mode? Can you narrow it down layer by layer?

Hi.

Because of the custom layers, my network cannot be run as a whole on the TensorRT platform. But I can calibrate my network layer by layer, and that is how I found that the error occurs at this layer.

The following is part of the network:

...
...
...
layer {
  name: "attention1/maxpool"
  type: "Pooling"
  bottom: "stem3"
  top: "attention1_max"
  pooling_param {
    pool: MAX
    global_pooling: true
  }
}
layer {
  name: "attenstion1/fc1"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_max"
  top: "attention1_max_fc1"
}
layer {
  name: "attenstion1/relu1"
  type: "ReLU"
  bottom: "attention1_max_fc1"
  top: "attention1_max_relu1"
}
layer {
  name: "attenstion1/fc2"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 32
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_max_relu1"
  top: "attention1_max_fc2"
}
layer {
  name: "attenstion1/relu2"
  type: "ReLU"
  bottom: "attention1_max_fc2"
  top: "attention1_max_relu2"
}


layer {
  name: "attention1/avgpool"
  type: "Pooling"
  bottom: "stem3"
  top: "attention1_avg"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}

layer {
  name: "attenstion1/fc1b"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_avg"
  top: "attention1_avg_fc1"
}
layer {
  name: "attenstion1/relu1b"
  type: "ReLU"
  bottom: "attention1_avg_fc1"
  top: "attention1_avg_relu1"
}
layer {
  name: "attenstion1/fc2b"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 32
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_avg_relu1"
  top: "attention1_avg_fc2"
}
layer {
  name: "attenstion1/relu2b"
  type: "ReLU"
  bottom: "attention1_avg_fc2"
  top: "attention1_avg_relu2"
}

layer {
  name: "attention1/eltwise"
  type: "Eltwise"
  bottom: "attention1_avg_relu2"
  bottom: "attention1_max_relu2"
  top: "attention1_Scale"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "attention1/sigmoid"
  type: "Sigmoid"
  bottom: "attention1_Scale"
  top: "attention1_Scale_sig"
}
layer {
  name: "attention1/scale"
  type: "Scale"
  bottom: "stem3"
  bottom: "attention1_Scale_sig"
  top: "stem3_attention1"
  scale_param {
    axis: 0
  }
}

layer {
  name: "atttention2/reshape"
  type: "Reshape"
  bottom: "stem3_attention1"
  top: "attention2_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: 1
      dim: 32
      dim: -1
    }
  }
}

layer {
  name: "attention2/max"
  type: "Pooling"
  bottom: "attention2_reshape"
  top: "attention2_max"
  pooling_param {
    pool: MAX
    kernel_h: 32
    kernel_w: 1
  }
}
layer {
  name: "atttention2/reshape2"
  type: "Reshape"
  bottom: "attention2_max"
  top: "attention2_max_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: 1
      dim: 56
      dim: -1
    }
  }
}
layer {
  name: "attention2/avg"
  type: "Pooling"
  bottom: "attention2_reshape"
  top: "attention2_avg"
  pooling_param {
    pool: AVE
    kernel_h: 32
    kernel_w: 1
  }
}
layer {
  name: "atttention2/reshape3"
  type: "Reshape"
  bottom: "attention2_avg"
  top: "attention2_avg_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: 1
      dim: 56
      dim: -1
    }
  }
}

layer {
  name: "attention2/concat"
  type: "Concat"
  bottom: "attention2_avg_reshape"
  bottom: "attention2_max_reshape"
  top: "attention2_concat"
  concat_param {
    axis: 1
  }
}
layer {
  name: "attention2/conv"
  type: "Convolution"
  bottom: "attention2_concat"
  top: "attention2_conv"
  convolution_param {
    num_output: 1
    bias_term: false
    pad: 3
    kernel_size: 7
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "attention2/sigmoid"
  type: "Sigmoid"
  bottom: "attention2_conv"
  top: "attention2_conv_sig"
}
layer {
  name: "attention2/tile"
  type: "Tile"
  bottom: "attention2_conv_sig"
  top: "attention2_conv_tile"
  tile_param {
    axis: 1
    tiles: 32
  }
}

layer {
  name: "attention2/scale"
  type: "Scale"
  bottom: "stem3_attention1"
  bottom: "attention2_conv_tile"
  top: "stem3_attention2"
  scale_param {
    axis: 0
  }
}
...
...
...

The failing layer is before the custom layers (the custom layers include "Tile", "Scale", and "Reshape").

Hi Nfeng:

This network is used for a classification task, and the expected output is the "Softmax" layer.

BR

Hi Nfeng:

One more thing: I printed my plugin layers' top blob data below (for every batch). I don't know what kind of operation is done after each batch is loaded.

Scale1Layer debug1: 0.410006, 0.395001, 0.455328, 0.449018
Scale1Layer debug2: 0.483588, 0.859372, 0.667961, 0.496670
Scale1Layer debug3: 0.508751, 0.347732, 0.376068, 0.365252
reshape1 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 401408
ReshapeLayer1 debug1: 0.410006, 0.395001, 0.455328, 0.449018
ReshapeLayer1 debug2: 0.483588, 0.859372, 0.667961, 0.496670
ReshapeLayer1 debug3: 0.508751, 0.347732, 0.376068, 0.365252
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 0.938205, 0.942541, 0.721910, 0.752248
ReshapeLayer2 debug2: 1.380868, 1.544697, 1.140635, 1.526752
ReshapeLayer2 debug3: 1.334607, 1.099593, 1.011290, 1.269011
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 0.362215, 0.370243, 0.395881, 0.379113
ReshapeLayer2 debug2: 0.360657, 0.391315, 0.375102, 0.399680
ReshapeLayer2 debug3: 0.356505, 0.393370, 0.343840, 0.357661
TileLayer debug1: 0.999999, 1.000000, 1.000000, 1.000000
TileLayer debug2: 1.000000, 0.999999, 0.999999, 1.000000
TileLayer debug3: 0.999999, 0.999994, 0.999931, 0.999698
 +++++++++++++++++++++++ scale 2 layer is ++++++++++++++++++++++++++++++ 
Scale2Layer debug1: 0.410006, 0.395001, 0.455328, 0.449018
Scale2Layer debug2: 0.483588, 0.859371, 0.667961, 0.496669
Scale2Layer debug3: 0.508750, 0.347730, 0.376042, 0.365142
Tensor attention1_max_relu1 is uniformly zero; network calibration failed.
INT8_CALIB: cudnnBuilder2.cpp:996: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.
Aborted (core dumped)

The Scale/Reshape/Tile layers are all after "attention1_max_relu1".

BR.

Looks like you are facing a known issue which has been fixed in TRT 5.0.
Which TRT version are you using?

Hi Nfeng:

Are you sure this is a known issue?
I am using TensorRT version 3.0.2.

BR.

TRT 5.0 has probably fixed your problem, so please give it a try.

Hi Nfeng:

OK, I will try it.

BR.

Hi Nfeng:

I still get the error:
---------------------------------------- batch
Scale1Layer debug1: 0.414822, 0.528595, 0.473277, 0.526545
Scale1Layer debug2: 0.462024, 0.404506, 0.341671, 0.517626
Scale1Layer debug3: 0.508143, 0.602298, 0.453369, 0.473022
reshape1 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 401408
ReshapeLayer1 debug1: 0.414822, 0.528595, 0.473277, 0.526545
ReshapeLayer1 debug2: 0.462024, 0.404506, 0.341671, 0.517626
ReshapeLayer1 debug3: 0.508143, 0.602298, 0.453369, 0.473022
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 1.177924, 0.881256, 0.658008, 0.787237
ReshapeLayer2 debug2: 0.839847, 0.831413, 0.690506, 0.695053
ReshapeLayer2 debug3: 0.806642, 0.840020, 0.850429, 0.913184
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 0.382978, 0.337429, 0.310880, 0.386707
ReshapeLayer2 debug2: 0.345608, 0.364051, 0.330987, 0.302146
ReshapeLayer2 debug3: 0.344323, 0.319922, 0.310011, 0.345065
 +++++++++++++++++++++++ scale 2 layer is ++++++++++++++++++++++++++++++ 
Scale2Layer debug1: 0.414821, 0.528594, 0.473277, 0.526545
Scale2Layer debug2: 0.462024, 0.404506, 0.341671, 0.517626
Scale2Layer debug3: 0.508142, 0.602292, 0.453302, 0.472688
Tensor attention1_max_relu1 is uniformly zero; network calibration failed.
../builder/cudnnBuilder2.cpp (1508) - Misc Error in buildEngine: -1 (Could not find tensor stem2a in tensorScales.)
../builder/cudnnBuilder2.cpp (1508) - Misc Error in buildEngine: -1 (Could not find tensor stem2a in tensorScales.)

Hi Sharp,

I went through all your comments again carefully, and I think I understand what you are facing.
Generally, TRT does not allow calibration of a layer whose output is all zero, since no meaningful dynamic range (and hence no INT8 scale) can be derived from an all-zero tensor. We did encounter corner cases that break this condition, and from TRT 5.0 we removed the constraint for some known layers, such as Relu6, but those exempted layers do not include the general ReLU in your case.
So I suspect that all the output of attention1_max_fc1 is zero or negative, which causes all the output of the following attention1_max_relu1 to become zero and break this condition as well.

I would suggest the following steps:

  1. Use the latest TRT 5.0 for development and production instead of the very old 3.0. TRT 5.0 also provides native support for Reshape and Scale (axis=0 means all axes are scaled, right? If that is true, you can simply remove it, since TRT performs Scale that way as well).

  2. Dump the output of attention1_max_fc1 and check its data distribution. If all the data is zero or negative, then the result after the ReLU becomes zero, right? In that case, is it still a valid output for your network?
    Tip: to dump the output of a middle layer, you can remove the subsequent layers and set attention1_max_fc1 as the output when building the network (see the sketch after this list).

  3. If the zero result of the ReLU is indeed a valid input for the remaining layers, we can reconsider removing this constraint for the general ReLU in TRT as well.
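
For reference, here is a minimal sketch of how an intermediate blob can be exposed as a network output with the TensorRT 5 Caffe-parser C++ API, on a prototxt trimmed to end at that blob. The file names and the helper function are placeholders, not code from this thread:

#include "NvInfer.h"
#include "NvCaffeParser.h"

// Minimal sketch: expose an intermediate blob such as "attention1_max_fc1" as a
// network output so its FP32 values can be dumped and inspected.
// "deploy_trimmed.prototxt" / "model.caffemodel" are placeholder file names.
void markIntermediateOutput(nvinfer1::INetworkDefinition& network,
                            nvcaffeparser1::ICaffeParser& parser)
{
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser.parse("deploy_trimmed.prototxt", "model.caffemodel",
                     network, nvinfer1::DataType::kFLOAT);
    if (!blobs)
        return;  // parsing failed

    // Any blob named in the prototxt can be looked up and marked as an output;
    // its values then appear in the engine's output bindings at inference time.
    if (nvinfer1::ITensor* fc1 = blobs->find("attention1_max_fc1"))
        network.markOutput(*fc1);
}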

Hi Nfeng:

  1. Now I am using TensorRT 5.0.2.6. There is a Scale layer in version 3.0.2, but it does not accept two bottom blobs, and it is the same in version 5.0.2.6. The Reshape layer is the same:
attention1/scale: expected 1 bottom blobs, found 2

  2. After I remove the other layers and set attention1_max_fc1 as the output, I cannot build the network under TensorRT.

  3. I will try removing this "relu" layer and retraining the network to see whether it changes the accuracy.

br.

Hi Nfeng:

I solved the problem by removing this ReLU layer.

Thanks a lot.

br.

Hi Nfeng:

There is something wrong with my calibration file. After I calibrate my model, some layers are missing.

Example:

layer {
  name: "stem1"
  type: "Convolution"
  bottom: "data"
  top: "stem1"
  convolution_param {
    num_output: 32
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "stem1/bn"
  type: "BatchNorm"
  bottom: "stem1"
  top: "stem1"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.999000012875
    eps: 0.0010000000475
  }
}
layer {
  name: "stem1/scale"
  type: "Scale"
  bottom: "stem1"
  top: "stem1"
  scale_param {
    filler {
      value: 1.0
    }
    bias_term: true
    bias_filler {
      value: 0.0
    }
  }
}

But in the calibration file I only get "stem1"; I do not get "stem1/scale" or "stem1/bn":

stem1: 3d588566

BN and Scale are fused into your convolution layer during network optimization, so in the final graph that TRT executes, these two layers no longer exist at all.
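
For illustration, a small hypothetical helper showing the math behind that fusion, assuming the usual Caffe arrangement where the BatchNorm layer normalizes with its stored mean/variance and the following Scale layer supplies the learned gamma/beta (this is not TensorRT source code):

#include <cmath>
#include <vector>

// Hypothetical helper illustrating Conv + BatchNorm + Scale fusion. For each
// output channel c the three layers compute
//   y = gamma[c] * (conv(x)[c] - mean[c]) / sqrt(var[c] + eps) + beta[c],
// which is equivalent to one convolution with rescaled weights and bias.
// If the convolution has no bias (bias_term: false), b simply starts at zero.
void foldBnScaleIntoConv(std::vector<std::vector<float>>& w,   // [outCh][inCh*kH*kW]
                         std::vector<float>& b,                // conv bias per channel
                         const std::vector<float>& mean, const std::vector<float>& var,
                         const std::vector<float>& gamma, const std::vector<float>& beta,
                         float eps)
{
    for (std::size_t c = 0; c < w.size(); ++c)
    {
        const float k = gamma[c] / std::sqrt(var[c] + eps);  // per-channel multiplier
        for (float& wi : w[c])
            wi *= k;                                          // fold into the weights
        b[c] = k * (b[c] - mean[c]) + beta[c];                // fold into the bias
    }
}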

Hi Sharp and Nfeng,

I have a question about the Scale layer. How did you fix this?

Yes, TensorRT doesn't support a Scale layer with two bottoms.
I think there are two options:

  1. Implement this layer through IPlugin, but this may introduce too much overhead for your network (the operation such a plugin has to compute is sketched after this list).
  2. When TensorRT fuses a Scale layer with a convolution, it extracts the scale weights, folds them into the convolution weights, and removes all Scale layers. In essence, it is an optimization from the math perspective. If I understand it right, this approach does not care along which axis you are scaling. So you can do the same thing yourself once training is done, and then there will be no Scale layer left when you deploy on TensorRT.
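
For reference, a plain CPU sketch of the channel-wise broadcast multiply that a two-bottom Scale layer performs; the function name and the NCHW layout are my assumptions, and an IPlugin's enqueue would implement the same thing in a CUDA kernel:

#include <cstddef>

// Reference of the channel-wise scale performed by a Scale layer with two bottoms:
// y[n][c][h][w] = x[n][c][h][w] * s[n][c], with x in NCHW layout and s of shape [N][C].
void channelwiseScale(float* y, const float* x, const float* s,
                      std::size_t N, std::size_t C, std::size_t H, std::size_t W)
{
    const std::size_t plane = H * W;
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
        {
            const float factor = s[n * C + c];                 // one factor per (n, c)
            const float* xi = x + (n * C + c) * plane;
            float*       yi = y + (n * C + c) * plane;
            for (std::size_t i = 0; i < plane; ++i)
                yi[i] = xi[i] * factor;                        // broadcast over H x W
        }
}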

First of all, thank you very much!

In my network, the Scale layer implements the channel-wise scaling of an SENet block, where the scale factors are computed from the input at runtime, so option 2 isn't suitable.

In TensorRT, is there a better method to implement channel-wise scaling?