INT8 calibration problem

After calibrating a custom deep network, I got an error.

net:
layer {
  name: "attenstion1/relu1"
  type: "ReLU"
  bottom: "attention1_max_fc1"
  top: "attention1_max_relu1"
}

If I use "attention1_max_fc1" as the output, I can get a correct calibration result, but when using "attention1_max_relu1" as the output, I get the following error:

Tensor attention1_max_relu1 is uniformly zero; network calibration failed.
INT8_CALIB: cudnnBuilder2.cpp:996: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.

br.

A few questions to clarify the issue:

  1. Can you run your network in INT8 mode with our test tool trtexec?

  2. Can you run your network in FP32 or FP16 mode successfully?

  3. INT8 calibration caches are not compatible across different TensorRT versions, so please make sure you are not mixing them.

  4. The error indicates that the output of "attenstion1/relu1" is all zero. Can you dump its output when running your network in FP32 mode and confirm whether this is expected?

  1. I made some custom layers in my original network; the original network works under INT8/FP32/FP16 on my machine.
  2. I added some plugins in TensorRT (version 3.0.2), but the failing layer is not a plugin layer; it is before the plugin layers.
  3. As mentioned above, if I use the bottom blob "attention1_max_fc1" as the output, calibration gives the correct result, but when I use the top blob "attention1_max_relu1" as the output, it is reported as all zero.
  4. I am sure that in FP32 mode (on the BVLC Caffe platform) the top blob values are not zero.

Thanks for your reply.

I think we should first figure out why some middle layer outputs all zeros. Is this observed even in FP32 mode? Can you narrow it down layer by layer?

Hi.

Because of the custom layers, my network cannot be run as a whole on the TensorRT platform. But I can calibrate my network layer by layer, and that is how I found that the error occurs at this layer.

The following is part of the network:

...
...
...
layer {
  name: "attention1/maxpool"
  type: "Pooling"
  bottom: "stem3"
  top: "attention1_max"
  pooling_param {
    pool: MAX
    global_pooling: true
  }
}
layer {
  name: "attenstion1/fc1"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_max"
  top: "attention1_max_fc1"
}
layer {
  name: "attenstion1/relu1"
  type: "ReLU"
  bottom: "attention1_max_fc1"
  top: "attention1_max_relu1"
}
layer {
  name: "attenstion1/fc2"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 32
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_max_relu1"
  top: "attention1_max_fc2"
}
layer {
  name: "attenstion1/relu2"
  type: "ReLU"
  bottom: "attention1_max_fc2"
  top: "attention1_max_relu2"
}


layer {
  name: "attention1/avgpool"
  type: "Pooling"
  bottom: "stem3"
  top: "attention1_avg"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}

layer {
  name: "attenstion1/fc1b"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_avg"
  top: "attention1_avg_fc1"
}
layer {
  name: "attenstion1/relu1b"
  type: "ReLU"
  bottom: "attention1_avg_fc1"
  top: "attention1_avg_relu1"
}
layer {
  name: "attenstion1/fc2b"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 32
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "attention1_avg_relu1"
  top: "attention1_avg_fc2"
}
layer {
  name: "attenstion1/relu2b"
  type: "ReLU"
  bottom: "attention1_avg_fc2"
  top: "attention1_avg_relu2"
}

layer {
  name: "attention1/eltwise"
  type: "Eltwise"
  bottom: "attention1_avg_relu2"
  bottom: "attention1_max_relu2"
  top: "attention1_Scale"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "attention1/sigmoid"
  type: "Sigmoid"
  bottom: "attention1_Scale"
  top: "attention1_Scale_sig"
}
layer {
  name: "attention1/scale"
  type: "Scale"
  bottom: "stem3"
  bottom: "attention1_Scale_sig"
  top: "stem3_attention1"
  scale_param {
    axis: 0
  }
}

layer {
  name: "atttention2/reshape"
  type: "Reshape"
  bottom: "stem3_attention1"
  top: "attention2_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: 1
      dim: 32
      dim: -1
    }
  }
}

layer {
  name: "attention2/max"
  type: "Pooling"
  bottom: "attention2_reshape"
  top: "attention2_max"
  pooling_param {
    pool: MAX
    kernel_h: 32
    kernel_w: 1
  }
}
layer {
  name: "atttention2/reshape2"
  type: "Reshape"
  bottom: "attention2_max"
  top: "attention2_max_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: 1
      dim: 56
      dim: -1
    }
  }
}
layer {
  name: "attention2/avg"
  type: "Pooling"
  bottom: "attention2_reshape"
  top: "attention2_avg"
  pooling_param {
    pool: AVE
    kernel_h: 32
    kernel_w: 1
  }
}
layer {
  name: "atttention2/reshape3"
  type: "Reshape"
  bottom: "attention2_avg"
  top: "attention2_avg_reshape"
  reshape_param {
    shape {
      dim: 0
      dim: 1
      dim: 56
      dim: -1
    }
  }
}

layer {
  name: "attention2/concat"
  type: "Concat"
  bottom: "attention2_avg_reshape"
  bottom: "attention2_max_reshape"
  top: "attention2_concat"
  concat_param {
    axis: 1
  }
}
layer {
  name: "attention2/conv"
  type: "Convolution"
  bottom: "attention2_concat"
  top: "attention2_conv"
  convolution_param {
    num_output: 1
    bias_term: false
    pad: 3
    kernel_size: 7
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "attention2/sigmoid"
  type: "Sigmoid"
  bottom: "attention2_conv"
  top: "attention2_conv_sig"
}
layer {
  name: "attention2/tile"
  type: "Tile"
  bottom: "attention2_conv_sig"
  top: "attention2_conv_tile"
  tile_param {
    axis: 1
    tiles: 32
  }
}

layer {
  name: "attention2/scale"
  type: "Scale"
  bottom: "stem3_attention1"
  bottom: "attention2_conv_tile"
  top: "stem3_attention2"
  scale_param {
    axis: 0
  }
}
...
...
...

The failing layer is before the custom layers (the custom layers include "Tile", "Scale", and "Reshape").

Hi Nfeng:

This network is used for a classification task, and the expected output is the "Softmax" layer.

BR

Hi Nfeng:

One more thing: I printed my plugin layers' top blob data below (for every batch). I don't know what kind of operation is done after each batch is loaded.

Scale1Layer debug1: 0.410006, 0.395001, 0.455328, 0.449018
Scale1Layer debug2: 0.483588, 0.859372, 0.667961, 0.496670
Scale1Layer debug3: 0.508751, 0.347732, 0.376068, 0.365252
reshape1 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 401408
ReshapeLayer1 debug1: 0.410006, 0.395001, 0.455328, 0.449018
ReshapeLayer1 debug2: 0.483588, 0.859372, 0.667961, 0.496670
ReshapeLayer1 debug3: 0.508751, 0.347732, 0.376068, 0.365252
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 0.938205, 0.942541, 0.721910, 0.752248
ReshapeLayer2 debug2: 1.380868, 1.544697, 1.140635, 1.526752
ReshapeLayer2 debug3: 1.334607, 1.099593, 1.011290, 1.269011
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 0.362215, 0.370243, 0.395881, 0.379113
ReshapeLayer2 debug2: 0.360657, 0.391315, 0.375102, 0.399680
ReshapeLayer2 debug3: 0.356505, 0.393370, 0.343840, 0.357661
TileLayer debug1: 0.999999, 1.000000, 1.000000, 1.000000
TileLayer debug2: 1.000000, 0.999999, 0.999999, 1.000000
TileLayer debug3: 0.999999, 0.999994, 0.999931, 0.999698
 +++++++++++++++++++++++ scale 2 layer is ++++++++++++++++++++++++++++++ 
Scale2Layer debug1: 0.410006, 0.395001, 0.455328, 0.449018
Scale2Layer debug2: 0.483588, 0.859371, 0.667961, 0.496669
Scale2Layer debug3: 0.508750, 0.347730, 0.376042, 0.365142
Tensor attention1_max_relu1 is uniformly zero; network calibration failed.
INT8_CALIB: cudnnBuilder2.cpp:996: nvinfer1::cudnn::Engine* nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, const nvinfer1::cudnn::HardwareContext&, const nvinfer1::Network&): Assertion `it != tensorScales.end()' failed.
Aborted (core dumped)

The Scale/Reshape/Tile layers are all after "attention1_max_relu1".

BR.

Looks like you are facing a known issue which has been fixed in TRT 5.0.
Which TRT version are you using?

Hi Nfeng:

Are you sure this is a known issue?
I am using TensorRT version 3.0.2.

BR.

TRT 5.0 has probably fixed your problem, so please give it a try.

Hi Nfeng:

OK, I will try it.

BR.

Hi Nfeng:

I still get the error:
---------------------------------------- batch
Scale1Layer debug1: 0.414822, 0.528595, 0.473277, 0.526545
Scale1Layer debug2: 0.462024, 0.404506, 0.341671, 0.517626
Scale1Layer debug3: 0.508143, 0.602298, 0.453369, 0.473022
reshape1 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 401408
ReshapeLayer1 debug1: 0.414822, 0.528595, 0.473277, 0.526545
ReshapeLayer1 debug2: 0.462024, 0.404506, 0.341671, 0.517626
ReshapeLayer1 debug3: 0.508143, 0.602298, 0.453369, 0.473022
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 1.177924, 0.881256, 0.658008, 0.787237
ReshapeLayer2 debug2: 0.839847, 0.831413, 0.690506, 0.695053
ReshapeLayer2 debug3: 0.806642, 0.840020, 0.850429, 0.913184
reshape2 mCopySize is  ++++++++++++++++++++++++++++++++++++++++ 12544
ReshapeLayer2 debug1: 0.382978, 0.337429, 0.310880, 0.386707
ReshapeLayer2 debug2: 0.345608, 0.364051, 0.330987, 0.302146
ReshapeLayer2 debug3: 0.344323, 0.319922, 0.310011, 0.345065
 +++++++++++++++++++++++ scale 2 layer is ++++++++++++++++++++++++++++++ 
Scale2Layer debug1: 0.414821, 0.528594, 0.473277, 0.526545
Scale2Layer debug2: 0.462024, 0.404506, 0.341671, 0.517626
Scale2Layer debug3: 0.508142, 0.602292, 0.453302, 0.472688
Tensor attention1_max_relu1 is uniformly zero; network calibration failed.
../builder/cudnnBuilder2.cpp (1508) - Misc Error in buildEngine: -1 (Could not find tensor stem2a in tensorScales.)
../builder/cudnnBuilder2.cpp (1508) - Misc Error in buildEngine: -1 (Could not find tensor stem2a in tensorScales.)

Hi Sharp,

I went through all your comments again carefully, and I think I understand what you are facing.
Generally, TRT does not allow calibration of a layer whose output is all zero, since no meaningful dynamic range (and hence no INT8 scale) can be derived from an all-zero tensor. We did encounter corner cases that break this condition, and from TRT 5.0 we removed the constraint for some known layers, such as Relu6, but those exempted layers do not include the general ReLU in your case.
So I suspect that all the output of attention1_max_fc1 is zero or negative, which causes all the output of the following attention1_max_relu1 to become zero and break this condition as well.

I would suggest the following steps:

  1. Use the latest TRT 5.0 for development and production instead of the very old 3.0. TRT 5.0 also provides native support for Reshape and Scale (axis=0 means all axes are scaled, right? If that is true, you can simply remove it, since TRT performs Scale that way as well).

  2. Dump the output of attention1_max_fc1 and check its data distribution. If all the data is zero or negative, then the result after the ReLU becomes zero, right? In that case, is it still a valid output for your network?
    Tip: to dump the output of a middle layer, you can remove the subsequent layers and set attention1_max_fc1 as the output when building the network (see the sketch after this list).

  3. If the zero result of the ReLU is indeed a valid input for the remaining layers, we can reconsider removing this constraint for the general ReLU in TRT as well.
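
For reference, here is a minimal sketch of how an intermediate blob can be exposed as a network output with the TensorRT 5 Caffe-parser C++ API, on a prototxt trimmed to end at that blob. The file names and the helper function are placeholders, not code from this thread:

#include "NvInfer.h"
#include "NvCaffeParser.h"

// Minimal sketch: expose an intermediate blob such as "attention1_max_fc1" as a
// network output so its FP32 values can be dumped and inspected.
// "deploy_trimmed.prototxt" / "model.caffemodel" are placeholder file names.
void markIntermediateOutput(nvinfer1::INetworkDefinition& network,
                            nvcaffeparser1::ICaffeParser& parser)
{
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser.parse("deploy_trimmed.prototxt", "model.caffemodel",
                     network, nvinfer1::DataType::kFLOAT);
    if (!blobs)
        return;  // parsing failed

    // Any blob named in the prototxt can be looked up and marked as an output;
    // its values then appear in the engine's output bindings at inference time.
    if (nvinfer1::ITensor* fc1 = blobs->find("attention1_max_fc1"))
        network.markOutput(*fc1);
}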

Hi Nfeng:

  1. Now I am using TensorRT 5.0.2.6. There is a Scale layer in version 3.0.2, but it does not accept two bottom blobs, and it is the same in version 5.0.2.6. The Reshape layer is the same:
attention1/scale: expected 1 bottom blobs, found 2

  2. After I remove the other layers and set attention1_max_fc1 as the output, I cannot build the network under TensorRT.

  3. I will try removing this "relu" layer and retraining the network to see whether it changes the accuracy.

br.

Hi Nfeng:

I solved the problem by removing this ReLU layer.

Thanks a lot.

br.

Hi Nfeng:

There is something wrong with my calibration file. After I calibrate my model, some layers are missing.

Example:

layer {
  name: "stem1"
  type: "Convolution"
  bottom: "data"
  top: "stem1"
  convolution_param {
    num_output: 32
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "stem1/bn"
  type: "BatchNorm"
  bottom: "stem1"
  top: "stem1"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.999000012875
    eps: 0.0010000000475
  }
}
layer {
  name: "stem1/scale"
  type: "Scale"
  bottom: "stem1"
  top: "stem1"
  scale_param {
    filler {
      value: 1.0
    }
    bias_term: true
    bias_filler {
      value: 0.0
    }
  }
}

But in the calibration file I only get "stem1"; I do not get "stem1/scale" or "stem1/bn":

stem1: 3d588566

BN and Scale are fused into your convolution layer during network optimization, so in the final graph that TRT executes, these two layers no longer exist at all.
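
For illustration, a small hypothetical helper showing the math behind that fusion, assuming the usual Caffe arrangement where the BatchNorm layer normalizes with its stored mean/variance and the following Scale layer supplies the learned gamma/beta (this is not TensorRT source code):

#include <cmath>
#include <vector>

// Hypothetical helper illustrating Conv + BatchNorm + Scale fusion. For each
// output channel c the three layers compute
//   y = gamma[c] * (conv(x)[c] - mean[c]) / sqrt(var[c] + eps) + beta[c],
// which is equivalent to one convolution with rescaled weights and bias.
// If the convolution has no bias (bias_term: false), b simply starts at zero.
void foldBnScaleIntoConv(std::vector<std::vector<float>>& w,   // [outCh][inCh*kH*kW]
                         std::vector<float>& b,                // conv bias per channel
                         const std::vector<float>& mean, const std::vector<float>& var,
                         const std::vector<float>& gamma, const std::vector<float>& beta,
                         float eps)
{
    for (std::size_t c = 0; c < w.size(); ++c)
    {
        const float k = gamma[c] / std::sqrt(var[c] + eps);  // per-channel multiplier
        for (float& wi : w[c])
            wi *= k;                                          // fold into the weights
        b[c] = k * (b[c] - mean[c]) + beta[c];                // fold into the bias
    }
}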

Hi Sharp and Nfeng,

I have a question about the Scale layer. How did you fix this?

Yes, TensorRT doesn't support a Scale layer with two bottoms.
I think there are two options:

  1. Implement this layer through IPlugin, but this may introduce too much overhead for your network (the operation such a plugin has to compute is sketched after this list).
  2. When TensorRT fuses a Scale layer with a convolution, it extracts the scale weights, folds them into the convolution weights, and removes all Scale layers. In essence, it is an optimization from the math perspective. If I understand it right, this approach does not care along which axis you are scaling. So you can do the same thing yourself once training is done, and then there will be no Scale layer left when you deploy on TensorRT.
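
For reference, a plain CPU sketch of the channel-wise broadcast multiply that a two-bottom Scale layer performs; the function name and the NCHW layout are my assumptions, and an IPlugin's enqueue would implement the same thing in a CUDA kernel:

#include <cstddef>

// Reference of the channel-wise scale performed by a Scale layer with two bottoms:
// y[n][c][h][w] = x[n][c][h][w] * s[n][c], with x in NCHW layout and s of shape [N][C].
void channelwiseScale(float* y, const float* x, const float* s,
                      std::size_t N, std::size_t C, std::size_t H, std::size_t W)
{
    const std::size_t plane = H * W;
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
        {
            const float factor = s[n * C + c];                 // one factor per (n, c)
            const float* xi = x + (n * C + c) * plane;
            float*       yi = y + (n * C + c) * plane;
            for (std::size_t i = 0; i < plane; ++i)
                yi[i] = xi[i] * factor;                        // broadcast over H x W
        }
}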

First of all, thank you very much!

In my network, the Scale layer implements the channel-wise scaling of an SENet block, where the scale factors are computed from the input at runtime, so option 2 isn't suitable.

In TensorRT, is there a better method to implement channel-wise scaling?