Batch Normalization / Scale Layer

Hi,
In the TensorRT documentation it’s mentioned that the Batch Normalization layer is not supported, but can be implemented using a Scale layer.

If I have a pre-trained model (caffemodel + deploy.prototxt) that contains batch normalization layers, can I still use it in TensorRT after doing conversions in deploy.prototxt?

If conversions are required in deploy.prototxt, how do I do them?

Do I need to re-train after replacing the Batch Normalization layers with Scale layers?

Thank you.

Hi,

Please change the Batch Normalization layers in the prototxt to Scale layers first.
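
For reference, folding a BatchNorm layer into a Scale layer does not require re-training: at inference, batch normalization is just a per-channel affine transform, so the stored statistics can be converted directly into the Scale layer's scale/bias blobs (the replacement Scale layer needs bias_term: true). Below is a minimal sketch of that arithmetic in C++, assuming the standard Caffe BatchNorm blob layout (blob 0 = mean, blob 1 = variance, blob 2 = moving-average scale factor):

#include <cmath>
#include <vector>

// Fold Caffe BatchNorm statistics into per-channel Scale-layer parameters.
// Caffe stores the running mean/variance multiplied by a moving-average scale
// factor, so both are divided by that factor before use.
void batchNormToScale(const std::vector<float>& meanBlob,
                      const std::vector<float>& varianceBlob,
                      float movingAvgFactor,        // blob 2 (single value)
                      float eps,                    // batch_norm_param { eps: ... }
                      std::vector<float>& scale,    // -> Scale layer "scale" blob
                      std::vector<float>& bias)     // -> Scale layer "bias" blob
{
    const float inv = (movingAvgFactor == 0.f) ? 0.f : 1.f / movingAvgFactor;
    scale.resize(meanBlob.size());
    bias.resize(meanBlob.size());
    for (size_t c = 0; c < meanBlob.size(); ++c)
    {
        const float mean = meanBlob[c] * inv;
        const float var  = varianceBlob[c] * inv;
        scale[c] = 1.f / std::sqrt(var + eps);   // y = (x - mean) / sqrt(var + eps)
        bias[c]  = -mean * scale[c];             //   = x * scale + bias
    }
}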

Update: TensorRT can parse “BatchNorm” directly.
Sorry for any inconvenience.

Thanks.

How does TensorRT distinguish the new Scale layer (converted from BatchNormalization) from the original Scale layer?

Hi,

Sorry for the misunderstanding.

If you are using a Caffe .prototxt file, TensorRT can parse the ‘BatchNorm’ layer type directly.
For example,

layer {
  name: "Layer1/BatchNorm"
  type: "BatchNorm"
  bottom: "Layer1"
  top: "Layer1/BatchNorm"
  ...
}

Thanks and sorry for any inconvenience.

Hi @AastaLLL,
I have a BatchNorm layer in my Caffe deploy file, along with a caffemodel weight file:

layer {
  name: "relu_decoder_1_1"
  type: "ReLU"
  bottom: "conv_decoder_1_1"
  top: "relu_decoder_1_1"
}

I have compiled my code, which uses TensorRT.
When I run my program, I get an error:

Reading symbols from build/caffeTest...done.
(gdb) r data/bn_conv_merged_model.prototxt data/bn_conv_merged_weights.caffemodel data/img.jpg
Starting program: /home/nvidia/iv-system/build/caffeTest data/bn_conv_merged_model.prototxt data/bn_conv_merged_weights.caffemodel data/img.jpg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
tensorRT start... 
start caffe_to_giemodel

Program received signal SIGSEGV, Segmentation fault.
0x0000007fb469e7ec in CaffeWeightFactory::operator()(std::string const&, WeightType) () from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
(gdb) bt 
#0  0x0000007fb469e7ec in CaffeWeightFactory::operator()(std::string const&, WeightType) () from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
#1  0x0000007fb4696f30 in parseBatchNormalization(nvinfer1::INetworkDefinition&, ditcaffe::LayerParameter const&, CaffeWeightFactory&, BlobNameToTensor&) ()
   from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
#2  0x0000007fb469b79c in CaffeParser::parse(char const*, char const*, nvinfer1::INetworkDefinition&, nvinfer1::DataType) () from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
#3  0x0000000000425264 in TensorRT::caffe_to_giemodel (this=0x7fffffec70, deployFile="./data/bn_conv_merged_model_tensorrt.prototxt", modelFile="data/bn_conv_merged_weights.caffemodel", 
    outputs=std::vector of length 1, capacity 1 = {...}, maxBatchSize=1, calibrator=0x0, gieModelStream=@0x7fffffec88: 0x19) at /home/nvidia/iv-system/core/tensorrt.cpp:109
#4  0x0000000000424d24 in TensorRT::init (this=0x7fffffec70, model="./data/bn_conv_merged_model_tensorrt.prototxt", weights="data/bn_conv_merged_weights.caffemodel", 
    input_names=std::vector of length 1, capacity 1 = {...}, output_names=std::vector of length 1, capacity 1 = {...}, batch_size=1) at /home/nvidia/iv-system/core/tensorrt.cpp:50
#5  0x000000000041f514 in lane_mark (model="./data/bn_conv_merged_model_tensorrt.prototxt", weights="data/bn_conv_merged_weights.caffemodel", img="data/img.jpg")
    at /home/nvidia/iv-system/caffeTest.cpp:47
#6  0x0000000000420100 in main (argc=4, argv=0x7ffffff1f8) at /home/nvidia/iv-system/caffeTest.cpp:140
(gdb) q

I can run successfully without the BatchNorm layers, but the detection result is very bad.
So my question is: how should I use the BatchNorm layer? Is there a demo?

Hi,

The layer you shared is a ReLU layer, not a BatchNorm layer.
Could you run the following commands to check whether TensorRT can run your model correctly?

cp -r /usr/src/tensorrt/ .
cd tensorrt/samples/
make
cd ../bin/
./giexec --deploy=/path/to/prototxt --output=/name/of/output

Thanks.

Sorry, I pasted the wrong layer. Actually, I wanted to paste the following one:

layer {
  name: "bn_de_1_fullconv2"
  type: "BatchNorm"
  bottom: "cn_de_1_fullconv2"
  top: "cn_de_1_fullconv2"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.999000012875
    eps: 0.0010000000475
  }
}

I ran the command:

nvidia@p22:~/iv-system/data$ /usr/src/tensorrt/bin/giexec --deploy=bn_conv_merged_model_tensorrt.prototxt  --output=.
deploy: bn_conv_merged_model_tensorrt.prototxt
output: .
Input "data": 3x256x960
could not find output blob .
Engine could not be created
Engine could not be created

At the end of my deploy file, the following layer is the output layer:

layer {
  name: "deconv0_4"
  type: "Deconvolution"
  bottom: "relu0_3"
  top: "deconv0_4"
  convolution_param {
    num_output: 8
    bias_term: true
    pad: 1
    kernel_size: 2
    stride: 2
  }
}

Hi,

Please remember to feed the output blob name to TensorRT.

If the layer you shared in #7 is the final layer, please run the command like this:

/usr/src/tensorrt/bin/giexec --deploy=bn_conv_merged_model_tensorrt.prototxt --output=deconv0_4

Thanks.

Thank you very much @AastaLLL.
My deploy file and Caffe weights have no problem, but my program still core dumps when I run it.

nvidia@p22:~/iv-system/data$ /usr/src/tensorrt/bin/giexec --deploy=bn_conv_merged_model_tensorrt.prototxt  --output=deconv0_4
deploy: bn_conv_merged_model_tensorrt.prototxt
output: deconv0_4
Input "data": 3x256x1024
Output "deconv0_4": 8x256x1024
name=data, bindingIndex=0, buffers.size()=2
name=deconv0_4, bindingIndex=1, buffers.size()=2
Average over 10 runs is 7.01573 ms.
Average over 10 runs is 7.01778 ms.
Average over 10 runs is 7.01245 ms.
Average over 10 runs is 7.01409 ms.
Average over 10 runs is 7.00887 ms.
Average over 10 runs is 7.0059 ms.
Average over 10 runs is 7.00026 ms.
Average over 10 runs is 6.99709 ms.
Average over 10 runs is 7.00293 ms.
Average over 10 runs is 6.99771 ms.

The core dump error stack:

(gdb) r  data/bn_conv_merged_model.prototxt data/bn_conv_merged_weights.caffemodel data/img.jpg
Starting program: /home/nvidia/iv-system/build/caffeTest data/bn_conv_merged_model.prototxt data/bn_conv_merged_weights.caffemodel data/img.jpg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
tensorRT start... 
start caffe_to_giemodel

Program received signal SIGSEGV, Segmentation fault.
0x0000007fb469e7ec in CaffeWeightFactory::operator()(std::string const&, WeightType) () from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
(gdb) bt
#0  0x0000007fb469e7ec in CaffeWeightFactory::operator()(std::string const&, WeightType) () from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
#1  0x0000007fb4696f30 in parseBatchNormalization(nvinfer1::INetworkDefinition&, ditcaffe::LayerParameter const&, CaffeWeightFactory&, BlobNameToTensor&) ()
   from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
#2  0x0000007fb469b79c in CaffeParser::parse(char const*, char const*, nvinfer1::INetworkDefinition&, nvinfer1::DataType) () from /usr/lib/aarch64-linux-gnu/libnvcaffe_parser.so.3
#3  0x0000000000425174 in TensorRT::caffe_to_giemodel (this=0x7fffffea90, deployFile="./data/bn_conv_merged_model_tensorrt.prototxt", modelFile="data/bn_conv_merged_weights.caffemodel", 
    outputs=std::vector of length 1, capacity 1 = {...}, maxBatchSize=1, calibrator=0x0, gieModelStream=@0x7fffffeaa8: 0x19) at /home/nvidia/iv-system/core/tensorrt.cpp:109
#4  0x0000000000424c34 in TensorRT::init (this=0x7fffffea90, model="./data/bn_conv_merged_model_tensorrt.prototxt", weights="data/bn_conv_merged_weights.caffemodel", 
    input_names=std::vector of length 1, capacity 1 = {...}, output_names=std::vector of length 1, capacity 1 = {...}, batch_size=1) at /home/nvidia/iv-system/core/tensorrt.cpp:50
#5  0x000000000041f514 in lane_mark (model="./data/bn_conv_merged_model_tensorrt.prototxt", weights="data/bn_conv_merged_weights.caffemodel", img="data/img.jpg")
    at /home/nvidia/iv-system/caffeTest.cpp:47
#6  0x00000000004201c8 in main (argc=4, argv=0x7ffffff018) at /home/nvidia/iv-system/caffeTest.cpp:141

Hi,

Based on the information you shared in #9, the error should come from the program you implemented.

Have you set the output layer name correctly?
For example, in /usr/src/tensorrt/samples/sampleGoogleNet/sampleGoogleNet.cpp, the change would be applied like this:

diff --git a/sampleGoogleNet.cpp b/sampleGoogleNet.cpp
index 470383d..eb31e68 100644
--- a/sampleGoogleNet.cpp
+++ b/sampleGoogleNet.cpp
@@ -23,7 +23,7 @@ static const int BATCH_SIZE = 4;
 static const int TIMING_ITERATIONS = 1000;
 
 const char* INPUT_BLOB_NAME = "data";
-const char* OUTPUT_BLOB_NAME = "prob";
+const char* OUTPUT_BLOB_NAME = "deconv0_4";
 
 
 std::string locateFile(const std::string& input)
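
If it helps, below is a minimal sketch of how that output name is then consumed when building the engine, following the sample's pattern (the function name and its arguments here are placeholders for your own code, not the sample's exact source):

#include <cassert>
#include <string>
#include <vector>

#include "NvCaffeParser.h"
#include "NvInfer.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Parse the Caffe deploy/model files into `network`, then mark each requested
// output blob by name (e.g. "deconv0_4"). The parser must stay alive until the
// engine has been built, since it owns the parsed weight memory.
void parseAndMarkOutputs(const char* deployFile, const char* modelFile,
                         INetworkDefinition& network,
                         const std::vector<std::string>& outputs)
{
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobNameToTensor =
        parser->parse(deployFile, modelFile, network, DataType::kFLOAT);
    assert(blobNameToTensor != nullptr);  // nullptr means the parse itself failed

    // A wrong output name makes find() return nullptr; dereferencing that
    // unchecked is a common source of crashes, so verify it explicitly.
    for (const std::string& name : outputs)
    {
        ITensor* tensor = blobNameToTensor->find(name.c_str());
        assert(tensor != nullptr && "output blob not found in the deploy file");
        network.markOutput(*tensor);
    }
}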

Thanks