TensorRT 4: How to do BatchNorm with the Scale layer?

As the docs say: “Batch Normalization can be implemented using the TensorRT Scale layer.”
Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

In the sample code:
// Create a scale layer with default power/shift and specified scale parameter.
float scale_param = 0.0125f;
Weights power{DataType::kFLOAT, nullptr, 0};
Weights shift{DataType::kFLOAT, nullptr, 0};
Weights scale{DataType::kFLOAT, &scale_param, 1};
auto scale_1 = network->addScale(*data, ScaleMode::kUNIFORM, shift, scale, power);
assert(scale_1 != nullptr);

How can Batch Normalization (BN) be implemented with network->addScale?

If you have the unscaled mean and variance along with epsilon, you can compute the scale and shift via the formula:

scale[i] = 1.0f / sqrt(variance[i] + epsilon)
shift[i] = (-mean[i] * scale[i])
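
Since the statistics are per channel, this maps onto ScaleMode::kCHANNEL rather than kUNIFORM. A minimal sketch of that formula, assuming the same network/data setup as the sample above plus <vector> and <cmath>, where mean, variance, epsilon and channels are illustrative names for values taken from training:

// Per-channel batch norm folded into one Scale layer: y = scale * x + shift.
// scaleData/shiftData must stay alive until the engine has been built.
std::vector<float> scaleData(channels), shiftData(channels);
for (int i = 0; i < channels; ++i)
{
    scaleData[i] = 1.0f / sqrtf(variance[i] + epsilon);
    shiftData[i] = -mean[i] * scaleData[i];
}
Weights power{DataType::kFLOAT, nullptr, 0};
Weights shift{DataType::kFLOAT, shiftData.data(), channels};
Weights scale{DataType::kFLOAT, scaleData.data(), channels};
auto bn = network->addScale(*data, ScaleMode::kCHANNEL, shift, scale, power);
assert(bn != nullptr);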

Hi mvillmow,

Thank you for your help!

The answer seems reasonable, but I cannot calculate “shift[i] = (-mean[i] * scale[i])”.

In the sample code:

// Create a scale layer with default power/shift and specified scale parameter.
float scale_param = 0.0125f;
Weights power{DataType::kFLOAT, nullptr, 0};
Weights shift{DataType::kFLOAT, nullptr, 0};
Weights scale{DataType::kFLOAT, &scale_param, 1};
auto scale_1 = network->addScale(*data, ScaleMode::kUNIFORM, shift, scale, power);
assert(scale_1 != nullptr);

shift[i] would need to be calculated from the data in “*data”.

So I think it is too hard to code this with the expression “auto scale_1 = network->addScale(*data, ScaleMode::kUNIFORM, shift, scale, power);”

Hi mvillmow,

Could you provide more detail? Thanks!

If you want to compute batch norm using statistics gathered from the inference data set, then you cannot use the Scale layer. The Scale layer uses weights from the training phase, so to compute batch norm this way you have to get the mean/variance from your training data set. Alternatively, you can use a series of binary and unary elementwise layers to do the same computation.
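
For reference, a rough sketch of that elementwise formulation, assuming TensorRT 4’s addConstant/addElementWise/addUnary are available, C/H/W are the input dimensions, and meanData / varEpsData are illustrative std::vector<float> buffers of size C*H*W holding the per-channel mean and variance+epsilon replicated across H*W (so no broadcasting is relied on):

// (x - mean) / sqrt(var + eps), built from constant, elementwise and unary layers.
// meanData and varEpsData must stay alive until the engine has been built.
Dims3 dims{C, H, W};
int64_t count = static_cast<int64_t>(C) * H * W;

Weights meanW{DataType::kFLOAT, meanData.data(), count};
Weights varEpsW{DataType::kFLOAT, varEpsData.data(), count};  // variance + epsilon

auto meanConst = network->addConstant(dims, meanW);
auto varConst  = network->addConstant(dims, varEpsW);

// x - mean
auto centered = network->addElementWise(*data, *meanConst->getOutput(0),
                                        ElementWiseOperation::kSUB);
// sqrt(var + eps)
auto stddev = network->addUnary(*varConst->getOutput(0), UnaryOperation::kSQRT);
// (x - mean) / sqrt(var + eps)
auto bn = network->addElementWise(*centered->getOutput(0), *stddev->getOutput(0),
                                  ElementWiseOperation::kDIV);
assert(bn != nullptr);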

Hi mvillmow,

After a month’s work, I wrote a plugin to do batch norm, and after a while I found it really can be done with the Scale layer.

In the Caffe BatchNorm layer’s blobs, there are 3 blocks of data:
data[0][channel] = mean[channel] * scale
data[1][channel] = var[channel] * scale
data[2] = scale (the moving-average scale factor)

so:
mean[channel] = data[0][channel]/data[2]
var[channel] = data[1][channel]/data[2]
batchNorm = (xi - mean[channel]) / sqrt(var[channel] + 0.00001)

So for the Scale layer, which computes a*x + b:
a = 1 / sqrt(var[channel] + 0.00001)
b = -mean[channel] / sqrt(var[channel] + 0.00001)

And Caffe uses a BatchNorm layer followed by a Scale layer to do batch norm.

So the two scale layers can be merged into one:

a2*(a1*x + b1) + b2 = (a1*a2)*x + (a2*b1 + b2)

a = a1*a2
b = a2*b1 + b2
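
A sketch of that conversion, assuming bnData holds the three Caffe BatchNorm blobs described above, gamma/beta come from the Caffe Scale layer that follows, and channels is the channel count (all names illustrative; the vectors must outlive engine building):

// Fold Caffe BatchNorm (mean*s, var*s, s) and the following Scale layer (gamma, beta)
// into one TensorRT channel-wise Scale layer: y = a*x + b.
float s = bnData[2][0];
std::vector<float> a(channels), b(channels);
for (int c = 0; c < channels; ++c)
{
    float mean = bnData[0][c] / s;
    float var  = bnData[1][c] / s;
    float a1   = 1.0f / sqrtf(var + 0.00001f); // BatchNorm part: a1*x + b1
    float b1   = -mean * a1;
    a[c] = gamma[c] * a1;                      // merged scale: a = a1*a2 (a2 = gamma)
    b[c] = gamma[c] * b1 + beta[c];            // merged shift: b = a2*b1 + b2 (b2 = beta)
}

Weights power{DataType::kFLOAT, nullptr, 0};
Weights shift{DataType::kFLOAT, b.data(), channels};
Weights scale{DataType::kFLOAT, a.data(), channels};
auto bn = network->addScale(*data, ScaleMode::kCHANNEL, shift, scale, power);
assert(bn != nullptr);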

I was implementing the batch norm layer from PyTorch weights and biases.

mean = self.running_mean
variance = self.running_var
gamma = self.weight
beta = self.bias

How can gamma and beta from the batch norm layer be implemented using the Scale layer?
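
Following the same merge as above, gamma and beta fold into the per-channel scale and shift. A minimal sketch, assuming the PyTorch tensors listed above have already been copied into float arrays named runningMean, runningVar, gamma and beta (illustrative names) and eps is BatchNorm2d’s eps:

// PyTorch BN: y = gamma * (x - mean) / sqrt(var + eps) + beta,
// folded into y = scale*x + shift for ScaleMode::kCHANNEL.
std::vector<float> scaleData(channels), shiftData(channels);
for (int c = 0; c < channels; ++c)
{
    float inv = 1.0f / sqrtf(runningVar[c] + eps);
    scaleData[c] = gamma[c] * inv;
    shiftData[c] = beta[c] - gamma[c] * runningMean[c] * inv;
}

Weights power{DataType::kFLOAT, nullptr, 0};
Weights shift{DataType::kFLOAT, shiftData.data(), channels};
Weights scale{DataType::kFLOAT, scaleData.data(), channels};
auto bn = network->addScale(*data, ScaleMode::kCHANNEL, shift, scale, power);
assert(bn != nullptr);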