Ubuntu 16.04
python: 3.6.8
pytorch: 1.1.0 with cuda 10
tensorrt: 5.1.5
cuda: 10
cudnn: 7.5.0
I’m trying to convert a batch norm layer from PyTorch to TensorRT.
I found TensorRT's scale layer and decided to use it for the implementation.
A BatchNorm2d layer consists of the two steps below.

normalization (zero mean and unit variance):
hat(x) = (x - E[x]) / sqrt(Var[x] + eps)
affine transformation:
y = gamma * hat(x) + beta
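To make the two steps concrete, here is a minimal numpy sketch of BatchNorm2d in inference (eval) mode; the names running_mean, running_var, gamma (weight) and beta (bias) mirror the PyTorch module attributes, and the tensor values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 3
x = rng.standard_normal((1, C, 4, 4)).astype(np.float32)
running_mean = rng.standard_normal(C).astype(np.float32)
running_var = rng.uniform(0.5, 2.0, C).astype(np.float32)
gamma = rng.standard_normal(C).astype(np.float32)
beta = rng.standard_normal(C).astype(np.float32)
eps = 1e-5

def bn2d(x):
    # step 1: normalization, hat(x) = (x - E[x]) / sqrt(Var[x] + eps),
    # with per-channel statistics broadcast over (N, C, H, W)
    x_hat = (x - running_mean[None, :, None, None]) / np.sqrt(
        running_var[None, :, None, None] + eps)
    # step 2: affine transformation, y = gamma * hat(x) + beta
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

y = bn2d(x)
```

Note that the two steps can also be fused into a single per-channel scale and shift, y = x * (gamma / sqrt(Var + eps)) + (beta - gamma * E[x] / sqrt(Var + eps)), which is why a single scale layer is in principle enough.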
So I implemented BatchNorm2d in TensorRT with two consecutive scale layers.

The first layer does the normalization:
scale = 1 / sqrt(running_var + eps)
shift = running_mean * 1.0
power = 1.0
and the second layer does the affine transformation:
scale = gamma (weight in PyTorch)
shift = beta (bias in PyTorch)
power = 1.0
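As a sanity check, the two layers above can be composed in plain numpy, assuming TensorRT's IScaleLayer in CHANNEL mode computes y = (x * scale + shift) ^ power per channel; the parameters are exactly the ones I listed, with made-up tensor values:

```python
import numpy as np

def scale_layer(x, scale, shift, power):
    # assumed IScaleLayer semantics: y = (x * scale + shift) ** power,
    # with the per-channel parameters broadcast over (N, C, H, W)
    b = (None, slice(None), None, None)
    return (x * scale[b] + shift[b]) ** power[b]

rng = np.random.default_rng(1)
C = 3
x = rng.standard_normal((1, C, 4, 4)).astype(np.float32)
running_mean = rng.standard_normal(C).astype(np.float32)
running_var = rng.uniform(0.5, 2.0, C).astype(np.float32)
gamma = rng.standard_normal(C).astype(np.float32)
beta = rng.standard_normal(C).astype(np.float32)
eps = 1e-5
ones = np.ones(C, dtype=np.float32)

# first layer: normalization, with the parameters as listed above
z = scale_layer(x, 1.0 / np.sqrt(running_var + eps), running_mean * 1.0, ones)
# second layer: affine transformation
y = scale_layer(z, gamma, beta, ones)
```

With these parameters the composition works out to y = gamma * (x / sqrt(running_var + eps) + running_mean) + beta, since the shift of a scale layer is applied after its scale.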
OK, done.
But unfortunately I got different results from TensorRT compared to PyTorch.
The L2 norm of the difference is about 0.47.
The L2 norm of the differences for the other layers (convolution, pooling) is about 1e-5.
What is wrong with my implementation?