The DeepStream config has a net-scale-factor parameter, which you can essentially use to apply the 1/std part of the normalisation (though it is a single scalar, so not channel-wise), and an offsets parameter, which can be used to apply the (x - mean) part of the normalisation (channel-wise).
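If it helps, my mental model of what nvinfer does per pixel (a rough sketch based on those parameter names, not a verbatim copy of the implementation) is:

import numpy as np

# Rough sketch of the preprocessing nvinfer applies, as I understand it
# (x is an HWC float image still in the raw 0-255 range):
#   y = net-scale-factor * (x - offsets), with offsets subtracted per channel
def deepstream_preprocess(x, net_scale_factor, offsets):
    return net_scale_factor * (x - np.asarray(offsets))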
So now we just need to factor in that PyTorch, for example, uses pixel values scaled to [0, 1], while DeepStream does not scale and works with the original [0, 255] range.
Thus, if during training we used mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] transformations on input scaled to [0, 1], we need to unscale them back to the [0, 255] range. The mean can stay channel-wise, and the unscaled values become the offsets:
np.array([0.485, 0.456, 0.406])*255
array([123.675, 116.28 , 103.53 ])
For net-scale-factor, which is a single scalar, we take the mean across channels of the std=[0.229, 0.224, 0.225] used in training and unscale it the same way:
np.array([0.229, 0.224, 0.225]).mean()*255
57.63
And our net-scale-factor is going to be 1/unscaled std = 1/57.63 = 0.01735207357279195.
And these are the same values used in the RetinaNet example mentioned above: https://github.com/NVIDIA/retinanet-examples/blob/master/extras/deepstream/deepstream-sample/infer_config_batch1.txt.
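In the nvinfer config this boils down to something like the following (only the two relevant lines of the [property] section, net-scale-factor rounded; the rest depends on your model):

[property]
net-scale-factor=0.017352074
offsets=123.675;116.28;103.53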
By applying the calculated net-scale-factor and offsets, our models show the same performance during DeepStream inference as when we test them within the PyTorch framework.
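If you want to double-check the mapping yourself, here is a small numpy sanity check (my own sketch, not taken from the config above); the tiny residual difference comes from averaging the std across channels:

import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

offsets = mean * 255                        # [123.675, 116.28, 103.53]
net_scale_factor = 1 / (std.mean() * 255)   # ~0.0173520736

x = np.random.randint(0, 256, size=(4, 4, 3)).astype(np.float32)  # fake HWC image in 0-255

pytorch_style = (x / 255.0 - mean) / std
deepstream_style = net_scale_factor * (x - offsets)

# Not bit-identical, because net-scale-factor uses one averaged std instead of
# per-channel values, but the difference is small for these particular stds.
print(np.abs(pytorch_style - deepstream_style).max())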
Hope that helps!