ResNet101 is not sped up by the TensorRT C++ API, but the ONNX import is. Why?

I have implemented ResNet101 using the TensorRT C++ API, with BatchNormalization implemented as a plugin layer (cudnnBatchNormalizationForwardInference). However, executeV2 is still as slow as the PyTorch model, with no speedup at all. Why?

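You need to use a scale layer to implement batch norm; otherwise the related graph optimizations (such as fusing it with the preceding convolution) will be disabled. The per-channel shift and scale are computed from the batch norm parameters like this: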

shift = (-mean / np.sqrt(var + eps)) * weight + bias
scale = weight / np.sqrt(var + eps)
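
To check the two formulas above before wiring them into the network, here is a minimal NumPy sketch. It folds the batch norm parameters into per-channel scale and shift arrays and compares the result against a plain inference-mode batch norm. The function names, the NCHW layout, the random test data, and eps=1e-5 are my own assumptions for illustration, not something taken from your code.

```python
import numpy as np


def fold_batch_norm(weight, bias, mean, var, eps=1e-5):
    """Fold inference-mode batch norm into per-channel scale and shift.

    Uses the same formulas as above:
        scale = weight / sqrt(var + eps)
        shift = (-mean / sqrt(var + eps)) * weight + bias
    """
    scale = weight / np.sqrt(var + eps)
    shift = (-mean / np.sqrt(var + eps)) * weight + bias
    return scale.astype(np.float32), shift.astype(np.float32)


def batch_norm_reference(x, weight, bias, mean, var, eps=1e-5):
    """Plain inference-mode batch norm on an NCHW tensor, for comparison."""
    shape = (1, -1, 1, 1)  # broadcast per-channel vectors over N, H, W
    normalized = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    return normalized * weight.reshape(shape) + bias.reshape(shape)


# Hypothetical test data: 2 images, 4 channels, 3x3 feature maps.
rng = np.random.default_rng(0)
channels = 4
x = rng.standard_normal((2, channels, 3, 3)).astype(np.float32)
weight = rng.standard_normal(channels).astype(np.float32)
bias = rng.standard_normal(channels).astype(np.float32)
mean = rng.standard_normal(channels).astype(np.float32)
var = (rng.random(channels) + 0.5).astype(np.float32)  # keep variance positive

scale, shift = fold_batch_norm(weight, bias, mean, var)

# A per-channel scale layer computes x * scale + shift.
folded = x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)
expected = batch_norm_reference(x, weight, bias, mean, var)

max_abs_error = float(np.max(np.abs(folded - expected)))
print("max abs error:", max_abs_error)
```

In the C++ API, these two arrays would then go in as the shift and scale weights of INetworkDefinition::addScale with ScaleMode::kCHANNEL, leaving the power weights empty. Use the same eps value as the PyTorch BatchNorm2d layer you exported from, otherwise the outputs will drift slightly.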

Before switching to the plugin layer (cudnnBatchNormalizationForwardInference), I implemented batch norm using a Constant layer and an ElementWise layer, and that did not accelerate ResNet101 either.

Thank you very much!