I want to convert a mxnet model to tensorRT through caffe parser (mxnet->caffe->tensorRT). The issue is with caffe’s padding convention.
Assume the input is 28x28 (HxW), pooling kernel is 3x3, stride is 2x2, pad is 0, for caffe, the pooling output size is 14x14, for mxnet it’s 13x13.
I notice the API: nvinfer1::INetworkDefinition::setPoolingOutputDimensionsFormula(IOutputDimensionsFormula *formula), and the default formula in each dimension is (inputDim + padding * 2 - kernelSize) / stride + 1. Following this convention, the pooling output size should be 13x13, rather than 14x14.
How can I ensure that pooling layer follows the default convention even if caffe parser is used?