I want to convert an MXNet model to TensorRT through the caffe parser (MXNet -> Caffe -> TensorRT). The issue is with Caffe's padding convention.
Assume the input is 28x28 (HxW), the pooling kernel is 3x3, the stride is 2x2, and the padding is 0. Caffe computes a pooling output size of 14x14, while MXNet computes 13x13.
I noticed the API nvinfer1::INetworkDefinition::setPoolingOutputDimensionsFormula(IOutputDimensionsFormula *formula); the default formula in each dimension is (inputDim + padding * 2 - kernelSize) / stride + 1. Following this convention, the pooling output size should be 13x13, not 14x14.
How can I ensure that pooling layer follows the default convention even if caffe parser is used?
Yes, so the question is: how can I choose between the two padding conventions in TensorRT?
Specifically, could you please give an example of the usage of nvinfer1::INetworkDefinition::setPoolingOutputDimensionsFormula(IOutputDimensionsFormula *formula)?
Sorry, I do not know that. But if you trained the model in MXNet, you can retrain it with the 'full' pooling convention and then port it to Caffe.
We don’t have a sample demonstrating the setPoolingOutputDimensionsFormula() API, but you can check this document for more information: /usr/share/doc/tensorrt/html/classnvinfer1_1_1_i_output_dimensions_formula.html