I assume you mean that cudnnGetPoolingNdForwardOutputDim does not compute exactly the output size Caffe expects.
In any case, the pooling routine itself, cudnnPoolingForward, respects the dimensions of the output tensor descriptor it is given. In other words, if the output tensor you provide is slightly smaller than what cudnnGetPoolingNdForwardOutputDim would have advised, cudnnPoolingForward will not write outside the bounds of that output descriptor, and should therefore give results identical to Caffe's.
I have found a situation where the convolution layer does not behave that way: if I provide an output tensor whose dimensions differ from what cuDNN recommends for the convolution, I can get undefined results (reads past the end of the input buffer). Is this worth reporting, or is it just user error?
For pooling, we have explicitly verified that #2 is correct, i.e. cudnnPoolingForward does not write outside the bounds of the output descriptor it is given.
However, you are right that for convolution there are some cases where, if you provide an output tensor different from what cuDNN recommends, you can get undefined results. We are in the process of fixing them.
If you can provide your use case (convolution descriptor config, input/output tensor descriptors), we will make sure that it works.
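Before handing a convolution configuration to cuDNN, it can help to check the output tensor against the size cuDNN expects. The sketch below uses hypothetical Python helpers, not part of cuDNN: `conv_output_dim` reproduces the floor-based formula documented for cudnnGetConvolutionNdForwardOutputDim (including dilation), and `check_conv_output` lists every spatial axis where a provided output size differs from it. In real code, calling cudnnGetConvolutionNdForwardOutputDim directly is the authoritative check; this only shows the arithmetic.

```python
def conv_output_dim(in_dim, pad, filter_dim, stride, dilation=1):
    """Floor-based size, as documented for cudnnGetConvolutionNdForwardOutputDim."""
    effective_filter = (filter_dim - 1) * dilation + 1
    return 1 + (in_dim + 2 * pad - effective_filter) // stride


def check_conv_output(in_dims, pads, filter_dims, strides, out_dims, dilations=None):
    """Return (axis, expected, provided) for every spatial axis whose provided
    output size differs from the documented formula."""
    if dilations is None:
        dilations = [1] * len(in_dims)
    mismatches = []
    for axis, (i, p, f, s, d, o) in enumerate(
            zip(in_dims, pads, filter_dims, strides, dilations, out_dims)):
        expected = conv_output_dim(i, p, f, s, d)
        if expected != o:
            mismatches.append((axis, expected, o))
    return mismatches


if __name__ == "__main__":
    # 224x224 input, 3x3 filter, pad 1, stride 2: the formula gives 112x112.
    print(check_conv_output([224, 224], [1, 1], [3, 3], [2, 2], [113, 113]))
```

For a 224-wide input with a 3-wide filter, pad 1 and stride 2, the formula gives 112, so an output of 113 (what rounding up would produce) is reported as a mismatch on both axes.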
Honestly, my use case does work; the problem was that I was computing incorrect layer sizes and getting NaNs in the output. I tracked it down by studying Caffe, which seems to be something of a gold standard, and that is where I saw that it uses two different ways of calculating the output width depending on whether the layer is pooling (rounding up) or convolution (rounding down).
Once I implemented this, everything worked just fine, so cuDNN itself is fine. For my use case an error would have been better, since the bug was simply in my code. Tracking down the NaNs can really take time, but having had to do it was arguably more useful in the long run ;).
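The two ways of computing the pooling output width can be sketched as follows. Both functions are hypothetical Python helpers written for illustration: `cudnn_pool_output_dim` follows the floor-based formula documented for cudnnGetPoolingNdForwardOutputDim, and `caffe_pool_output_dim` paraphrases the rounding-up logic in Caffe's pooling layer, including its rule that drops the last window if it would start in the padding. This is a sketch of the arithmetic under those assumptions, not a copy of either library's code.

```python
def cudnn_pool_output_dim(in_dim, window, pad, stride):
    """Floor-based size, as documented for cudnnGetPoolingNdForwardOutputDim."""
    return 1 + (in_dim + 2 * pad - window) // stride


def caffe_pool_output_dim(in_dim, window, pad, stride):
    """Ceiling-based size, paraphrasing Caffe's pooling layer.

    Simplification: Caffe applies the clipping step below whenever any axis is
    padded; here it is applied per axis, only when this axis is padded.
    """
    span = in_dim + 2 * pad - window       # assumed non-negative
    out = -(-span // stride) + 1           # integer ceiling division, then +1
    # Drop the last window if it would start in the trailing padding.
    if pad > 0 and (out - 1) * stride >= in_dim + pad:
        out -= 1
    return out


if __name__ == "__main__":
    for in_dim in (55, 112):
        print(in_dim, cudnn_pool_output_dim(in_dim, 3, 0, 2),
              caffe_pool_output_dim(in_dim, 3, 0, 2))
```

With a 112-wide input, a 3-wide window and stride 2, the floor formula gives 55 while the Caffe rule gives 56; for a 55-wide input both give 27, which is why the difference only shows up for some layer sizes. Because the ceiling never falls below the floor, the Caffe size is always equal to or one larger than the cuDNN size.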
Since Caffe rounds up (ceiling) for pooling and cuDNN rounds down, the scenario described is irrelevant: the tensor suggested by cudnnGetPoolingNdForwardOutputDim will always be equal to or smaller than one sized to match Caffe. In other words, the real question is whether cuDNN pooling will respect a slightly too large output tensor and not fill the border with garbage.
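To make the "slightly too large output tensor" case concrete, the hypothetical helper below lists, for each output element, which input positions its pooling window covers and whether the window runs past the end of the padded input. It illustrates the geometry only, assuming symmetric padding; what cuDNN actually writes into an overhanging element is the open question here and is not modeled.

```python
def pooling_windows(in_dim, window, pad, stride, out_dim):
    """For each output index j, return (start, end, overhangs):
    [start, end) is the range of real input positions the window covers,
    and overhangs is True if the nominal window extends past the end of the
    padded input (in_dim + pad), i.e. the window is only partially defined."""
    padded_end = in_dim + pad
    windows = []
    for j in range(out_dim):
        lo = j * stride - pad          # window start in input coordinates
        hi = lo + window               # window end (exclusive)
        windows.append((max(lo, 0), min(hi, in_dim), hi > padded_end))
    return windows


if __name__ == "__main__":
    # 6-wide input, 3-wide window, stride 2, no padding.
    print(pooling_windows(6, 3, 0, 2, out_dim=2))  # floor-sized output
    print(pooling_windows(6, 3, 0, 2, out_dim=3))  # ceiling-sized output
```

For a 6-wide input with a 3-wide window and stride 2, rounding down gives 2 outputs and rounding up gives 3. Only the third element's window overhangs: it covers just input positions 4 and 5, so its value depends on how the pooling routine treats the missing position, which is exactly the border behavior being asked about.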