cudnnGetConvolutionForwardWorkspaceSize - possible overflow?

I am trying to run an example from the paper “cuDNN: Efficient Primitives for Deep Learning”.
Everything else seems to be in order, but the function cudnnGetConvolutionForwardWorkspaceSize returns a value that cannot be right.
For instance, if I set the parameters to those of Layer 2 of Table 2, i.e. (N, C, H, W, K, R, S) = (128, 96, 64, 64, 128, 9, 9), the function returns an implausible number.
If I change only the first parameter, the batch size, from 128 to 64, the function returns a reasonable 5.8 GB, but at 128 it returns a number that is obviously wrong.
I am not ruling out an error in my own code, but I think there is a chance that an integer overflow happens inside that function.
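
A rough sanity check of the overflow suspicion, under a loudly stated assumption: suppose the workspace is an im2col-style buffer of N*C*R*S*P*Q single-precision floats, with P = H - R + 1 and Q = W - S + 1 (no padding, unit stride). This formula is a guess, not documented cuDNN behaviour, although for N = 64 it gives about 5.8 GiB, which is consistent with the value reported above. The helper names below are hypothetical. Under this assumption the element count fits in a signed 32-bit integer for N = 64 but not for N = 128.

```python
INT32_MAX = 2**31 - 1
BYTES_PER_FLOAT = 4


def im2col_elements(n, c, h, w, r, s):
    """Element count of an assumed im2col-style buffer (no padding, stride 1)."""
    p, q = h - r + 1, w - s + 1  # output height and width
    return n * c * r * s * p * q


def wrap_int32(x):
    """Reinterpret x modulo 2**32 as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


if __name__ == "__main__":
    for n in (64, 128):
        elems = im2col_elements(n, 96, 64, 64, 9, 9)
        gib = elems * BYTES_PER_FLOAT / 2**30
        print(f"N={n}: elements={elems}, size={gib:.2f} GiB, "
              f"exceeds int32={elems > INT32_MAX}, as int32={wrap_int32(elems)}")
```

If the element count were held in a 32-bit signed integer before being widened to size_t, the N = 128 case would go negative and then turn into a huge unsigned value. That would match the symptom, but it is only one possible explanation.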

It is a bug in cuDNN. The fix will be available in a future release. Thank you for reporting.