cuDNN v8.2, CUDA 11.3, Windows, Driver 466.27
All tensors NCHW formatted.
Input dimensions: (2, 3, 4, 4)
Kernel dimensions: (5, 3, 3, 3)
Padding: 1, 1
Stride: 1, 1
Dilation: 1, 1
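For reference, here is a rough sketch of the output shape this configuration should produce, using the usual convolution output-size formula. The helper names `conv_out_dim` and `conv_out_shape` are made up for illustration and are not part of cuDNN; the library call `cudnnGetConvolution2dForwardOutputDim` should report the same numbers from the descriptors.

```cpp
#include <array>

// Output extent along one spatial axis:
//   out = (in + 2*pad - dilation*(kernel - 1) - 1) / stride + 1
// (hypothetical helper, not a cuDNN function)
inline int conv_out_dim(int in, int kernel, int pad, int stride, int dilation) {
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

// NCHW output shape for an NCHW input and a (K, C, R, S) filter.
// (hypothetical helper, not a cuDNN function)
inline std::array<int, 4> conv_out_shape(const std::array<int, 4>& input,
                                         const std::array<int, 4>& filter,
                                         int pad_h, int pad_w,
                                         int stride_h, int stride_w,
                                         int dil_h, int dil_w) {
    return {input[0],   // batch size N is unchanged
            filter[0],  // output channels = number of filters K
            conv_out_dim(input[2], filter[2], pad_h, stride_h, dil_h),
            conv_out_dim(input[3], filter[3], pad_w, stride_w, dil_w)};
}
```

With the values above, `conv_out_shape({2, 3, 4, 4}, {5, 3, 3, 3}, 1, 1, 1, 1, 1, 1)` gives (2, 5, 4, 4). With padding (0, 0) the spatial size drops to 2 x 2, so the output tensor descriptor has to change together with the padding.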
I have tried to look for the fastest algorithm in this case:
- cudnnGetConvolutionForwardAlgorithm_v7: the API suggests the fastest algorithm is CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM, which fails with CUDNN_STATUS_BAD_PARAM when it comes to the actual forward convolution. This algorithm works fine when padding is set to (0, 0).
- cudnnFindConvolutionForwardAlgorithm: returns CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM (which works!), but I’ve also seen it return CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD, which fails.
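Both calls fill an array of cudnnConvolutionFwdAlgoPerf_t entries, and each entry carries its own status and required workspace size. One way to guard against picking an unusable algorithm is to take the first entry that reports success and fits the workspace you actually allocate. The sketch below is only an illustration of that filtering step: `AlgoPerf` is a stand-in struct mirroring a few fields of cudnnConvolutionFwdAlgoPerf_t so the snippet compiles without cuDNN, and `pick_usable_algo` is a made-up helper name. It does not explain why the suggested algorithms fail here.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the fields of cudnnConvolutionFwdAlgoPerf_t used below
// (hypothetical mirror type, so the sketch builds without cuDNN headers).
struct AlgoPerf {
    int algo;            // value of cudnnConvolutionFwdAlgo_t
    int status;          // value of cudnnStatus_t; 0 is CUDNN_STATUS_SUCCESS
    float time;          // measured or estimated time
    std::size_t memory;  // workspace bytes the algorithm needs
};

// Returns the index of the first entry that succeeded and whose workspace
// requirement fits within workspace_limit, or -1 if there is none.
// Assumes the entries are already ordered fastest first, as returned by cuDNN.
inline int pick_usable_algo(const std::vector<AlgoPerf>& results,
                            std::size_t workspace_limit) {
    for (std::size_t i = 0; i < results.size(); ++i) {
        const AlgoPerf& r = results[i];
        if (r.status == 0 && r.memory <= workspace_limit) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In real code the entries would come straight from the perfResults array, and the workspace passed to cudnnConvolutionForward would be sized from the chosen entry's memory field (or from cudnnGetConvolutionForwardWorkspaceSize for that algorithm).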
I haven’t noticed any problems with the cudnnGetConvolutionBackwardXXXXAlgorithm_v7 calls, but I guess it’s a good idea to change them to their analogs without the “_v7” suffix.