cuDNN v8.2, CUDA 11.3, Windows, Driver 466.27
All tensors are in NCHW format.
Input dimensions: (2, 3, 4, 4)
Kernel dimensions: (5, 3, 3, 3)
Padding: 1, 1
Stride: 1, 1
Dilation: 1, 1
I have tried to look up the fastest algorithm for this case. The API suggests the fastest algorithm is CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM, which then fails with CUDNN_STATUS_BAD_PARAM when it comes to the actual forward convolution. This algorithm works fine when padding is set to (0, 0). But I've also seen the API return CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM (which works!).
I haven’t noticed any problems with
cudnnGetConvolutionBackwardXXXXAlgorithm_v7 calls, but I guess it’s a good idea to change them to their analogs without the “_v7” suffix.
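One workaround I can sketch (an assumption on my part, not a confirmed fix): since the _v7 heuristic returns a whole list of candidates sorted by expected speed, you can fall through to the next candidate whenever the top pick fails at execution time. The descriptors and device pointers (handle, xDesc, x, etc.) are placeholders assumed to be set up elsewhere:

```c
#include <cudnn.h>
#include <cuda_runtime.h>

/* Sketch: run the fastest forward algorithm that actually succeeds.
   Assumes the handle, descriptors, and device buffers already exist. */
static cudnnStatus_t pick_working_fwd_algo(
        cudnnHandle_t handle,
        cudnnTensorDescriptor_t xDesc, const void *x,
        cudnnFilterDescriptor_t wDesc, const void *w,
        cudnnConvolutionDescriptor_t convDesc,
        cudnnTensorDescriptor_t yDesc, void *y,
        cudnnConvolutionFwdAlgo_t *chosen)
{
    const float alpha = 1.0f, beta = 0.0f;
    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];

    /* Candidates come back sorted by expected performance. */
    cudnnStatus_t st = cudnnGetConvolutionForwardAlgorithm_v7(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &returned, perf);
    if (st != CUDNN_STATUS_SUCCESS)
        return st;

    for (int i = 0; i < returned; ++i) {
        /* Skip algorithms the heuristic itself marked unsupported. */
        if (perf[i].status != CUDNN_STATUS_SUCCESS)
            continue;

        /* Allocate the workspace this candidate says it needs. */
        void *ws = NULL;
        if (perf[i].memory > 0 &&
            cudaMalloc(&ws, perf[i].memory) != cudaSuccess)
            continue;

        st = cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w,
                                     convDesc, perf[i].algo,
                                     ws, perf[i].memory,
                                     &beta, yDesc, y);
        cudaFree(ws); /* cudaFree(NULL) is a safe no-op */

        if (st == CUDNN_STATUS_SUCCESS) {
            *chosen = perf[i].algo;
            return CUDNN_STATUS_SUCCESS;
        }
        /* e.g. CUDNN_STATUS_BAD_PARAM: fall through to the next candidate. */
    }
    return CUDNN_STATUS_NOT_SUPPORTED;
}
```

Alternatively, cudnnFindConvolutionForwardAlgorithm actually executes each algorithm rather than consulting a heuristic, so it should never recommend one that fails afterwards, at the cost of a slower one-time search.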