cuDNN: cudnnGetConvolutionForwardWorkspaceSize fails with bad parameter

I am currently trying to implement a very basic 2D convolution using CUDA cuDNN between an “image” of size 3x3 and a kernel of size 2x2, resulting in a 2x2 output.
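
For reference, the 2x2 output size follows from the usual convolution output-size arithmetic. A minimal sketch, assuming zero padding, unit stride, and unit dilation as in the convolution descriptor used further down; `convOutputDim` is a hypothetical helper written for illustration, not a cuDNN function:

```cpp
#include <cassert>

// Hypothetical helper (not part of cuDNN): output extent along one spatial
// axis for a convolution with the given padding, stride, and dilation.
int convOutputDim(int in, int kernel, int pad, int stride, int dilation) {
    // A dilated kernel covers (kernel - 1) * dilation + 1 input elements.
    const int effectiveKernel = (kernel - 1) * dilation + 1;
    return (in + 2 * pad - effectiveKernel) / stride + 1;
}
```

With `in = 3`, `kernel = 2`, `pad = 0`, `stride = 1`, and `dilation = 1`, this gives 2 along each axis, which matches the `I_M - K_M + 1` expression used for the output descriptor.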

This is my code:

    // Create a cuDNN handle:
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    // Create your tensor descriptors:
    cudnnTensorDescriptor_t cudnnIdesc;
    cudnnFilterDescriptor_t cudnnFdesc;
    cudnnTensorDescriptor_t cudnnOdesc;
    cudnnConvolutionDescriptor_t cudnnConvDesc;
    cudnnCreateTensorDescriptor( &cudnnIdesc );
    cudnnCreateFilterDescriptor( &cudnnFdesc );
    cudnnCreateTensorDescriptor( &cudnnOdesc );
    cudnnCreateConvolutionDescriptor( &cudnnConvDesc );

    // Set tensor dimensions as multiples of eight (only the input tensor is shown here):
    // W, H, D, C, N
    const int dimI[] = { I_M, I_N, 1, 1 };
    // Wstride, Hstride, Dstride, Cstride, Nstride
    const int strideI[] = { 1, 1, 1, 1 };
    checkCUDAError( "SetImgDescriptor failed", cudnnSetTensorNdDescriptor(cudnnIdesc, CUDNN_DATA_HALF, 4, dimI, strideI) );

    const int dimF[] = { K_M, K_N, 1, 1 };
    checkCUDAError( "SetFilterDescriptor failed", cudnnSetFilterNdDescriptor(cudnnFdesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4, dimF) );

    const int dimO[] = { I_M - K_M + 1, I_N - K_N + 1, 1, 1 };
    const int strideO[] = { 1, 1, 1, 1 };
    checkCUDAError( "SetOutDescriptor failed", cudnnSetTensorNdDescriptor(cudnnOdesc, CUDNN_DATA_HALF, 4, dimO, strideO) );

    checkCUDAError( "SetConvDescriptor failed", cudnnSetConvolution2dDescriptor(cudnnConvDesc, 0, 0, 1, 1, 1, 1, CUDNN_CONVOLUTION, CUDNN_DATA_HALF) );

    // Set the math type to allow cuDNN to use Tensor Cores:
    checkCUDAError( "SetConvMathType failed", cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );

    // Choose a supported algorithm:
    int algoCount = 0;
    cudnnConvolutionFwdAlgoPerf_t algoPerf;
    checkCUDAError( "GetConvForwardAlgo failed", cudnnFindConvolutionForwardAlgorithm(handle, cudnnIdesc, cudnnFdesc, cudnnConvDesc, cudnnOdesc, 1, &algoCount, &algoPerf) );

    // Allocate your workspace:
    void *workSpace;
    size_t workSpaceSize = 0;
    checkCUDAError( "WorkspaceSize failed", cudnnGetConvolutionForwardWorkspaceSize(handle, cudnnIdesc, cudnnFdesc, cudnnConvDesc, cudnnOdesc, algoPerf.algo, &workSpaceSize) );
    if (workSpaceSize > 0) {
        cudaMalloc(&workSpace, workSpaceSize);
    }

However, cudnnGetConvolutionForwardWorkspaceSize fails with CUDNN_STATUS_BAD_PARAM.

According to the API Reference (NVIDIA Deep Learning cuDNN Documentation), this can only happen for one of the following reasons:

    CUDNN_STATUS_BAD_PARAM:
    At least one of the following conditions are met:
    
    (1) One of the parameters handle, xDesc, wDesc, convDesc, yDesc is NULL.
    (2) The tensor yDesc or wDesc are not of the same dimension as xDesc.
    (3) The tensor xDesc, yDesc or wDesc are not of the same data type.
    (4) The numbers of feature maps of the tensor xDesc and wDesc differ.
    (5) The tensor xDesc has a dimension smaller than 3.

I don’t see how any of them are true.

(1) is obviously not the case. Because yDesc, wDesc and xDesc all have 4 dimensions, (2) is also not the case.
Every tensor has the data type CUDNN_DATA_HALF, which is why (3) is also not true.
I don’t know exactly what (4) refers to but I think the number of feature maps for image and kernel is 1 in my case.
And (5) is also not true.

Any idea why the function fails nevertheless?

Hi,

At first glance, the strideI and strideO values look wrong; please make sure they are assigned correctly.

    const int dimI[] = { I_M, I_N, 1, 1 };
    // Wstride, Hstride, Dstride, Cstride, Nstride
    const int strideI[] = { 1, 1, 1, 1 };

    ...

    const int dimO[] = { I_M - K_M + 1, I_N - K_N + 1, 1, 1 };
    const int strideO[] = { 1, 1, 1, 1 };
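
As a rough sketch of what consistently assigned strides could look like: assuming the dimensions are listed outermost-first (N, C, H, W) and the tensor is fully packed, each stride is the product of all dimension sizes to its right. `packedStrides` below is a hypothetical helper for illustration, not a cuDNN function, and whether this layout matches your data is an assumption you would need to confirm:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of cuDNN): computes fully packed strides
// for a tensor whose dimensions are listed outermost-first, e.g. {N, C, H, W}.
std::vector<int> packedStrides(const std::vector<int>& dims) {
    std::vector<int> strides(dims.size(), 1);
    // The innermost dimension has stride 1; each outer stride is the
    // product of all dimension sizes to its right.
    for (int i = static_cast<int>(dims.size()) - 2; i >= 0; --i) {
        strides[i] = strides[i + 1] * dims[i + 1];
    }
    return strides;
}
```

For a single 3x3 image with one channel, `packedStrides({1, 1, 3, 3})` yields `{9, 9, 3, 1}`. The `data()` pointers of the dimension and stride vectors could then be passed as the last two arguments of cudnnSetTensorNdDescriptor.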

Also, the cuDNN API log would help us debug and reproduce the issue with cudnnTest; could you please share the logs? Please also let us know the cuDNN version and GPU architecture you are using.

Thank you.