Hello,
I am having difficulties finding the origin of an error.
When I call cudnnGetConvolutionForwardAlgorithm from cuDNN version 4, it returns CUDNN_STATUS_NOT_SUPPORTED. The same call with cuDNN version 5 returns CUDNN_STATUS_ARCH_MISMATCH. The problem is that neither value appears in the list of possible return values for this function in the cuDNN library reference.
I started from an experimental branch of Caffe and added dilation and a few custom layers. The changes work without cuDNN support, so it is possible that some changes needed for cuDNN support are missing.
The input and the output data tensor descriptors are generated using the following set of instructions:
- cudnnCreateTensorDescriptor(desc)
- cudnnSetTensor4dDescriptorEx(*desc, dataType::type, n, c, h, w, stride_n, stride_c, stride_h, stride_w)
Then the filters:
- cudnnCreateFilterDescriptor(desc)
- cudnnSetFilter4dDescriptor(*desc, dataType::type, CUDNN_TENSOR_NCHW, n, c, h, w) (on cuDNN version 5)
- cudnnSetFilter4dDescriptor_v4(*desc, dataType::type, CUDNN_TENSOR_NCHW, n, c, h, w) (on cuDNN version 4)
The convolution:
- cudnnCreateConvolutionDescriptor(conv_desc)
- cudnnSetConvolutionNdDescriptor(*conv_desc, 2, padA, strideA, upscaleA, CUDNN_CROSS_CORRELATION, dataType::type)
Finally, the function call that returns the unexpected value:
- cudnnGetConvolutionForwardAlgorithm(Caffe::cudnn_handle(), bottom_descs_[i], fwd_filter_desc_, fwd_conv_descs_[i], top_descs_[i], CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT, workspace_limit_bytes, &fwd_algo_[i])
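One thing that may be worth ruling out for the descriptors above is whether the dimensions given to top_descs_[i] agree with what the bottom, filter and convolution descriptors imply. The sketch below is only an illustration of the usual output-size formula for a cross-correlation. The helper name conv_out_dim and its dilation parameter are hypothetical and not part of cuDNN or Caffe, and the sketch does not assume that cuDNN 4 or 5 applies any dilation through upscaleA.

```cpp
#include <cassert>

// Hypothetical helper (not a cuDNN or Caffe function): expected output extent
// along one spatial axis of a cross-correlation.
// dilation == 1 is the ordinary, undilated case.
int conv_out_dim(int in, int pad, int kernel, int stride, int dilation = 1) {
    assert(in > 0 && kernel > 0 && stride > 0 && dilation > 0 && pad >= 0);
    // A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    const int effective_kernel = dilation * (kernel - 1) + 1;
    return (in + 2 * pad - effective_kernel) / stride + 1;
}
```

For example, conv_out_dim(300, 1, 3, 1) gives 300, which could be compared with the h and w passed when the output tensor descriptor is set. cuDNN also provides cudnnGetConvolutionNdForwardOutputDim, which reports the output size the library itself derives from the descriptors.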
With cuDNN version 4, the return value CUDNN_STATUS_NOT_SUPPORTED had me stuck for some time, since the only documented return values of the function are CUDNN_STATUS_SUCCESS and CUDNN_STATUS_BAD_PARAM; the value I actually got is not mentioned. However, I found the following in the library reference for cudnnSetTensor4dDescriptorEx:
“At present, some cuDNN routines have limited support for strides; Those routines will return CUDNN_STATUS_NOT_SUPPORTED if a Tensor4D object with an unsupported stride is used. cudnnTransformTensor can be used to convert the data to a supported layout.”,
so I tried using cudnnSetTensorNdDescriptor instead, but the result was the same. The strides are 270000, 90000, 300 and 1, so they should not be a problem.
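The strides quoted above look like those of a fully packed NCHW tensor. As a minimal sketch of that check, the code below computes packed strides from the dimensions and compares them. The function names are hypothetical, and the dimensions c = 3, h = 300, w = 300 used in the example are inferred from the stride values, not stated in this post.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: strides (in elements) of a fully packed NCHW tensor.
// w is contiguous, then h, then c, then n.
std::array<int, 4> packed_nchw_strides(int c, int h, int w) {
    assert(c > 0 && h > 0 && w > 0);
    return {c * h * w, h * w, w, 1};
}

// True when the given strides describe a fully packed NCHW layout.
bool is_packed_nchw(int c, int h, int w, const std::array<int, 4>& strides) {
    return strides == packed_nchw_strides(c, h, w);
}
```

With c = 3, h = 300 and w = 300, packed_nchw_strides yields 270000, 90000, 300 and 1, matching the values above. If that inference about the dimensions holds, the layout is an ordinary packed one, and the stride limitation mentioned in the reference seems unlikely to be the cause.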
As for cuDNN version 5 and the CUDNN_STATUS_ARCH_MISMATCH return value: I am using a GeForce GTX 980 Ti, so the GPU compute capability should not be the problem (it is 5.2, while 3.0 or higher is required).
I am working on Ubuntu 14.04 (64-bit) with GPU driver version 352.63 and CUDA version 7.5.