Convolution with cuDNN

I’m just looking through the docs for cuDNN, particularly the cudnnGetConvolutionForwardAlgorithm() function. It states that an error of CUDNN_STATUS_BAD_PARAM will be returned if the number of feature maps in the input and output differ. How is it then possible to perform a convolution that produces 64 output channels from a 3-channel RGB input? It seems to imply that this isn’t possible with the convolution implementation in cuDNN.

Am I misreading this?

Okay, so I’m assuming that cudnnSetConvolutionNdDescriptor() needs to participate in this solution. A single filter matches the input feature map count (in this case 3, one per R, G, B channel), and multiple filters need to be created (in this case 64, one per output feature map). Can I apply a descriptor initialized via cudnnSetConvolutionNdDescriptor() to a convolution op that results in a single 3 * 64 channel tensor? I’m assuming that cudnnConvolutionForward() is smart enough to recognize this?

Thinking out loud here…

// Filter: 64 output feature maps, 3 input channels, 3x3 kernel
int filterDims[4] = { 64, 3, 3, 3 }; // K, C, H, W
cudnnSetFilterNdDescriptor(filterDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 4, filterDims);

// Convolution: arrayLength = 2 spatial dimensions for an image
int padding[2] = { 0, 0 }; // y, x
int stride[2] = { 1, 1 }; // y, x
int dilation[2] = { 1, 1 }; // y, x
cudnnSetConvolutionNdDescriptor(convDescr, 2, padding, stride, dilation, CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

You are reading it correctly. The filter’s channel dimension must match the input’s channel dimension. If you’re thinking of a typical convolutional layer in a deep neural network, it’s a repeated operation: the same filtering is applied once per output feature map. Please read here for more detail: Adit Deshpande – Engineering at Forward | UCLA CS '19
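
To make that concrete: with K filters of shape C x Hf x Wf applied to an N x C x H x W input (stride 1, no padding), each output value is

y[n][k][i][j] = sum over c, u, v of w[k][c][u][v] * x[n][c][i + u][j + v]

so a single filter spans all C input channels at once, and one convolution call applies all K filters.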

Practically, the cuDNN code in MXNet is a decent example of an implementation. From incubator-mxnet/cudnn_convolution-inl.h at 5b99b25e5f6ab3a20c7bcf4821a6af0a1a95f823 · apache/incubator-mxnet · GitHub

Lines 426-440:

CUDNN_CALL(cudnnSetFilterNdDescriptor(filter_desc_,
                                      dtype_,
                                      CUDNN_TENSOR_NCHW,
                                      static_cast<int>(wshape.ndim()),
                                      CastTShapeToIntPtr(wshape, &wshape_buffer)));
#else
LOG(FATAL) << "Only support CUDNN V5 for 3D convolution";
#endif
CUDNN_CALL(cudnnSetConvolutionNdDescriptor(forward_conv_desc_,
                                           3,
                                           param_pad_.data(),
                                           param_stride_.data(),
                                           param_dilate_.data(),
                                           CUDNN_CROSS_CORRELATION,
                                           cudnn_forward_compute_type));
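
Once the descriptors agree on the channel count, cudnnGetConvolutionForwardAlgorithm() (the function from the original question) no longer has a reason to return CUDNN_STATUS_BAD_PARAM, and the rest of the forward pass is mechanical. A rough sketch of the remaining plumbing, assuming the cuDNN 5-7 era API, a created cudnnHandle_t handle, descriptors named inDesc / filterDesc / convDesc / outDesc, and device buffers d_x, d_w, d_y (those names are placeholders; error checking elided):

// Let cuDNN pick a forward algorithm for these descriptors
cudnnConvolutionFwdAlgo_t algo;
cudnnGetConvolutionForwardAlgorithm(handle, inDesc, filterDesc, convDesc, outDesc,
                                    CUDNN_CONVOLUTION_FWD_PREFER_FASTEST, 0, &algo);

// Allocate whatever scratch space that algorithm needs
size_t workspaceBytes = 0;
cudnnGetConvolutionForwardWorkspaceSize(handle, inDesc, filterDesc, convDesc, outDesc,
                                        algo, &workspaceBytes);
void *workspace = NULL;
cudaMalloc(&workspace, workspaceBytes);

// y = 1.0 * conv(x, w) + 0.0 * y
const float alpha = 1.0f, beta = 0.0f;
cudnnConvolutionForward(handle, &alpha, inDesc, d_x, filterDesc, d_w, convDesc,
                        algo, workspace, workspaceBytes, &beta, outDesc, d_y);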

Thanks for the reply, bostontam.

So in order to apply the multiple 3-channel filters during the convolution forward operation (resulting in, e.g., 64 feature maps), I would use cudnnSetFilterNdDescriptor() to create a filter with shape (K, C, H, W), where K = feature maps, C = input channels, H = kernel height, and W = kernel width? And the output tensor would be created with dimensions (N, K, C, H, W), where N = number of images?

E.g., to apply a 3x3 convolution producing 64 feature maps to a 3-channel tensor, I would use the dimensions:
cudnnSetFilterNdDescriptor() => K = 64, C = 3, H = 3, W = 3
cudnnSetTensorNdDescriptor() => N = 1, K = 64, C = 3, H = output height, W = output width

Is this a valid assumption?

Just to clarify: a convolution over a 3-channel tensor (an RGB image) with 64 filters of size 3x3 would be performed with:

A 4D input tensor, cudnnSetTensor4dDescriptor() => n = 1, c = 3, h = 128, w = 128
A 4D filter, cudnnSetFilter4dDescriptor() => k = 64, c = 3, h = 3, w = 3
A 2D convolution (two spatial dimensions), cudnnSetConvolutionNdDescriptor()
A 5D output tensor, cudnnSetTensorNdDescriptor() => dims => [ 1, 64, 3, 126, 126 ] => [ n, k, c, h, w ]

Surely someone in the world has some idea of how to use cuDNN? Is it so unsupported that nobody is actually using this library anymore?

Okay, ignore all of this. ;) I had somehow convinced myself that a convolution (w=3, h=3) with 64 output features, applied to a 3-channel input tensor (RGB), would result in a 64*3 channel output tensor.

Each 3-channel filter applied to a 3-channel tensor does not produce a 3-channel output; it produces a single-channel (i.e., one feature map) output. Therefore, the result is a 64-channel tensor, not a 64*3 channel tensor.
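
A quick sanity check, in case anyone else trips over this: let cuDNN compute the output shape itself. This is only a minimal sketch, assuming cuDNN v6+ (where cudnnSetConvolution2dDescriptor() takes a compute type) and eliding error checks and descriptor cleanup:

#include <cudnn.h>
#include <stdio.h>

int main(void) {
    cudnnTensorDescriptor_t inDesc;
    cudnnFilterDescriptor_t filterDesc;
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateTensorDescriptor(&inDesc);
    cudnnCreateFilterDescriptor(&filterDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    // 1 x 3 x 128 x 128 RGB input
    cudnnSetTensor4dDescriptor(inDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 3, 128, 128);
    // 64 filters, each spanning all 3 input channels, 3x3 spatially
    cudnnSetFilter4dDescriptor(filterDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               64, 3, 3, 3);
    // No padding, stride 1, dilation 1
    cudnnSetConvolution2dDescriptor(convDesc, 0, 0, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    int n, c, h, w;
    cudnnGetConvolution2dForwardOutputDim(convDesc, inDesc, filterDesc,
                                          &n, &c, &h, &w);
    printf("output: %d x %d x %d x %d\n", n, c, h, w);
    // Prints "output: 1 x 64 x 126 x 126" -- a 4D (N, K, H, W) tensor:
    // one feature map per filter, not one per (filter, channel) pair.
    return 0;
}

The 4D output descriptor then comes straight from those four numbers via cudnnSetTensor4dDescriptor().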

I could slap myself.