I’m just looking through the docs for cuDNN, particularly the cudnnGetConvolutionForwardAlgorithm() function. It states that a CUDNN_STATUS_BAD_PARAM error will be returned if the size of the feature maps in the input and output differ. How is it possible to perform a convolution that produces 64 channels against a 3-channel RGB input? It seems to imply that this isn’t possible with the convolution implementation in cuDNN.
Am I misreading this?
Okay, so I’m assuming that cudnnSetConvolutionNdDescriptor() needs to be part of the solution. A single filter matches the input feature map count (in this case 3, one for each of the R, G, B channels), and multiple filters need to be created (in this case 64, one for each output feature map). Can I apply a descriptor initialized via cudnnSetConvolutionNdDescriptor() to a convolution op that results in a single 3 * 64 tensor? I’m assuming that cudnnConvolutionForward() is smart enough to recognize this?
Thinking out loud here…
// Filter
int filterDims[3] = { 64, 3, 3 }; // channels = 64, kernel w=h=3
cudnnSetFilterNdDescriptor(filterDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 3, filterDims);
// Convolution
int padding[3] = { 0, 0, 0 }; // z, y, x
int stride[3] = { 1, 1, 1 }; // z, y, x
int upscale[3] = { 1, 1, 1 }; // z, y, x
cudnnSetConvolutionNdDescriptor(convDescr, 3, padding, stride, upscale, CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
You are reading it correctly. The kernel dimensions must match the input dimensions. If you’re thinking of a typical convolutional layer in a deep neural network, it’s a repeated operation. Please read here for more detail: Adit Deshpande – Engineering at Forward | UCLA CS '19
Practically, the cuDNN code in MXNet is a decent example of an implementation. From incubator-mxnet/cudnn_convolution-inl.h at 5b99b25e5f6ab3a20c7bcf4821a6af0a1a95f823 · apache/incubator-mxnet · GitHub
Lines 426-440:
CUDNN_CALL(cudnnSetFilterNdDescriptor(filter_desc_,
dtype_,
CUDNN_TENSOR_NCHW,
static_cast<int>(wshape.ndim()),
CastTShapeToIntPtr(wshape, &wshape_buffer)));
#else
LOG(FATAL) << "Only support CUDNN V5 for 3D convolution";
#endif
CUDNN_CALL(cudnnSetConvolutionNdDescriptor(forward_conv_desc_,
3,
param_pad_.data(),
param_stride_.data(),
param_dilate_.data(),
CUDNN_CROSS_CORRELATION,
cudnn_forward_compute_type));
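In Nd terms, the detail that trips up the code in the first post is that the filter descriptor for a 2D convolution needs four dimensions, and its C entry must equal the input tensor's channel count. A rough sketch of just that part for the RGB-to-64-feature-map case (placeholder descriptor names, cuDNN v5-style signature, return statuses not checked):
// 64 output feature maps (K), 3 input channels (C), 3x3 kernel (H, W).
// That is K*C two-dimensional kernels in total; C must match the input tensor's C.
int filterDims[4] = { 64, 3, 3, 3 }; // K, C, H, W
cudnnSetFilterNdDescriptor(filterDesc,        // created with cudnnCreateFilterDescriptor()
                           CUDNN_DATA_FLOAT,
                           CUDNN_TENSOR_NCHW,
                           4,                 // nbDims = 2 spatial dims + 2 (K and C)
                           filterDims);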
Thanks for the reply, bostontam.
So in order to apply the multiple 3-channel filters during the convolution forward operation (resulting in, e.g., 64 feature maps), I would use cudnnSetFilterNdDescriptor() to create a filter with shape (K, C, H, W), where K => feature maps, C => input channels, H => kernel height, W => kernel width? And the output tensor would be created with dimensions (N, K, C, H, W), where N = number of images?
E.g., to apply a 3x3 convolution producing 64 feature maps to a 3-channel tensor, I would use the dimensions:
cudnnSetFilterNdDescriptor() => K = 64, C = 3, H = 3, W = 3
cudnnSetTensorNdDescriptor() => N = 1, K = 64, C = 3, H = output height, W = output width
Is this a valid assumption?
Just to clarify, a convolution over a 3-channel tensor (RGB image), with 64 filters of size 3x3, would be performed with:
A 4D input tensor, cudnnSetTensor4dDescriptor() => n => 1, c => 3, h => 128, w => 128
A 4D filter, cudnnSetFilter4dDescriptor() => k => 64, c => 3, h => 3, w => 3
A 3D convolution, cudnnSetConvolutionNdDescriptor()
A 5D output tensor, cudnnSetTensorNdDescriptor() => dims => [ 1, 64, 3, 126, 126 ] => [ n, k, c, h, w ]
Surely someone in the world has got some idea how to use cuDNN? Is it that unsupported that nobody is actually using this library anymore?
Okay, ignore all of this. ;) I had somehow convinced myself that a convolution (w=3, h=3) with 64 output features, applied to a 3-channel input tensor (RGB), would result in a 64*3 channel output tensor.
Each 3-channel filter applied to a 3-channel tensor does not produce a 3-channel output; it produces a single-channel (i.e., one feature map) output. Therefore, the result is a 64-channel tensor, not a 64*3 channel tensor.
I could slap myself.
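For anyone who lands here later, here is a minimal sketch of the descriptor setup for this exact case. It assumes a cuDNN 6+ style API where cudnnSetConvolution2dDescriptor() takes a compute type; error checking and the actual cudnnConvolutionForward() call are left out. Querying cudnnGetConvolution2dForwardOutputDim() confirms the output shape comes back as (1, 64, 126, 126), i.e. 64 channels, not 64*3:
#include <cudnn.h>
#include <cstdio>

int main() {
    // Sketch only: descriptors for one 128x128 RGB image convolved with 64 3x3 filters.
    cudnnTensorDescriptor_t inDesc;
    cudnnFilterDescriptor_t filterDesc;
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateTensorDescriptor(&inDesc);
    cudnnCreateFilterDescriptor(&filterDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    // Input: one 128x128 RGB image => (n=1, c=3, h=128, w=128)
    cudnnSetTensor4dDescriptor(inDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 3, 128, 128);

    // Filters: 64 filters, each 3 channels deep, 3x3 spatially => (k=64, c=3, h=3, w=3)
    cudnnSetFilter4dDescriptor(filterDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               64, 3, 3, 3);

    // 2D cross-correlation, no padding, stride 1, no dilation
    cudnnSetConvolution2dDescriptor(convDesc, 0, 0, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    // Let cuDNN report the output shape: each filter collapses the 3 input
    // channels into a single feature map, so this prints n=1 c=64 h=126 w=126.
    int n, c, h, w;
    cudnnGetConvolution2dForwardOutputDim(convDesc, inDesc, filterDesc, &n, &c, &h, &w);
    printf("output: n=%d c=%d h=%d w=%d\n", n, c, h, w);

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(filterDesc);
    cudnnDestroyTensorDescriptor(inDesc);
    return 0;
}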