cuDNN - Updating weights

I have been exploring the cuDNN library. I am able to create a simple neural network with one convolutional layer and one activation layer, and I can propagate input forward through this simple network. I am now looking to backpropagate the error through the network and update the weights. I am already able to propagate the difference backwards and compute the bias gradient and the filter (weight) gradient.

However, I wonder how I should update the weights using the gradient, i.e. w += -alpha*w_gradient. For the bias I used the cudnnAddTensor4d function with the CUDNN_ADD_SAME_C mode, which lets me set an alpha (weighting factor).

In order to update the filter weights, however, I can’t easily use this function, as the filter weights are described by the type cudnnFilterDescriptor_t, not cudnnTensor4dDescriptor_t.

I could hack this in by creating a cudnnTensor4dDescriptor_t for the filter weights, but I wonder if there is a better way of going about this (something I am missing). For concreteness, the kind of workaround I have in mind is sketched below.
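This is only a sketch under my own assumptions (float weights, and the filter data living in one contiguous device array of n = k*c*h*w elements, which is how I allocated it); it side-steps the descriptor types entirely with a hand-written kernel:

    // Sketch: w += -alpha * w_gradient done by hand, ignoring the
    // cudnnFilterDescriptor_t entirely. Assumes d_w and d_dw are
    // contiguous float arrays of n elements in device memory.
    __global__ void sgdUpdate(float *w, const float *dw, float alpha, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            w[i] -= alpha * dw[i];
    }

    void updateFilter(float *d_w, const float *d_dw, float alpha, int n)
    {
        int block = 256;
        int grid = (n + block - 1) / block;
        sgdUpdate<<<grid, block>>>(d_w, d_dw, alpha, n);
    }

Since the filter data is just a flat array, cublasSaxpy over the same n elements with a host-side scalar of -alpha would do the same update in a single library call.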

The operation you’re asking for is not supported on cudnnFilterDescriptor_t at the moment. We support w += w_gradient via the “accumulate” mode but not with the scaling parameter needed for w += -alpha*w_gradient. We’re planning on adding it in the next release of cuDNN.

I’m also confused by cudnnAddTensor4d.

Why is alpha a pointer, if it’s a scalar? Why not “double”?

If it’s supposed to point to an array, how long should that array be?

Should alpha point to data on the host or the device?

Neither the manual nor the header documents this.

P.S. I’ve figured it out, but I’m leaving this post as a suggestion to improve the documentation in this area.
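For anyone else who lands here, what I found was: alpha is a host pointer to a single scalar, not an array, and it is a void* so that the same entry point serves both single- and double-precision tensors; the scalar’s type has to match the tensor’s data type. Roughly like the following (I’m paraphrasing the v1 header from memory, so verify the exact argument order against your cudnn.h; handle, biasDesc, etc. are placeholders from my own code):

    /* alpha lives on the host and must match the tensor data type:
       a float here because the tensors are CUDNN_DATA_FLOAT, a double
       if they were CUDNN_DATA_DOUBLE. It points to one scalar, not an array. */
    float alpha = 1.0f;
    cudnnStatus_t status = cudnnAddTensor4d(handle, CUDNN_ADD_SAME_C,
                                            &alpha, /* host pointer, single scalar */
                                            biasDesc, biasData,
                                            srcDestDesc, srcDestData);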
