I am somewhat confused about the description of diffData for cudnnConvolutionBackwardData:
“Data pointer to GPU memory associated with the input differential tensor descriptor diffDesc.”
As I understand it, I feed the gradient data produced by cudnnConvolutionBackwardData or cudnnActivationBackward in the layer above into the current layer as the tensor described by diffDesc. In this manner the errors are propagated down, and for each layer the weight derivatives (filter and bias) can be computed.
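To make the data flow concrete, here is a scalar toy model (plain Python, not cuDNN; all names are mine) of how I understand the backward pass: each layer receives dy from the layer above, computes its own filter and bias gradients, and hands dx down as the next layer's diffData.

```python
# Toy scalar "network": each layer computes y = w*x + b.
# Backward: given dy from the layer above, produce dw, db, dx.
def layer_backward(x, w, dy):
    dw = dy * x   # filter gradient (BackwardFilter analogue)
    db = dy       # bias gradient
    dx = dy * w   # data gradient handed to the layer below
                  # (BackwardData analogue; becomes its diffData)
    return dw, db, dx

# Forward through two layers.
x0 = 2.0
w = [0.5, -1.5]
b = [0.1, 0.2]
x1 = w[0] * x0 + b[0]
x2 = w[1] * x1 + b[1]

# Backward: dy for the top layer comes from the loss;
# each layer's dx is the diffData of the layer below it.
dy2 = 1.0
dw2, db2, dx1 = layer_backward(x1, w[1], dy2)
dw1, db1, dx0 = layer_backward(x0, w[0], dx1)
```

This is only meant to pin down which tensor I think flows where, not to mirror the actual cuDNN call signatures.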
However, what should diffData be for the topmost layer? Currently I am simply using the difference between the labels and the forward-propagated output values. Is this the correct thing to do?
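For concreteness, here is a toy-Python version of what I am currently doing at the top (my own code, not cuDNN). It amounts to assuming an MSE-style loss L = ½Σ(y − t)², whose gradient with respect to the output is y − t, i.e. output minus labels; the sign convention is one of the things I am unsure about.

```python
# Toy check: for L = 0.5 * sum((y - t)^2), dL/dy_i = y_i - t_i.
def mse_loss(y, t):
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))

def mse_grad(y, t):
    return [yi - ti for yi, ti in zip(y, t)]

y = [0.2, 0.7, 0.1]   # forward-propagated outputs (made-up values)
t = [0.0, 1.0, 0.0]   # labels (made-up values)
dy = mse_grad(y, t)   # candidate diffData for the topmost layer

# Finite-difference sanity check on the first component.
eps = 1e-6
y_plus = [y[0] + eps] + y[1:]
num = (mse_loss(y_plus, t) - mse_loss(y, t)) / eps
```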
Also, cudnnConvolutionBackwardData only makes use of the filter data, but I am also using a bias term. Should this bias term not be included somehow before back-propagating further?
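To illustrate why I am asking: in a toy 1D convolution (plain Python, mine, not cuDNN), the bias drops out of the data gradient, because y = conv(x, w) + b is linear in x with a slope that does not involve b. If the same reasoning applies to the cuDNN kernels, the filter-only backward-data pass would already be complete, but I would like confirmation.

```python
# Toy 1D "valid" convolution with a scalar bias:
# y[i] = sum_k w[k] * x[i+k] + b
def conv1d(x, w, b):
    n = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) + b
            for i in range(n)]

def backward_data(dy, w, nx):
    # dx[j] = sum_i dy[i] * w[j-i]; note b never appears here.
    dx = [0.0] * nx
    for i, d in enumerate(dy):
        for k, wk in enumerate(w):
            dx[i + k] += d * wk
    return dx

x = [1.0, -2.0, 0.5, 3.0]
w = [0.3, -0.7]
dy = [0.2, -0.1, 0.4]          # gradient arriving from the layer above

dx = backward_data(dy, w, len(x))

# Finite-difference check of d/dx_j sum_i dy[i]*y[i], for two biases:
def fd_grad(x, b, eps=1e-6):
    base = conv1d(x, w, b)
    out = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        yp = conv1d(xp, w, b)
        out.append(sum(d * (a - c) for d, a, c in zip(dy, yp, base)) / eps)
    return out

g_b0 = fd_grad(x, b=0.0)   # data gradient with bias 0
g_b9 = fd_grad(x, b=9.0)   # data gradient with bias 9
```

In this toy model the two numerical gradients agree with each other and with `backward_data`, regardless of the bias value.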