I am currently using cuDNN to develop a basic CNN architecture for a project, with convolutional, ReLU, max-pool, fully-connected, and softmax layers. I am stuck on the idea of the tensor descriptors that are passed to cuDNN's forward and backward routines.
- What do the descriptors really store? Do they just provide the shape information (e.g. `NCHW`) for the input (or output) `float*` buffer, whichever they are describing?
- Do the tensor descriptors point to a location in device memory for the buffers? What is their purpose from a memory perspective?
My second question is about computational graphs in cuDNN.
- Does cuDNN build a computational graph for gradient flow across the entire network model? Or does it just hold local buffer values and compute gradients from whatever is passed to the backward functions, i.e., is cuDNN just a collection of independent compute functions?
- If I change the input/output tensor descriptors of consecutive layers, how will that affect my network? For example, consider a simple `CONV -> RELU` pipeline. Does the input descriptor of the RELU forward have to be the same as the output descriptor of the CONV forward? Does cuDNN track tensors that way, so that if they are not connected properly, the computational graph (if there is one) would be incomplete?
Please help me understand these points; they have been bugging me for a long time.