Profiling backpropagation of convolutional neural networks


I was using nvprof to profile the forward and backward passes of MobileNetV2 at various levels of pruning. On the backward pass, some layers execute the `computeWgradSplitKOffset` kernel while others use `wgrad_alg0_engine` instead.

I was wondering how these two kernels differ in their approach to what I assume is calculating the gradient of the weights?

Thanks in advance.

Hi @adi8862,

`computeWgradSplitKOffset` is just a preparation kernel that precomputes some offsets for the main kernel; the main compute kernel that actually calculates the gradient is launched after it. This combination is typically faster on common workloads, but it is stricter about tensor shape and layout.
`wgrad_alg0_engine` is more flexible in accepting different tensor shapes and layouts because it computes the offsets on the fly, but it may be somewhat slower on typical workloads.
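To illustrate the trade-off, here is a toy pure-Python sketch (my own illustration, not cuDNN's actual implementation): one path precomputes a flat offset table in a separate preparation step, the way `computeWgradSplitKOffset` does for the main kernel, while the other recomputes the index arithmetic for every element, the way an alg0-style kernel does.

```python
# Toy sketch (not cuDNN code): two ways to address a strided 2-D view
# stored in a flat buffer.

def make_offset_table(rows, cols, row_stride):
    """Preparation step: precompute flat offsets once (split-K-offset style)."""
    return [r * row_stride + c for r in range(rows) for c in range(cols)]

def gather_precomputed(buf, table):
    """Main kernel analogue: just loads through the ready-made offset table."""
    return [buf[off] for off in table]

def gather_on_the_fly(buf, rows, cols, row_stride):
    """alg0-style analogue: index arithmetic redone in the inner loop,
    which works for any stride but costs extra work per element."""
    out = []
    for r in range(rows):
        for c in range(cols):
            out.append(buf[r * row_stride + c])
    return out

# A 3x4 view into a padded buffer with row stride 5.
buf = list(range(15))
table = make_offset_table(3, 4, 5)
assert gather_precomputed(buf, table) == gather_on_the_fly(buf, 3, 4, 5)
```

The precomputed table only pays off when the same offsets can be reused across the whole launch, which is one way to see why the two-kernel combination is faster but pickier about shapes and layouts.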
cuDNN uses internal heuristics that look at tensor shape, layout, etc. to pick the fastest kernel available for the task.
However, you can get comparatively more control if you switch to the cuDNN v8 backend API.
The links below should help you understand it better.


Thanks for the breakdown. Are these offsets for the im2col operation or for other parts of the matrix multiply as well?

Hi @adi8862,
This kernel calculates the indices for the matrix multiplication, so these offsets are used for all of the matrix multiplications, not only the im2col step.
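To make the connection between offsets and the matrix multiply concrete, here is a small pure-Python sketch (assumed toy shapes, not cuDNN code) showing that the weight gradient of a convolution is exactly a matrix multiply between the flattened output gradient and im2col patches of the input, where the "offsets" are the flat indices used to gather each patch.

```python
# Toy demo (pure Python, not cuDNN's code): the weight gradient of a 2-D
# convolution equals a matrix multiply between the output gradient and the
# im2col patches of the input, addressed through precomputed flat offsets.

H, W, K = 4, 4, 3            # input size and kernel size (no padding, stride 1)
OH, OW = H - K + 1, W - K + 1

X  = [float(i) for i in range(H * W)]        # input, flattened row-major
dY = [float(i + 1) for i in range(OH * OW)]  # upstream gradient, flattened

# Offsets: for each output position, the K*K flat indices of its input patch.
offsets = [[(i + u) * W + (j + v) for u in range(K) for v in range(K)]
           for i in range(OH) for j in range(OW)]

# Direct definition: dW[u][v] = sum_{i,j} dY[i,j] * X[i+u, j+v]
dW_direct = [sum(dY[i * OW + j] * X[(i + k // K) * W + (j + k % K)]
                 for i in range(OH) for j in range(OW))
             for k in range(K * K)]

# Same computation as a GEMM: dW = dY^T @ im2col(X)
patches = [[X[off] for off in row] for row in offsets]      # (OH*OW, K*K)
dW_gemm = [sum(dY[p] * patches[p][k] for p in range(OH * OW))
           for k in range(K * K)]

assert dW_direct == dW_gemm
```

In this picture the offset table plays the role of the preparation kernel's output: once it exists, the gradient computation is a plain gather plus matrix multiply.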