I am trying to get cuDNN's batch normalization to play along nicely with our framework, which already has a CPU implementation.
In particular, I am currently gathering the data needed to satisfy the interface’s requirements, and your interface design makes this harder than it has to be.
BN consists of three easily separable operations:
1. Compute mean & variance per channel and normalize.
2. Multiply by gamma (scale).
3. Add beta (bias).
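To make the separation concrete, here is a minimal pure-Python sketch of the three steps as independent functions. It is not cuDNN code. The function names are hypothetical, and it assumes each channel's data is stored as one flat list of its N·H·W values and that step 1 uses the biased variance, as training-mode BN does. It only shows that once step 1 has produced x_hat, steps 2 and 3 are plain per-channel elementwise operations.

```python
import math

def bn_normalize_fwd(x, eps=1e-5):
    """Step 1: per-channel mean/variance and normalization.
    x is a list of channels, each a flat list of that channel's N*H*W values."""
    x_hat, means, inv_stds = [], [], []
    for c in x:
        m = sum(c) / len(c)
        var = sum((v - m) ** 2 for v in c) / len(c)  # biased variance (training mode)
        inv_std = 1.0 / math.sqrt(var + eps)
        x_hat.append([(v - m) * inv_std for v in c])
        means.append(m)
        inv_stds.append(inv_std)
    return x_hat, means, inv_stds

def bn_scale(x_hat, gamma):
    """Step 2: multiply each channel by its gamma (elementwise, no statistics)."""
    return [[g * v for v in c] for c, g in zip(x_hat, gamma)]

def bn_shift(y, beta):
    """Step 3: add each channel's beta (elementwise, no statistics)."""
    return [[v + b for v in c] for c, b in zip(y, beta)]
```

Calling only `bn_normalize_fwd` gives the "normalize only" case. Full BN is `bn_shift(bn_scale(x_hat, gamma), beta)`.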
To me, it is rather limiting that I am forced to execute all three operations during FPROP and BPROP. This is all the more confusing because it breaks with established practice: you did not hardwire Convolution and AddTensor (add bias) together either. Steps 2 and 3 could easily be realized manually using OpTensor. So why push everything into the same function call? Sometimes I just want to normalize, without multiplying by gamma and adding the bias. Is there any way to perform FPROP and BPROP separately for step 1 only?
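The same split holds for BPROP. Below is a hedged pure-Python sketch for a single channel (channels are independent). The function names are hypothetical, it is not cuDNN code, and it assumes the biased variance used in training mode. The backward pass of step 1 needs only x_hat and 1/sqrt(var + eps) saved from the forward pass, and gamma and beta never appear in it. The gradients for steps 2 and 3 are simple reductions that can be computed on their own.

```python
import math

def normalize_fwd(x, eps=1e-5):
    """Step 1 forward for ONE channel: returns x_hat and 1/sqrt(var + eps)."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)  # biased variance (training mode)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - m) * inv_std for v in x], inv_std

def normalize_bwd(dx_hat, x_hat, inv_std):
    """Step 1 backward for ONE channel: maps dL/dx_hat to dL/dx.
    dx_i = inv_std/n * (n*g_i - sum(g) - x_hat_i * sum(g * x_hat))."""
    n = len(dx_hat)
    s = sum(dx_hat)
    s_xh = sum(g * xh for g, xh in zip(dx_hat, x_hat))
    return [inv_std / n * (n * g - s - xh * s_xh) for g, xh in zip(dx_hat, x_hat)]

def scale_shift_bwd(dy, x_hat, gamma):
    """Steps 2+3 backward for ONE channel, where y = gamma * x_hat + beta."""
    dgamma = sum(g * xh for g, xh in zip(dy, x_hat))
    dbeta = sum(dy)
    dx_hat = [gamma * g for g in dy]  # gradient handed on to step 1's backward
    return dx_hat, dgamma, dbeta
```

For "normalize only", feed the incoming gradient straight into `normalize_bwd`. For full BN, run `scale_shift_bwd` first and pass its `dx_hat` on.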