cuDNN Batch Normalization - Can I separate operations?

I am trying to get cuDNN batch normalization to play nicely with our framework, which already features a CPU implementation.

In particular, I am currently working on gathering the data needed to satisfy the interface. Your interface design makes this harder than it has to be.

BN consists of three easily separable operations that have nothing to do with each other:

  1. Compute the mean & variance per channel and normalize.

  2. Multiply by gamma.

  3. Add the bias.

To me, it is rather limiting that I am forced to execute all three operations during FPROP and BPROP. It is even more confusing because it breaks with established practice: you did not hardwire Convolution and AddTensor (add bias) together, and steps 2 and 3 could easily be realized manually using OpTensor. So why push everything into the same function call? Sometimes I just want to normalize, without multiplying by gamma and adding the bias. Is there any way to perform FPROP and BPROP for step 1 alone?
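To make the decomposition concrete, here is a NumPy sketch of the three steps for an NCHW tensor, kept deliberately separate. This is an illustration of the math, not cuDNN code; the function name and signature are my own.

```python
import numpy as np

def bn_forward(x, gamma=None, beta=None, eps=1e-5):
    """Batch norm forward with the three steps kept separate.

    x is assumed to be shaped (N, C, H, W); gamma and beta are
    per-channel vectors of shape (C,). Illustrative only.
    """
    # Step 1: per-channel mean/variance and normalization.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)

    # Step 2: multiply by gamma (broadcast over N, H, W) -- optional.
    if gamma is not None:
        y = y * gamma.reshape(1, -1, 1, 1)

    # Step 3: add the bias -- optional.
    if beta is not None:
        y = y + beta.reshape(1, -1, 1, 1)
    return y
```

Nothing in step 1 depends on steps 2 or 3, which is exactly why it seems natural to expose them separately.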

This is outside my area of expertise, but an educated guess is that multiplying by gamma and adding the bias are essentially free in terms of performance in the context of step (1), so it may make sense to simply roll these operations in by default and let programmers select a gamma of 1.0 and a bias of 0.0 if this part of the functionality is not needed.
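To illustrate the suggestion: with gamma fixed to 1.0 and the bias to 0.0, the scale and shift are identities, so the full BN call degenerates to the plain normalization of step (1). A NumPy sketch (again, not cuDNN code):

```python
import numpy as np

def bn(x, gamma, beta, eps=1e-5):
    # Full batch norm (normalize, scale, shift) over an NCHW tensor.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    xhat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * xhat + beta.reshape(1, -1, 1, 1)

# With gamma = ones(C) and beta = zeros(C), the result equals xhat,
# i.e. the pure step-(1) normalization.
```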

Considering my experience with ConvolutionBackwardBias, I beg to differ. That one can take quite a long time in some cases, and all it does is broadcast and add. Your idea will definitely work, though. But it also implies that I would have to allocate dummy memory for the gradients during the backward phase; it would be nice if that could somehow be avoided.
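For context on "it just broadcasts and adds stuff": the bias gradient is nothing more than a reduction of the output gradient over every axis except the channel axis. A NumPy equivalent of what ConvolutionBackwardBias computes (my own sketch, not the library's implementation):

```python
import numpy as np

def conv_backward_bias(dy):
    # Gradient w.r.t. a per-channel bias: since the bias was broadcast
    # over batch and spatial positions in the forward pass, its gradient
    # is the sum of dy over those axes (N, H, W), leaving shape (C,).
    return dy.sum(axis=(0, 2, 3))
```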

As I stated, I offered only an "educated guess". I think the folks familiar with cuDNN check this forum occasionally, so there is a chance you will get an authoritative reply on this particular design question.