When doing batch normalization, cuDNN uses var/batch-size (the biased estimate) for the per-batch variances and var/(batch-size - 1) (the unbiased estimate) for the running variances.

Furthermore, the documentation implies that the variance arrays hold the inverse variance. That does not appear to be the case: if I feed it data with a variance of 20, I get back ~20 in the running variances array, not 1/20.

Given input data with a per-batch variance of 20:

With a batch size of 20, I get back a batch variance (savedInvVar) of 20 and a running variance (runningInvVar) of 21.05, which matches 20 * 20/19.
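The two numbers can be reproduced outside cuDNN. This is a minimal sketch in plain Python, not cuDNN code: the `samples` list and the `bessel_corrected` helper are made up for illustration, and the data is constructed so that its biased variance is 20 at a batch size of 20.

```python
import math
import statistics


def bessel_corrected(biased_variance, batch_size):
    """Rescale a biased variance (divide by N) to the unbiased one (divide by N - 1).

    Hypothetical helper for illustration only; it is not part of cuDNN.
    """
    return biased_variance * batch_size / (batch_size - 1)


batch_size = 20
half = batch_size // 2

# Illustrative batch: mean 0, every sample at distance sqrt(20) from the mean,
# so the biased (population) variance is 20.
samples = [math.sqrt(20.0)] * half + [-math.sqrt(20.0)] * half

biased = statistics.pvariance(samples)   # sum of squared deviations / N
unbiased = statistics.variance(samples)  # sum of squared deviations / (N - 1)

print(round(biased, 2), round(unbiased, 2))
print(round(bessel_corrected(20.0, batch_size), 2))
```

Under these assumptions the first value corresponds to what I see in savedInvVar and the second to what I see in runningInvVar, so the gap between them is exactly the N/(N - 1) factor and shrinks as the batch size grows.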

I would not use the batch size as the population for the running variances; I think this is an error. The point of the running variances is to dramatically increase the effective population, so for small batch sizes the running variances do not converge to the dataset variance, although they should.