cuDNN BatchNorm and Pooling give wrong results for float16 data

I’m using OnnxRuntime to run a float16 model on GPU, and it doesn’t produce correct results. So far, I’ve found that the BatchNorm and MaxPool nodes are the ones returning incorrect values: the output is inf. The code calls the cuDNN APIs cudnnBatchNormalizationForwardInference and cudnnPoolingForward. I tested the same code with float32 and double types and it works fine, so I really don’t know what’s wrong in the float16 case.

Here’s the code:
https://github.com/Microsoft/onnxruntime/blob/93bcd9beb6b616f33f723548fa8c180c2bd2239e/onnxruntime/core/providers/cuda/nn/batch_norm.cc#L60-L74

and the test code:
https://github.com/Microsoft/onnxruntime/blob/master/onnxruntime/test/providers/cpu/nn/batch_norm_op_test.cc

I also re-implemented BatchNorm with a custom CUDA kernel instead of cuDNN, and that version works for float16. Can anyone point out what’s wrong here?

It turns out that, for cudnnBatchNormalizationForwardInference with half-type input data, the alpha, beta, scale, B (bias), mean, and var parameters all need to be float type, not half.
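For reference, here is a minimal sketch of what the fixed call looks like. It assumes cuDNN is available and that the device pointers and n/c/h/w dimensions are set up by the caller (all names here are illustrative, not from the onnxruntime code). The key points are that the x/y tensor descriptors use CUDNN_DATA_HALF, while alpha/beta and the scale/bias/mean/variance buffers are float; cudnnDeriveBNTensorDescriptor derives a float descriptor for the parameter tensor when the input is half.

```cpp
#include <cudnn.h>

// Inference-mode batch norm on half-precision data.
// Data tensors are half; alpha/beta and the scale/bias/mean/var
// tensors must be float (for half data, cuDNN's compute type is float).
void bn_inference_half(cudnnHandle_t handle,
                       const void* x, void* y,            // device, half
                       const float* scale, const float* bias,
                       const float* mean, const float* var, // device, FLOAT
                       int n, int c, int h, int w) {
  cudnnTensorDescriptor_t xDesc, bnDesc;
  cudnnCreateTensorDescriptor(&xDesc);
  cudnnCreateTensorDescriptor(&bnDesc);

  // Input/output tensors: half precision.
  cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                             CUDNN_DATA_HALF, n, c, h, w);

  // Scale/bias/mean/var descriptor: derived as CUDNN_DATA_FLOAT
  // because the data tensor is half.
  cudnnDeriveBNTensorDescriptor(bnDesc, xDesc, CUDNN_BATCHNORM_SPATIAL);

  // alpha/beta must also be pointers to float, not half.
  const float alpha = 1.0f, beta = 0.0f;
  cudnnBatchNormalizationForwardInference(
      handle, CUDNN_BATCHNORM_SPATIAL,
      &alpha, &beta,
      xDesc, x,        // input  (half)
      xDesc, y,        // output (half)
      bnDesc,          // float parameter descriptor
      scale, bias, mean, var,
      1e-5 /* epsilon */);

  cudnnDestroyTensorDescriptor(xDesc);
  cudnnDestroyTensorDescriptor(bnDesc);
}
```

The same alpha/beta rule should apply to cudnnPoolingForward: with half tensors, pass float scaling factors rather than half.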