I’m using OnnxRuntime to run a float16 model on GPU, but it doesn’t produce the correct result. So far I’ve found that the incorrect results come from BatchNorm and MaxPool: their output is inf. The code uses the cuDNN APIs cudnnBatchNormalizationForwardInference and cudnnPoolingForward. I tested the same code with the float32 and double types, and it works fine for both. I can’t figure out what’s wrong in the float16 case.
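For context on the inf symptom: one way float16 can produce inf where float32 does not is intermediate overflow, since the largest finite float16 value is only 65504. This is a minimal NumPy sketch of that failure mode (my own illustration, not ONNX Runtime or cuDNN code), not a claim that this is the actual bug:

```python
import numpy as np

# float16's largest finite value is 65504, so an intermediate
# product that is trivial in float32 overflows to inf in float16.
x16 = np.float16(300.0)
print(x16 * x16)                 # 300*300 = 90000 > 65504 -> inf in float16
print(np.float32(300.0) ** 2)    # same math is fine in float32
print(np.finfo(np.float16).max)  # 65504.0
```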
Here’s the code:
and the test code:
https://github.com/Microsoft/onnxruntime/blob/master/onnxruntime/test/providers/cpu/nn/batch_norm_op_test.cc
I also re-implemented BatchNorm with a custom CUDA kernel instead of cuDNN, and that version works for float16. Can anyone help point out what’s wrong here?