I’m using OnnxRuntime to run a float16 model on GPU, but it doesn’t produce the correct result. So far I’ve found that the incorrect values come from the BatchNorm and MaxPool layers: their output is inf. Internally the code calls the cuDNN APIs cudnnBatchNormalizationForwardInference and cudnnPoolingForward. I tested the same code path with float32 and double, and it works fine, so I really don’t know what’s going wrong in the float16 case.
I also re-implemented BatchNorm with my own CUDA kernel instead of cuDNN, and that version works for float16. Can anyone point out what’s wrong here?
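For context, one thing I suspect (just a guess, not a confirmed diagnosis) is float16’s narrow range: its largest finite value is 65504, so intermediate values that are harmless in float32 can overflow straight to inf in float16. A quick NumPy illustration of that overflow behavior:

```python
import numpy as np

# float16's largest finite value is 65504
print(np.finfo(np.float16).max)  # 65504.0

# A product that is fine in float32 overflows to inf in float16
a = np.float16(300.0)
print(a * a)  # inf, since 300 * 300 = 90000 > 65504

# The same computation in float32 is exact
print(np.float32(300.0) * np.float32(300.0))  # 90000.0
```

If that is what’s happening, a custom kernel that accumulates in float32 and only stores the final result in float16 would avoid the inf, which could explain why my re-implemented BatchNorm works while the cuDNN path does not.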