It seems cudnnConvolutionBiasActivationForward supports both half precision and grouped convolution, but not at the same time.
Has anyone else encountered this, and is there anything I missed that is needed for half-precision grouped convolution? (All of the following tests were run with the same piece of code.)
filter: 32x1x3x3, stride: 1,1, padding: 1,1, group: 32
With half precision, cudnnConvolutionBiasActivationForward generates correct output only for output channel 0; for the other channels the results are completely different from the single-precision output (the single-precision output has been verified against a separate CPU implementation).
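To narrow down which channels diverge, a per-channel error summary can help. The following is only a minimal sketch: it assumes both outputs have been copied to the host as NCHW float buffers (with the half-precision output converted to float first), and the function name `maxAbsErrorPerChannel` and its parameters are hypothetical, not part of cuDNN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute difference per channel between two NCHW buffers.
// `test` is assumed to be the half-precision output converted to float,
// `ref` the single-precision output. Both names are hypothetical.
std::vector<float> maxAbsErrorPerChannel(const std::vector<float>& test,
                                         const std::vector<float>& ref,
                                         int n, int c, int h, int w) {
    const std::size_t plane = static_cast<std::size_t>(h) * w;
    const std::size_t expected = static_cast<std::size_t>(n) * c * plane;
    assert(test.size() == expected && ref.size() == expected);

    std::vector<float> err(static_cast<std::size_t>(c), 0.0f);
    for (int in = 0; in < n; ++in) {
        for (int ic = 0; ic < c; ++ic) {
            // Start of the (in, ic) plane in NCHW layout.
            const std::size_t base =
                (static_cast<std::size_t>(in) * c + ic) * plane;
            for (std::size_t i = 0; i < plane; ++i) {
                const float d = std::fabs(test[base + i] - ref[base + i]);
                err[ic] = std::max(err[ic], d);
            }
        }
    }
    return err;
}
```

If only entry 0 of the returned vector is small while entries 1 to 31 are large, that matches the behaviour described above.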
Other tests I ran:
With group == 1, half-precision cudnnConvolutionBiasActivationForward generates correct results, with acceptable error compared to the single-precision results.
With group == 32, single precision generates correct results compared to the CPU implementation.
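For anyone who wants to reproduce the comparison, here is a sketch of the kind of CPU reference that could be used. It is not the implementation used for the tests above. It assumes NCHW input, a K x (C/groups) x R x S filter layout, cross-correlation (no filter flip), no dilation, a per-output-channel bias and no activation; the name `groupedConv2dRef` and all parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive grouped 2D convolution (cross-correlation) plus bias, NCHW layout.
// x: N*C*H*W, wgt: K*(C/groups)*R*S, bias: K. Returns N*K*Ho*Wo.
std::vector<float> groupedConv2dRef(const std::vector<float>& x,
                                    const std::vector<float>& wgt,
                                    const std::vector<float>& bias,
                                    int N, int C, int H, int W,
                                    int K, int R, int S,
                                    int stride, int pad, int groups) {
    assert(groups > 0 && C % groups == 0 && K % groups == 0 && stride > 0);
    const int cPerG = C / groups;  // input channels seen by each filter
    const int kPerG = K / groups;  // output channels produced per group
    const int Ho = (H + 2 * pad - R) / stride + 1;
    const int Wo = (W + 2 * pad - S) / stride + 1;
    assert(Ho > 0 && Wo > 0);
    assert(x.size() == static_cast<std::size_t>(N) * C * H * W);
    assert(wgt.size() == static_cast<std::size_t>(K) * cPerG * R * S);
    assert(bias.size() == static_cast<std::size_t>(K));

    std::vector<float> y(static_cast<std::size_t>(N) * K * Ho * Wo, 0.0f);
    for (int n = 0; n < N; ++n)
      for (int k = 0; k < K; ++k) {
        const int g = k / kPerG;  // group this output channel belongs to
        for (int oh = 0; oh < Ho; ++oh)
          for (int ow = 0; ow < Wo; ++ow) {
            float acc = bias[k];
            for (int ci = 0; ci < cPerG; ++ci) {
              const int c = g * cPerG + ci;  // absolute input channel
              for (int r = 0; r < R; ++r)
                for (int s = 0; s < S; ++s) {
                  const int ih = oh * stride - pad + r;
                  const int iw = ow * stride - pad + s;
                  // Positions in the zero padding contribute nothing.
                  if (ih < 0 || ih >= H || iw < 0 || iw >= W) continue;
                  const std::size_t xi =
                      ((static_cast<std::size_t>(n) * C + c) * H + ih) * W + iw;
                  const std::size_t wi =
                      ((static_cast<std::size_t>(k) * cPerG + ci) * R + r) * S + s;
                  acc += x[xi] * wgt[wi];
                }
            }
            y[((static_cast<std::size_t>(n) * K + k) * Ho + oh) * Wo + ow] = acc;
          }
      }
    return y;
}
```

For the configuration in this post (filter 32x1x3x3, stride 1, padding 1, group 32) the call would use C = K = groups = 32, R = S = 3, stride = 1, pad = 1, so each output channel reads exactly one input channel.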