I am trying to change the rounding mode when the output type is int8. According to the documentation and a little experimentation, cuDNN computes:
output = RoundToEven(ClipToInt8(accum_i32 * alpha + bias)), where alpha and bias are float32 and the accumulator is int32.
This doesn’t quite match our CPU reference code because it uses purely integer arithmetic:
((accumulator + 32) >> 6) + bias
= Floor(accumulator / 64.0 + 0.5) + bias    (because x >> 6 = Floor(x / 64.0))
= RoundNearest(accumulator / 64.0) + bias   (X.5 always rounds up)
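To make the mismatch concrete, here is a small Python sketch that compares the two rounding rules on the accumulator alone. The function names are my own (hypothetical, not cuDNN or CPU reference APIs), bias is left at 0, alpha is assumed to be 1/64, and the int8 clip is omitted, so this only illustrates the tie-breaking difference rather than the full pipeline.

```python
def cpu_reference(accum: int, bias: int = 0) -> int:
    """Integer-only path: add half of 64, then arithmetic shift right by 6.

    Python's >> on ints floors toward negative infinity, so this rounds
    accum / 64 to the nearest integer with ties (X.5) always going up.
    """
    return ((accum + 32) >> 6) + bias


def round_half_even(accum: int, bias: int = 0) -> int:
    """Float path with round-half-to-even (assumed alpha = 1/64).

    Python's built-in round() uses round-half-to-even, which stands in
    here for the RoundToEven step described above.
    """
    return round(accum / 64.0) + bias


def mismatches(lo: int, hi: int) -> list:
    """Accumulator values in [lo, hi) where the two rules disagree."""
    return [a for a in range(lo, hi)
            if cpu_reference(a) != round_half_even(a)]


if __name__ == "__main__":
    # Only exact ties (accum = 64*k + 32) with even k disagree.
    print(mismatches(-200, 200))
```

Under these assumptions the two rules differ only on exact ties where the lower neighbour is even, i.e. accumulator values of the form 32 + 128*m; every other input rounds the same way.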
I know this is a minor issue, and round-to-even is less biased, but is there some way to set the cuDNN rounding mode to round towards negative infinity so that it matches the CPU code exactly?