I am using the cudnnConvolutionBiasActivationForward API in CUDNN_INT8x4 mode with the configuration below, and I have a couple of questions about it.
y = act ( alpha1 * conv(x) + alpha2 * z + bias ) from the API usage guide
I am setting alpha2 to zero, so in my case the bias is added to the conv output and the activation is applied to the result. My understanding is this: since the bias is in float, the conv output is cast to float, the bias is added to that cast output, the result is cast back to INT8, and then the activation is applied. That is my understanding anyway – PLEASE SOMEONE CORRECT ME IF I AM WRONG!
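To make the question concrete, here is a small numpy sketch of the pipeline as I understand it (this is only my assumption of what the kernel does internally, not cuDNN's documented behavior – the function name, the ReLU choice, and the saturate-cast step are all my own illustration):

```python
import numpy as np

def int8_conv_bias_act(acc_int32, bias_f32, alpha1=1.0):
    """Hypothetical sketch of the INT8x4 epilogue as I understand it:
    the conv accumulator is cast to float, the float bias is added,
    the result is saturate-cast back to INT8, then activation runs."""
    # 1. Cast the conv accumulator to float and scale: alpha1 * conv(x)
    scaled = alpha1 * acc_int32.astype(np.float32)
    # 2. Add the float bias (alpha2 is zero, so the z term drops out)
    biased = scaled + bias_f32
    # 3. Saturate-cast back to the INT8 range [-128, 127]
    clipped = np.clip(np.rint(biased), -128, 127).astype(np.int8)
    # 4. Apply the activation (ReLU here) on the INT8 values
    return np.maximum(clipped, 0)

# Hypothetical accumulator values and a float bias
acc = np.array([200, -5, 60], dtype=np.int32)
bias = np.float32(0.5)
print(int8_conv_bias_act(acc, bias))  # 200 saturates to 127
```

Note the float round trip in steps 1–3: that back-and-forth cast is exactly what my question below is about.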
Now my question is: what is the purpose of this back-and-forth casting? Why can't the bias also be kept in INT8? In that case there would be no need to cast back and forth, and everything would stay in INT8, at least for this mode.