I am trying out the INT8 and INT8x4 configurations with cuDNN, specifically with the following API. For both configurations – INT8 and INT8x4 – cuDNN outputs INT8.
What I don’t understand is how the 8-bit output is chosen from the 32-bit accumulated result. Is cuDNN simply downcasting the 32-bit result to 8 bits? If not, how does cuDNN decide which 8 bits of the 32-bit computed result to keep?
For the aforementioned API I am using the following configuration:
Data Type: INT8_CONFIG