Why does INT8 convolution with cuDNN 6.0 take more time than FP32 convolution?

Referring to my post here:


I ran the code from https://github.com/jesryu/cudnn_conv_int8/blob/master/src/cudnn_conv_int8.cc

with the changes described by txbob in the post above, for both INT8 and FP32.

The forward-pass iteration time I get for INT8 is higher than for FP32. Why is that?

Is the INT8 convolution here not using dp4a?

I am using an NVIDIA GTX 1080 Ti, which has INT8 support.

FP32:

Begin forward pass
Iteration time: 0.284869ms

INT8:

Begin forward pass
Iteration time: 1.451339ms