Why does INT8 convolution with cuDNN 6.0 take more time than FP32 convolution?

Referring to my post here:


I ran the code from https://github.com/jesryu/cudnn_conv_int8/blob/master/src/cudnn_conv_int8.cc

with the changes described by txbob in the post above, for both INT8 and FP32.

The forward-pass iteration time I get for INT8 is higher than for FP32. Why is that?

Is the INT8 convolution here not using dp4a?

I am using an NVIDIA GTX 1080 Ti, which has INT8 support.

FP32:

Begin forward pass
Iteration time: 0.284869ms

INT8:

Begin forward pass
Iteration time: 1.451339ms