Why does INT8 convolution with cuDNN 6.0 take more time than FP32 convolution?

Referring to my post here:

https://devtalk.nvidia.com/default/topic/1005119/gpu-accelerated-libraries/cudnn-v6-int8-convolution-failing-with-cudnn_status_not_supported/

I ran the code from https://github.com/jesryu/cudnn_conv_int8/blob/master/src/cudnn_conv_int8.cc

with the changes described by txbob in the post above, for both INT8 and FP32.
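
For reference, the INT8 path boils down to something like this (a minimal sketch of the cuDNN 6 INT8 requirements, not the exact code from the repo; the function name, dimensions, and padding are illustrative, and error checking is omitted):

#include <cudnn.h>

// Sketch only: dimensions illustrative, error checking omitted.
void setup_int8_conv(int n, int c, int h, int w,  // input dims (NHWC)
                     int k, int r, int s) {       // filter dims
    cudnnTensorDescriptor_t xDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    // cuDNN 6 INT8 convolution requires NHWC layout for input and filter.
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NHWC,
                               CUDNN_DATA_INT8, n, c, h, w);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_INT8,
                               CUDNN_TENSOR_NHWC, k, c, r, s);

    // Accumulation must be INT32; this is the compute type the dp4a
    // path uses.
    cudnnSetConvolution2dDescriptor(convDesc,
                                    /*pad_h=*/1, /*pad_w=*/1,
                                    /*stride_h=*/1, /*stride_w=*/1,
                                    /*dilation_h=*/1, /*dilation_w=*/1,
                                    CUDNN_CROSS_CORRELATION,
                                    CUDNN_DATA_INT32);

    // cuDNN 6 only supports one forward algorithm for INT8,
    // CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM; that is what
    // gets passed to cudnnConvolutionForward.
}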

The forward-pass iteration time I get for INT8 is higher than for FP32. Why is that?

Is the INT8 convolution here not using dp4a?

I am using an NVIDIA GTX 1080 Ti, which has INT8 (dp4a) support.
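
For what it's worth, dp4a needs compute capability 6.1 or newer, and the 1080 Ti (GP102) is SM 6.1; this can be checked with:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // dp4a is available on compute capability 6.1 and higher.
    bool has_dp4a = prop.major > 6 || (prop.major == 6 && prop.minor >= 1);
    printf("%s: sm_%d%d, dp4a %s\n", prop.name, prop.major, prop.minor,
           has_dp4a ? "available" : "not available");
    return 0;
}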

FP32:

Begin forward pass
Iteration time: 0.284869ms

INT8:

Begin forward pass
Iteration time: 1.451339ms
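
The timing in both runs comes from the same code path; roughly, it is measured like this (illustrative sketch using CUDA events; the repo's actual timer may differ, and handle, descriptors, data pointers, algo, and workspace are assumed from the setup above):

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
// Single forward pass being timed.
cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, filter,
                        convDesc, algo, workspace, workspaceBytes,
                        &beta, yDesc, y);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("Iteration time: %fms\n", ms);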