cuBLAS GEMM INT8 is much slower than FP16 on T4

After studying the documentation for a few hours, I suspect the API does not run int8 through Tensor Cores via this path.

According to the documentation, `cublasGemmEx` with Tensor Core acceleration only supports these `computeType` values: `CUBLAS_COMPUTE_16F`, `CUBLAS_COMPUTE_32F_FAST_16F`, `CUBLAS_COMPUTE_32F_FAST_16BF`, and `CUBLAS_COMPUTE_32F_FAST_TF32`. Notably, `CUBLAS_COMPUTE_32I` (the int8 compute type) is not on that list.
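For reference, here is a minimal sketch of the int8 call I am describing. This is an assumption on my part, not a confirmed Tensor Core path: it uses `CUDA_R_8I` inputs with `CUDA_R_32I` output and `CUBLAS_COMPUTE_32I`, plus the `CUBLAS_GEMM_DEFAULT_TENSOR_OP` hint (deprecated since CUDA 11, where it is treated the same as the default algo). The dimensions are kept multiples of 4, which the docs require for the int8 kernels. Whether this actually dispatches to Tensor Cores on T4 is exactly my question.

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

int main() {
    // Multiples of 4, as the int8 GemmEx path requires for m, k, lda, ldb.
    const int m = 256, n = 256, k = 256;

    cublasHandle_t handle;
    cublasCreate(&handle);

    int8_t  *dA = nullptr, *dB = nullptr;
    int32_t *dC = nullptr;
    cudaMalloc(&dA, sizeof(int8_t)  * m * k);
    cudaMalloc(&dB, sizeof(int8_t)  * k * n);
    cudaMalloc(&dC, sizeof(int32_t) * m * n);

    // int8 GEMM accumulates in int32, so alpha/beta are int32 here.
    const int32_t alpha = 1, beta = 0;

    // The question: does CUBLAS_COMPUTE_32I map onto Tensor Cores on T4,
    // given that the documented Tensor Core computeType list is FP-only?
    cublasStatus_t st = cublasGemmEx(
        handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
        &alpha,
        dA, CUDA_R_8I, m,
        dB, CUDA_R_8I, k,
        &beta,
        dC, CUDA_R_32I, m,
        CUBLAS_COMPUTE_32I,
        CUBLAS_GEMM_DEFAULT_TENSOR_OP);  // deprecated hint since CUDA 11

    printf("cublasGemmEx status: %d\n", (int)st);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cublasDestroy(handle);
    return st == CUBLAS_STATUS_SUCCESS ? 0 : 1;
}
```

(Build with something like `nvcc -lcublas`; timing this against the equivalent FP16 call is how I observed the slowdown.)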

I'm not sure if this is the reason for the slowdown. Could someone confirm? Thanks.