After studying the documentation for a few hours, my guess is that the API does not support int8 with tensor cores.
According to the documentation, when tensor cores are used, cublasGemmEx only supports these computeType values: CUBLAS_COMPUTE_16F, CUBLAS_COMPUTE_32F_FAST_16F, CUBLAS_COMPUTE_32F_FAST_16BF, and CUBLAS_COMPUTE_32F_FAST_TF32. CUBLAS_COMPUTE_32I (the int8 path) is not in that list.
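For reference, this is the kind of int8 call I mean: a minimal sketch, assuming small square matrices and no real data in the buffers. The returned status should show whether this configuration is supported at all on a given GPU/cuBLAS version.

```cuda
#include <cstdio>
#include <cstdint>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Minimal sketch of an int8 GEMM through cublasGemmEx.
// Error handling is trimmed for brevity; dimensions are kept
// multiples of 4, which the int8 path requires for leading dims.
int main() {
    const int m = 64, n = 64, k = 64;

    cublasHandle_t handle;
    cublasCreate(&handle);

    int8_t  *dA, *dB;
    int32_t *dC;
    cudaMalloc(&dA, m * k * sizeof(int8_t));
    cudaMalloc(&dB, k * n * sizeof(int8_t));
    cudaMalloc(&dC, m * n * sizeof(int32_t));

    // int8 inputs accumulate into int32, so alpha/beta are int32 here.
    const int32_t alpha = 1, beta = 0;

    cublasStatus_t st = cublasGemmEx(
        handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
        &alpha,
        dA, CUDA_R_8I, m,        // A: int8, lda = m
        dB, CUDA_R_8I, k,        // B: int8, ldb = k
        &beta,
        dC, CUDA_R_32I, m,       // C: int32, ldc = m
        CUBLAS_COMPUTE_32I,      // note: not in the tensor-core list above
        CUBLAS_GEMM_DEFAULT);

    printf("cublasGemmEx status: %d\n", (int)st);  // 0 == CUBLAS_STATUS_SUCCESS

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cublasDestroy(handle);
    return 0;
}
```

The call itself may succeed even if it does not dispatch to tensor cores, so the status alone doesn't answer the tensor-core question; profiling (e.g. checking which kernel runs) would.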
Not sure if this is the reason. Could someone confirm this? Thanks.