About cublasGemm INT8 support


Several high-level resources about cuBLAS mention support for INT8 matrix multiplication (this cuBLAS introduction, this blog post, and this one).

However, after looking at the online documentation and doing some actual experiments on a Titan X Pascal, it is unclear to me whether cublasGemm supports INT8 as the computation precision or not.

The closest I can find is the cublasGemmEx function, which accepts INT8 data as input but appears to do the computation in half precision (FP16) at minimum.

Is the documentation not up-to-date or am I missing something?



I think there may be some gaps in the documentation, but it appears that the CUDA_R_8I, CUDA_R_8I, CUDA_R_32I combination is supported for cublasGemmEx, as described in the documentation you linked:

“For CUDA_R_32I computation type the matrix types combinations supported by cublasGemmEx are listed below. This path is only supported with alpha, beta being either 1 or 0; A, B being 32-bit aligned; and lda, ldb being multiples of 4.”

I would expect this path to take advantage of the dp4a instruction, which is at the heart of the INT8 acceleration available on compute capability (cc) 6.1 GPUs.
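
As a rough illustration of what dp4a computes, here is a CPU sketch in C++ of the signed variant (exposed in CUDA as the __dp4a intrinsic): a dot product of four packed signed 8-bit lanes, accumulated into a 32-bit integer. The names dp4a_emulated and pack4 are hypothetical helpers made up for this example, and the code only models the arithmetic; it says nothing about how cuBLAS schedules the instruction internally.

```cpp
#include <cstdint>

// Hypothetical helper: packs four signed 8-bit values into one 32-bit word,
// with x0 in the lowest byte.
std::int32_t pack4(std::int8_t x0, std::int8_t x1, std::int8_t x2, std::int8_t x3) {
    std::uint32_t w =
        static_cast<std::uint32_t>(static_cast<std::uint8_t>(x0)) |
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x1)) << 8) |
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x2)) << 16) |
        (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x3)) << 24);
    return static_cast<std::int32_t>(w);
}

// CPU emulation of a signed 4-way 8-bit dot product with 32-bit accumulate:
// result = c + sum over the four lanes of a[lane] * b[lane].
std::int32_t dp4a_emulated(std::int32_t a, std::int32_t b, std::int32_t c) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    for (int lane = 0; lane < 4; ++lane) {
        // Extract one byte from each operand and reinterpret it as signed.
        const auto ai = static_cast<std::int8_t>((ua >> (8 * lane)) & 0xFFu);
        const auto bi = static_cast<std::int8_t>((ub >> (8 * lane)) & 0xFFu);
        c += static_cast<std::int32_t>(ai) * static_cast<std::int32_t>(bi);
    }
    return c;
}
```

For example, dp4a_emulated(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10) computes 1*5 + 2*6 + 3*7 + 4*8 + 10. The 32-bit accumulator matters because a single product such as (-128) * (-128) already exceeds the INT8 range.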

Indeed, it seems that this configuration is the so-called INT8 GEMM.
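
To make the semantics of that configuration concrete, here is a minimal CPU reference sketch in C++, assuming column-major storage and no transposes: INT8 inputs (the CUDA_R_8I side), INT32 accumulation and output (the CUDA_R_32I side). The function name int8_gemm_ref is hypothetical. The argument checks mirror the restrictions quoted from the documentation (alpha and beta equal to 0 or 1, A and B 32-bit aligned, lda and ldb multiples of 4), and passing alpha and beta as 32-bit integers is an assumption of this sketch. It is not the cuBLAS implementation, only a model to compare results against.

```cpp
#include <cstdint>

// Hypothetical CPU reference for C = alpha * A * B + beta * C, where
// A is m x k and B is k x n (both int8, column-major) and C is m x n (int32).
// Returns false, leaving C untouched, if the arguments violate the
// restrictions quoted from the documentation.
bool int8_gemm_ref(int m, int n, int k, std::int32_t alpha,
                   const std::int8_t* A, int lda,
                   const std::int8_t* B, int ldb,
                   std::int32_t beta, std::int32_t* C, int ldc) {
    auto zero_or_one = [](std::int32_t s) { return s == 0 || s == 1; };
    auto aligned_32bit = [](const void* p) {
        return reinterpret_cast<std::uintptr_t>(p) % 4 == 0;
    };
    if (!zero_or_one(alpha) || !zero_or_one(beta)) return false;
    if (!aligned_32bit(A) || !aligned_32bit(B)) return false;
    if (lda % 4 != 0 || ldb % 4 != 0) return false;

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            // Accumulate in 32 bits: int8 products do not fit in int8.
            std::int32_t acc = 0;
            for (int p = 0; p < k; ++p) {
                acc += static_cast<std::int32_t>(A[i + p * lda]) *
                       static_cast<std::int32_t>(B[p + j * ldb]);
            }
            // With beta == 0 the old contents of C are ignored, not read.
            const std::int32_t old = (beta == 0) ? 0 : C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + old;
        }
    }
    return true;
}
```

Running the same small inputs through cublasGemmEx and through a reference like this one is a simple way to check that the INT8 path is actually being taken and produces the expected integers.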



Does cublasGemmEx() support unsigned INT8 multiplication?
For the CUDA_R_8I, CUDA_R_8I, CUDA_R_32I combination, only signed INT8 values are supported.

How can I run an unsigned INT8 multiplication?