Hi, as per the documentation at this link (cuBLAS :: CUDA Toolkit Documentation), cublasGemmEx() is not working for INT8 matrix multiplications. It says: “For CUDA_R_32I computation type the matrix types combinations supported by cublasGemmEx are listed below. This path is only supported with alpha,…

cublasGemmEx doesn't work with INT8 utilizing __dp4a instruction on NVIDIA 1080TI

BulatZiganshin September 15, 2017, 3:33pm 8

Large matrices can be multiplied in roughly O(n^2.7) time, i.e. asymptotically faster than the O(n^3) of the schoolbook method.
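The post does not say which algorithm it has in mind. As one illustration of sub-cubic multiplication, here is a minimal sketch of Strassen's algorithm, which replaces the 8 half-size block products of the schoolbook method with 7 and so runs in O(n^log2(7)), about O(n^2.807). The function names (`strassen`, `naive_matmul`, `split`) and the `cutoff` parameter are hypothetical, the sketch assumes square matrices whose size is a power of two, and nothing here implies that cuBLAS works this way.

```python
def add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_matmul(A, B):
    """Schoolbook O(n^3) product, used as the base case and as a reference."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Multiply two n x n matrices, n a power of two (assumption of this sketch)."""
    n = len(A)
    if n <= cutoff:
        return naive_matmul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive products instead of the eight a blockwise schoolbook
    # multiplication would need.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)

    # Recombine the products into the four quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

With integer inputs such as INT8 values the result is exact, so `strassen(A, B)` can be compared directly against `naive_matmul(A, B)`. In practice the extra additions and memory traffic mean the asymptotic gain only pays off for large n, which is why a `cutoff` below which the schoolbook product takes over is common.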
