cublasGemmEx() INT8 runtime error

Hi all,

I ran into a problem when calling cublasGemmEx() on an RTX 3090 with CUDA 11.2: it always returns CUBLAS_STATUS_NOT_SUPPORTED.

A, B, and C are defined as follows:

    typedef int8_t input_t;
    typedef int output_t;

    int alpha = 1;
    int beta = 0;
    input_t* A;
    input_t* B;
    output_t* C;

    int size_A = m*k;
    int size_B = n*k;
    int size_C = m*n;
    cudaMalloc((void **)&A, size_A * sizeof(input_t));
    cudaMalloc((void **)&B, size_B * sizeof(input_t)); 
    cudaMalloc((void **)&C, size_C * sizeof(output_t));

The call is:

    status = cublasGemmEx(cublasHandle,
            m, n, k,
            A, CUDA_R_8I, k,
            B, CUDA_R_8I, k,
            C, CUDA_R_32I, m,
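
The call as posted is cut off, so the transpose flags and compute type are not visible. cuBLAS uses column-major storage, and the leading dimensions shown (k for A and B, m for C) would be consistent with op(A) = A^T and op(B) = B, though that is only an assumption here. Under that assumption, and with alpha = 1 and beta = 0 as in the post, a small CPU reference like the sketch below could be used to check the INT8 result once the GPU call succeeds. The helper name `reference_gemm_s8_tn` is made up for illustration and is not part of cuBLAS.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for C = op(A) * op(B) with int8 inputs and int32 accumulation
// (alpha = 1, beta = 0, as in the post).
// ASSUMPTION: op(A) = A^T and op(B) = B; the transpose flags are not shown in
// the original call. All matrices are column-major:
//   A is stored as k x m with leading dimension k,
//   B is stored as k x n with leading dimension k,
//   C is stored as m x n with leading dimension m.
// reference_gemm_s8_tn is a hypothetical helper, not a cuBLAS function.
std::vector<int32_t> reference_gemm_s8_tn(const std::vector<int8_t>& A,
                                          const std::vector<int8_t>& B,
                                          int m, int n, int k) {
    std::vector<int32_t> C(static_cast<std::size_t>(m) * n, 0);
    for (int j = 0; j < n; ++j) {          // column of C
        for (int i = 0; i < m; ++i) {      // row of C
            int32_t acc = 0;
            for (int p = 0; p < k; ++p) {  // reduction dimension
                acc += static_cast<int32_t>(A[p + static_cast<std::size_t>(i) * k]) *
                       static_cast<int32_t>(B[p + static_cast<std::size_t>(j) * k]);
            }
            C[i + static_cast<std::size_t>(j) * m] = acc;
        }
    }
    return C;
}
```

To use it, copy C back from the device with cudaMemcpy and compare it element by element against the vector returned here. If the real call uses different transpose flags, the indexing of A and B has to change accordingly.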

Can you try it for test purposes with CUDA 11.1?

According to my testing, your code works on CUDA 11.1 but returns error 15 (CUBLAS_STATUS_NOT_SUPPORTED) on CUDA 11.2. It may be a bug in cuBLAS.
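
For reference, "error 15" is the numeric value of CUBLAS_STATUS_NOT_SUPPORTED in the cublasStatus_t enum, the same status reported in the original post. A minimal sketch of a lookup helper for printing readable names is shown below. The function name `cublas_status_name` is hypothetical, and the numeric values are mirrored by hand from cublas_api.h so that the snippet builds without the CUDA headers. In real code, switching on the cublasStatus_t enumerators directly would be safer.

```cpp
#include <string>

// Maps a raw cuBLAS status value to its enumerator name.
// cublas_status_name is a hypothetical helper; the values below mirror the
// cublasStatus_t enum rather than including the cuBLAS header.
std::string cublas_status_name(int status) {
    switch (status) {
        case 0:  return "CUBLAS_STATUS_SUCCESS";
        case 1:  return "CUBLAS_STATUS_NOT_INITIALIZED";
        case 3:  return "CUBLAS_STATUS_ALLOC_FAILED";
        case 7:  return "CUBLAS_STATUS_INVALID_VALUE";
        case 8:  return "CUBLAS_STATUS_ARCH_MISMATCH";
        case 11: return "CUBLAS_STATUS_MAPPING_ERROR";
        case 13: return "CUBLAS_STATUS_EXECUTION_FAILED";
        case 14: return "CUBLAS_STATUS_INTERNAL_ERROR";
        case 15: return "CUBLAS_STATUS_NOT_SUPPORTED";
        case 16: return "CUBLAS_STATUS_LICENSE_ERROR";
        default: return "unknown cuBLAS status";
    }
}
```

Printing `cublas_status_name(static_cast<int>(status))` after the cublasGemmEx() call makes the failure mode visible without looking up the number.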

Hi Robert,

After switching to CUDA 11.1, I still get the same error at runtime. Here is the command I use for compilation:

/usr/local/cuda-11.1/bin/nvcc src/ -std=c++11 -O3 -w -rdc=true -arch=sm_86 -lcuda -lcublas -o cublas_TC_FP16


On your machine, what is the output of:

/usr/local/cuda-11.1/bin/nvcc --version


It says:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

Hi Robert,

Problem solved.
I had forgotten to point my bash configuration at the CUDA 11.1 lib64 directory. Now it works fine.
CUDA 11.2 does have the INT8 problem with cublasGemmEx(), while 11.1 does not.


Our dev team has confirmed a bug in cuBLAS in CUDA 11.2 for this case.