cublasGemmEx() INT-8 runtime error

Daniel_Wong · December 29, 2020, 8:28pm

Hi, All

I ran into a problem when calling cublasGemmEx() on an RTX 3090 with CUDA 11.2. It always returns CUBLAS_STATUS_NOT_SUPPORTED.

A, B, and C are defined as

    typedef int8_t input_t;
    typedef int output_t;

    int alpha = 1;
    int beta = 0;
    
    input_t* A;
    input_t* B;
    output_t* C;

    int size_A = m*k;
    int size_B = n*k;
    int size_C = m*n;
    cudaMalloc((void **)&A, size_A * sizeof(input_t));
    cudaMalloc((void **)&B, size_B * sizeof(input_t)); 
    cudaMalloc((void **)&C, size_C * sizeof(output_t));

And the call is

    status = cublasGemmEx(cublasHandle,
                          CUBLAS_OP_T,
                          CUBLAS_OP_N,
                          m, n, k,
                          &alpha,
                          A, CUDA_R_8I, k,
                          B, CUDA_R_8I, k,
                          &beta,
                          C, CUDA_R_32I, m,
                          CUBLAS_COMPUTE_32I,
                          CUBLAS_GEMM_DFALT_TENSOR_OP);
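For readers who want to check the result once the call succeeds, here is a minimal host-side sketch of what this call is asking for, assuming cuBLAS's column-major convention: with CUBLAS_OP_T and lda = ldb = k, A is stored as a k x m matrix, B as k x n, and C as m x n with ldc = m. The function name `reference_gemm_s8` is hypothetical and not part of cuBLAS; it is only an illustration for comparing against the GPU output.

```cpp
#include <cstdint>

// Hypothetical host reference (not a cuBLAS function).
// Computes C = alpha * A^T * B + beta * C in column-major storage,
// mirroring CUBLAS_OP_T / CUBLAS_OP_N with lda = ldb = k and ldc = m.
void reference_gemm_s8(int m, int n, int k, int alpha,
                       const int8_t* A, const int8_t* B,
                       int beta, int32_t* C) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (int l = 0; l < k; ++l) {
                // A is k x m: element (l, i) lives at A[l + i*k].
                // B is k x n: element (l, j) lives at B[l + j*k].
                acc += static_cast<int32_t>(A[l + i * k]) *
                       static_cast<int32_t>(B[l + j * k]);
            }
            // When beta is 0 the old contents of C are ignored entirely,
            // so C may be uninitialized on entry in that case.
            int32_t prev = (beta == 0) ? 0 : beta * C[i + j * m];
            C[i + j * m] = alpha * acc + prev;
        }
    }
}
```

Copying C back from the device with cudaMemcpy and comparing it element by element against this reference is one way to confirm that the INT8 path produces the expected values.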

Robert_Crovella · December 29, 2020, 10:38pm

Can you try it for test purposes with CUDA 11.1?

According to my testing your code works on CUDA 11.1, but returns error 15 on CUDA 11.2. It may be a bug in CUBLAS.

Daniel_Wong · December 29, 2020, 11:05pm

Hi, Robert

After switching to CUDA 11.1, I still get the same error at runtime. Here is the command I use for compilation:

    /usr/local/cuda-11.1/bin/nvcc src/cublas_TC_FP16.cu -std=c++11 -O3 -w -rdc=true -arch=sm_86 -lcuda -lcublas -o cublas_TC_FP16

Thanks!

Robert_Crovella · December 29, 2020, 11:38pm

On your machine, what is the output of the following?

    /usr/local/cuda-11.1/bin/nvcc --version

Daniel_Wong · December 30, 2020, 12:37am

It says:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2020 NVIDIA Corporation
    Built on Mon_Oct_12_20:09:46_PDT_2020
    Cuda compilation tools, release 11.1, V11.1.105
    Build cuda_11.1.TC455_06.29190527_0

Daniel_Wong · December 30, 2020, 12:51am

Hi, Robert

Problem solved.
I had simply forgotten to link against the CUDA 11.1 /lib64 directory in my bash configuration. Now it works fine.
CUDA 11.2 does have this INT8 problem in cublasGemmEx, while 11.1 does not.

Thanks!

Robert_Crovella · December 30, 2020, 9:21pm

Our dev team has confirmed a bug in CUBLAS for CUDA 11.2 for this.
