Hi all,
I ran into a problem when calling cublasGemmEx() on an RTX 3090 with CUDA 11.2: it always returns CUBLAS_STATUS_NOT_SUPPORTED.
A, B, and C are defined as follows:
typedef int8_t input_t;
typedef int output_t;
int alpha = 1;
int beta = 0;
input_t* A;
input_t* B;
output_t* C;
int size_A = m*k;
int size_B = n*k;
int size_C = m*n;
cudaMalloc((void **)&A, size_A * sizeof(input_t));
cudaMalloc((void **)&B, size_B * sizeof(input_t));
cudaMalloc((void **)&C, size_C * sizeof(output_t));
And the call is:
status = cublasGemmEx(cublasHandle,
    CUBLAS_OP_T,
    CUBLAS_OP_N,
    m, n, k,
    &alpha,
    A, CUDA_R_8I, k,
    B, CUDA_R_8I, k,
    &beta,
    C, CUDA_R_32I, m,
    CUBLAS_COMPUTE_32I,
    CUBLAS_GEMM_DFALT_TENSOR_OP);
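Once the call returns CUBLAS_STATUS_SUCCESS, it can be worth checking the result against a plain host-side loop. The sketch below assumes the same layout as the call above: column-major storage, A stored as k x m with lda = k and transposed by CUBLAS_OP_T, B stored as k x n with ldb = k, and C stored as m x n with ldc = m. The function name gemm_tn_reference is hypothetical and not part of cuBLAS; the idea is to copy A, B and the device C back to the host (for example with cudaMemcpy) and compare element by element.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int8_t input_t;
typedef int output_t;

// Hypothetical host reference for the cublasGemmEx call above (not a cuBLAS API).
// Column-major storage, op(A) = A^T, op(B) = B:
//   A is k x m with lda = k, so op(A)(i, p) = A[p + i * k]
//   B is k x n with ldb = k, so op(B)(p, j) = B[p + j * k]
//   C is m x n with ldc = m, so C(i, j)     = C[i + j * m]
// Computes C = alpha * A^T * B + beta * C with 32-bit integer accumulation.
// Assumes k < 131072: each int8 product is at most 16384 in magnitude,
// so the int32 accumulator cannot overflow.
void gemm_tn_reference(int m, int n, int k, int alpha,
                       const input_t* A, const input_t* B,
                       int beta, output_t* C) {
    assert(m > 0 && n > 0 && k > 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            std::int32_t acc = 0;
            for (int p = 0; p < k; ++p) {
                acc += static_cast<std::int32_t>(A[p + i * k]) *
                       static_cast<std::int32_t>(B[p + j * k]);
            }
            // As with cuBLAS, C is not read when beta == 0 (it may be uninitialized).
            output_t prev = (beta == 0) ? 0 : C[i + j * m];
            C[i + j * m] = alpha * acc + beta * prev;
        }
    }
}
```

With alpha = 1 and beta = 0, as in the post, this reduces to C = A^T * B, so any mismatch against the GPU result points at the call itself rather than at the layout assumptions.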
Can you try it for test purposes with CUDA 11.1?
According to my testing, your code works on CUDA 11.1 but returns error 15 (CUBLAS_STATUS_NOT_SUPPORTED) on CUDA 11.2. It may be a bug in cuBLAS.
Hi, Robert
After switching to CUDA 11.1, I still get the same error at runtime. Here is the command I use to compile:
/usr/local/cuda-11.1/bin/nvcc src/cublas_TC_FP16.cu -std=c++11 -O3 -w -rdc=true -arch=sm_86 -lcuda -lcublas -o cublas_TC_FP16
Thanks!
On your machine, what is the output of the following command?
/usr/local/cuda-11.1/bin/nvcc --version
Hi, Robert
Problem solved.
I had forgotten to point my bash configuration at the CUDA 11.1 lib64 directory, so the program was still picking up the CUDA 11.2 libraries at runtime. Now it works fine.
CUDA 11.2 does have this INT8 problem with cublasGemmEx, while 11.1 does not.
Thanks!
Our dev team has confirmed a bug in cuBLAS in CUDA 11.2 for this case.