cublasGemmEx execution error code CUBLAS_STATUS_ARCH_MISMATCH

My problem is similar to https://devtalk.nvidia.com/default/topic/1023896/cublasgemmex-doesn-t-work-with-int8-utilizing-__dp4a-instruction-on-nvidia-1080ti-

env:
CUDA version: 8.0.
Device: Tesla T4, compute capability (SM) 7.5

compilation command:
nvcc -std=c++11 -arch=sm_61 gemmex_test.cu -L/usr/local/cuda-8.0/lib64/ -lcublas

I am getting below error:
CUBLAS_STATUS_ARCH_MISMATCH

What’s wrong with my cublasGemmEx use on Tesla T4?

#include <iostream>

#include <cublas_v2.h>
#include <thrust/device_vector.h>

/// Map a cublasStatus_t to its symbolic name for error reporting.
/// Returns a static string; never returns null.
const char* cublasGetErrorString(cublasStatus_t status) {
  switch(status) {
    case CUBLAS_STATUS_SUCCESS: return "CUBLAS_STATUS_SUCCESS";
    case CUBLAS_STATUS_NOT_INITIALIZED: return "CUBLAS_STATUS_NOT_INITIALIZED";
    case CUBLAS_STATUS_ALLOC_FAILED: return "CUBLAS_STATUS_ALLOC_FAILED";
    case CUBLAS_STATUS_INVALID_VALUE: return "CUBLAS_STATUS_INVALID_VALUE";
    case CUBLAS_STATUS_ARCH_MISMATCH: return "CUBLAS_STATUS_ARCH_MISMATCH";
    case CUBLAS_STATUS_MAPPING_ERROR: return "CUBLAS_STATUS_MAPPING_ERROR";
    case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case CUBLAS_STATUS_INTERNAL_ERROR: return "CUBLAS_STATUS_INTERNAL_ERROR";
    // These two exist in the cuBLAS versions in question (CUDA 8+) but were
    // missing from the original switch, so they printed "unknown error".
    case CUBLAS_STATUS_NOT_SUPPORTED: return "CUBLAS_STATUS_NOT_SUPPORTED";
    case CUBLAS_STATUS_LICENSE_ERROR: return "CUBLAS_STATUS_LICENSE_ERROR";
  }
  return "unknown error";
}

int main(void) {
  // matrix A
  int rowA = 40;
  int colA = 40;
  // matrix B
  int rowB = colA;
  int colB = 40;
  // matrix C
  int rowC = rowA;
  int colC = colB;

  thrust::device_vector<unsigned char> A(rowA * colA);
  thrust::device_vector<unsigned char> B(rowB * colB);
  thrust::device_vector<unsigned int> C(rowC * colC);

  for (size_t i = 0; i < rowA; i++){
    for (size_t j = 0; j < colA; j++){
      A[i * rowA + j] = i + j;
    }
  }

  for (size_t i = 0; i < rowB; i++){
    for (size_t j = 0; j < colB; j++){
      B[i * rowA + j] = i + j;
    }
  }

  for (size_t i = 0; i < rowC; i++) {
    for (size_t j = 0; j < colC; j++) {
      C[i * rowA + j] = i + j;
      if (i == 0) {
        std::cout << " " << C[i * rowA + j];
      }
    }
  }
  std::cout << std::endl;

  cublasHandle_t handle;
  cublasStatus_t status = cublasCreate(&handle);
  if (status != CUBLAS_STATUS_SUCCESS) {
    std::cerr << "cublasCreate failed. error is: " << cublasGetErrorString(status) << std::endl;;
  }

  int alpha = 1;
  int beta = 0;
  // A * B + C
  status = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
      rowA, colB, colA,
      &alpha, thrust::raw_pointer_cast(&A[0]), CUDA_R_8I, rowA,
      thrust::raw_pointer_cast(&B[0]), CUDA_R_8I, colB,
      &beta, thrust::raw_pointer_cast(&C[0]), CUDA_R_32I, colB, CUDA_R_32I, CUBLAS_GEMM_ALGO0);
  if (status != CUBLAS_STATUS_SUCCESS) {
    std::cerr << "cublasGemmEx execution error is: " << cublasGetErrorString(status) << std::endl;
  }

  std::cout << "output print: " << std::endl;
  for (size_t i = 0; i < rowC; i++) {
    for (size_t j = 0; j < colC; j++) {
      C[i * rowA + j] = i + j;
      if (i == 0) {
        std::cout << " " << C[i * rowA + j];
      }
    }
  }
  std::cout << std::endl;

  status = cublasDestroy(handle);
  if (status != CUBLAS_STATUS_SUCCESS) {
    std::cerr << "shutdown error code is: " << cublasGetErrorString(status) << std::endl;
  }

  return 0;
}

Update: I upgraded CUDA to 10.0 and that solved the problem — CUDA 8.0 predates the Turing architecture, so it cannot generate or JIT code for the Tesla T4 (sm_75), which is exactly what CUBLAS_STATUS_ARCH_MISMATCH reports. See https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html