CUBLAS initialization failed when running cuBLAS example

I keep getting the "CUBLAS initialization failed" error when trying to run the example from the cuBLAS website. It seems that the cuBLAS handle fails to initialize. I have followed the instructions for linking the cuBLAS libraries and headers, so I don't think that is the issue. The environment is VS2019 on Windows 10, and the GPU is an RTX 3090. The code is below:

//Example 2. Application Using C and cuBLAS: 0-based indexing
//-----------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include "cublas_v2.h"
#define M 6
#define N 5
#define IDX2C(i,j,ld) (((j)*(ld))+(i))

static inline void modify(cublasHandle_t handle, float* m, int ldm, int n, int p, int q, float alpha, float beta) {
    cublasSscal(handle, n - q, &alpha, &m[IDX2C(p, q, ldm)], ldm);
    cublasSscal(handle, ldm - p, &beta, &m[IDX2C(p, q, ldm)], 1);
}

int main(void) {
cudaError_t cudaStat;
cublasStatus_t stat;
cublasHandle_t handle;
int i, j;
float* devPtrA;
float* a = 0;
a = (float*)malloc(M * N * sizeof(*a));
if (!a) {
    printf("host memory allocation failed");
    return EXIT_FAILURE;
}

for (j = 0; j < N; j++) {
    for (i = 0; i < M; i++) {
        a[IDX2C(i, j, M)] = (float)(i * N + j + 1);
    }
}

cudaStat = cudaMalloc((void**)&devPtrA, M * N * sizeof(*a));
if (cudaStat != cudaSuccess) {
    printf("device memory allocation failed");
    return EXIT_FAILURE;
}

stat = cublasCreate(&handle);
if (stat != CUBLAS_STATUS_SUCCESS) {
    printf("CUBLAS initialization failed\n");
    return EXIT_FAILURE;
}
stat = cublasSetMatrix(M, N, sizeof(*a), a, M, devPtrA, M);
if (stat != CUBLAS_STATUS_SUCCESS) {
    printf("data download failed");
    cudaFree(devPtrA);
    cublasDestroy(handle);
    return EXIT_FAILURE;
}
modify(handle, devPtrA, M, N, 1, 2, 16.0f, 12.0f);
stat = cublasGetMatrix(M, N, sizeof(*a), devPtrA, M, a, M);
if (stat != CUBLAS_STATUS_SUCCESS) {
    printf("data upload failed");
    cudaFree(devPtrA);
    cublasDestroy(handle);
    return EXIT_FAILURE;
}
cudaFree(devPtrA);
cublasDestroy(handle);
for (j = 0; j < N; j++) {
    for (i = 0; i < M; i++) {
        printf("%7.0f", a[IDX2C(i, j, M)]);
    }
    printf("\n");
}
free(a);
return EXIT_SUCCESS;

}

I also reinstalled the driver, but I still get the error. The nvidia-smi output is below:

Thu Mar 4 00:35:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.72       Driver Version: 461.72       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090   WDDM  | 00000000:65:00.0  On |                  N/A |
|  0%   30C    P8    31W / 390W |    935MiB / 24576MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1596    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      5408    C+G   …ekyb3d8bbwe\HxOutlook.exe      N/A      |
|    0   N/A  N/A      7472    C+G   …ty\Common7\IDE\devenv.exe      N/A      |
|    0   N/A  N/A      8948    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      9148    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A     10024    C+G   …artMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     10672    C+G   …5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     11124    C+G   …ekyb3d8bbwe\YourPhone.exe      N/A      |
|    0   N/A  N/A     11564    C+G   …nputApp\TextInputHost.exe      N/A      |
|    0   N/A  N/A     12104    C+G   …y\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     13568    C+G   …les\NZXT CAM\NZXT CAM.exe      N/A      |
|    0   N/A  N/A     13776    C+G   …kyb3d8bbwe\HxAccounts.exe      N/A      |
|    0   N/A  N/A     14476    C+G   …iginProxy\OriginProxy.exe      N/A      |
|    0   N/A  N/A     15072    C+G   …ram Files\LGHUB\lghub.exe      N/A      |
|    0   N/A  N/A     16588    C+G   …me\Application\chrome.exe      N/A      |
|    0   N/A  N/A     19212    C+G   …icrosoft VS Code\Code.exe      N/A      |
|    0   N/A  N/A     19404    C+G   …in7x64\steamwebhelper.exe      N/A      |
|    0   N/A  N/A     19480    C+G   …wekyb3d8bbwe\Video.UI.exe      N/A      |
|    0   N/A  N/A     20808    C+G   …lPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A     21288    C+G   …b3d8bbwe\WinStore.App.exe      N/A      |
+-----------------------------------------------------------------------------+

Can you successfully build and run any other of the sample apps that ship with CUDA? If CUBLAS fails to initialize, that very likely means that CUDA itself isn’t in working order.
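
If it helps narrow things down, here is a minimal sketch (not one of the shipped samples) that checks the CUDA runtime first and then reports the actual status returned by cublasCreate instead of a generic message. The helper name cublas_status_name is made up for this illustration; the CUDA runtime and cuBLAS calls are the standard ones. It assumes the program is linked against cudart and cublas in the same way as the example above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include "cublas_v2.h"

/* Hypothetical helper: map a cublasStatus_t to its enumerator name. */
static const char *cublas_status_name(cublasStatus_t s) {
    switch (s) {
    case CUBLAS_STATUS_SUCCESS:          return "CUBLAS_STATUS_SUCCESS";
    case CUBLAS_STATUS_NOT_INITIALIZED:  return "CUBLAS_STATUS_NOT_INITIALIZED";
    case CUBLAS_STATUS_ALLOC_FAILED:     return "CUBLAS_STATUS_ALLOC_FAILED";
    case CUBLAS_STATUS_INVALID_VALUE:    return "CUBLAS_STATUS_INVALID_VALUE";
    case CUBLAS_STATUS_ARCH_MISMATCH:    return "CUBLAS_STATUS_ARCH_MISMATCH";
    case CUBLAS_STATUS_MAPPING_ERROR:    return "CUBLAS_STATUS_MAPPING_ERROR";
    case CUBLAS_STATUS_EXECUTION_FAILED: return "CUBLAS_STATUS_EXECUTION_FAILED";
    case CUBLAS_STATUS_INTERNAL_ERROR:   return "CUBLAS_STATUS_INTERNAL_ERROR";
    case CUBLAS_STATUS_NOT_SUPPORTED:    return "CUBLAS_STATUS_NOT_SUPPORTED";
    case CUBLAS_STATUS_LICENSE_ERROR:    return "CUBLAS_STATUS_LICENSE_ERROR";
    default:                             return "unknown cublasStatus_t";
    }
}

int main(void) {
    int device_count = 0;
    cudaError_t err;
    cublasStatus_t stat;
    cublasHandle_t handle;
    int cublas_version = 0;

    /* Step 1: can the runtime see a GPU at all? */
    err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }
    printf("CUDA devices found: %d\n", device_count);

    /* Step 2: cudaFree(0) is a common idiom to force context creation. */
    err = cudaFree(0);
    if (err != cudaSuccess) {
        printf("CUDA context creation failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    /* Step 3: create the cuBLAS handle and print the real status on failure. */
    stat = cublasCreate(&handle);
    if (stat != CUBLAS_STATUS_SUCCESS) {
        printf("cublasCreate failed: %s (%d)\n", cublas_status_name(stat), (int)stat);
        return EXIT_FAILURE;
    }

    if (cublasGetVersion(handle, &cublas_version) == CUBLAS_STATUS_SUCCESS) {
        printf("cuBLAS initialized, library version %d\n", cublas_version);
    }
    cublasDestroy(handle);
    return EXIT_SUCCESS;
}
```

If step 1 or 2 already fails, the problem is in the CUDA installation or driver rather than in cuBLAS itself.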

What CUDA version is installed, and what is the output of nvcc --version?
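
As a side note on reading version numbers: the "CUDA Version" shown by nvidia-smi is the newest CUDA version the driver supports, not the toolkit that is installed, which is why the nvcc output matters here. From code, cudaRuntimeGetVersion() and cudaDriverGetVersion() each report a single integer encoded as 1000 * major + 10 * minor. Below is a small sketch of decoding that integer; the function names are hypothetical, and the code is plain C so it builds without the toolkit.

```c
#include <stdio.h>

/* CUDA encodes versions as 1000 * major + 10 * minor, e.g. 11020 for 11.2. */
int cuda_version_major(int v) { return v / 1000; }
int cuda_version_minor(int v) { return (v % 1000) / 10; }

/* Write "major.minor" into buf; returns what snprintf returns. */
int format_cuda_version(int v, char *buf, size_t len) {
    return snprintf(buf, len, "%d.%d",
                    cuda_version_major(v), cuda_version_minor(v));
}
```

In a real program the integer would come from cudaRuntimeGetVersion(&v) or cudaDriverGetVersion(&v). Comparing the two decoded values shows whether the installed runtime is newer than what the driver supports.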

Indeed, the other cuBLAS samples all failed to run as well. The installed version was CUDA 11.0; I solved the issue by uninstalling CUDA 11.0 and installing CUDA 11.2.1, and cuBLAS now runs without problems. I have to say that CUDA 11.0 has been something of a nightmare, plagued with bugs that ordinary users run into frequently. Apart from cuBLAS, another issue that bothered me a lot is the pow() function from math.h, which in CUDA 11.0 with Visual Studio has a bug where int arguments are not supported; that broke most of the code I had written in the past. Glad they fixed it in CUDA 11.2.1.

It seems to me that the issue about some flavors of pow() conflicting with Visual Studio’s header file should have been found in internal testing. That’s the kind of egg-on-your-face bug that shouldn’t happen in a mature software stack. I’d say: time to review and tighten internal test procedures.

On the other hand it’s not too surprising for such an issue to appear since CUDA’s tight integration with host tool chains and their associated header files, while very convenient for CUDA users, has historically been a great source of compatibility issues.

I usually stay away from .0 versions of software and therefore jumped in at CUDA 11.1, which didn’t help in this specific case (I use MSVC 2019 and the fix did not materialize until CUDA 11.2).
