cudaMemcpy fails with cudaErrorIllegalAddress after cublasSmatinvBatched

I use cublasSmatinvBatched to invert a 3x3 matrix, and the function returns CUBLAS_STATUS_SUCCESS.
But after that, every cudaMemcpy returns cudaErrorIllegalAddress, regardless of whether the copy is host-to-device or device-to-host.
Please help me! I have wasted a whole day on this.

#define M 3
#define N 3
    cublasHandle_t handle;
    auto create_state = cublasCreate_v2(&handle);

 
    auto *A = new float[9];
    float *InvA = new float[9];
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            A[i * N + j] = i + j;
        }
    }

    float *d_A, *d_InvA, *d_test;
    int *info;
    cudaMalloc((void **) &d_A, M * N * 4);
    cudaMalloc((void **) &d_InvA, M * N * 4);
    cudaMalloc((void **) &d_test, M * N * 4);
    cudaMalloc((void **) &info, 4);
 
    if (cudaMemcpy(d_A, A, M * N * sizeof(float), cudaMemcpyKind::cudaMemcpyHostToDevice) != cudaSuccess) {
        throw std::runtime_error("err3");
    }
 
    auto res = cublasSmatinvBatched(handle, 3, &d_A, 3, &d_InvA, 3, info, 1);
 

    auto after_state = cudaMemcpy(d_test, InvA, 3 * 3 * 4, cudaMemcpyHostToDevice);

after_state is cudaErrorIllegalAddress. What should I do?

You try to copy 9 elements from InvA, but it only has length 8: float *InvA = new float[8]; (the same applies to A).

With float *InvA = new float[9], I still get the same error.

I also found that cublasSmatinvBatched always returns success, even if I set A = nullptr.

You will need to provide a complete minimal example that reproduces your problem if you need more help.

#include <stdexcept>
#include <iostream>
#include "cublas_v2.h"
#include "cuda_runtime.h"
int main(){
#define M 3
#define N 3
    cublasHandle_t handle;
    cublasCreate_v2(&handle);


    auto *A = new float[9];
    auto *InvA = new float[9];
    for (int i = 0; i < 9; i++) {

        A[i] = 0;
        if (i % 3 == 0) { A[i] = 1; }

    }

    float *d_A, *d_InvA,  *d_test;
    int *info;
    cudaMalloc((void **) &d_A, M * N * 4);
    cudaMalloc((void **) &d_InvA, M * N * 4);

    cudaMalloc((void **) &d_test, M * N * 4);

    cudaMalloc((void **) &info, 4);

    if (cudaMemcpy(d_A, A, M * N * sizeof(float), cudaMemcpyKind::cudaMemcpyHostToDevice) != cudaSuccess) {
        throw std::runtime_error("err3");
    }

    auto res = cublasSmatinvBatched(handle, 3, &d_A, 3, &d_InvA, 3, info, 1);



    auto after = cudaMemcpy(d_test, InvA, 3 * 3*4 , cudaMemcpyHostToDevice);

    auto res1 = cudaMemcpy(InvA, d_InvA, M * N * 4, cudaMemcpyKind::cudaMemcpyDeviceToHost);


    std::cout << info << "😘" << res1 << std::endl;
}

Here is the whole minimal example. Both after and res1 return cudaErrorIllegalAddress.

All pointer parameters of cublasSmatinvBatched must be device accessible. This is not the case in your code because &d_A and &d_InvA are host pointers.

You need to create another device array float** d_pointers_A where d_pointers_A[0] = d_A, and likewise a device array of pointers for d_InvA.
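A minimal sketch of that suggestion, assuming a single 3x3 matrix (batch size 1). The names h_ptrs_A, h_ptrs_InvA, d_pointers_InvA and d_info are placeholders that do not appear in the thread. The diagonal test matrix is also an illustrative choice: the matrix in the posted example (ones at indices 0, 3 and 6) is singular, so it would not have a meaningful inverse even with correct pointers.

```cuda
#include <cstdio>
#include "cublas_v2.h"
#include "cuda_runtime.h"

int main() {
    const int n = 3;      // matrix dimension
    const int batch = 1;  // number of matrices in the batch

    // Invertible diagonal test matrix (illustrative choice, not from the thread).
    float h_A[n * n] = {2, 0, 0, 0, 4, 0, 0, 0, 8};
    float h_InvA[n * n] = {0};

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return 1;

    // Matrix storage in device memory.
    float *d_A = nullptr, *d_InvA = nullptr;
    cudaMalloc((void **)&d_A, sizeof(h_A));
    cudaMalloc((void **)&d_InvA, sizeof(h_InvA));
    cudaMemcpy(d_A, h_A, sizeof(h_A), cudaMemcpyHostToDevice);

    // Host-side arrays holding the device address of each matrix.
    float *h_ptrs_A[batch] = {d_A};
    float *h_ptrs_InvA[batch] = {d_InvA};

    // The pointer arrays themselves must also live in device memory.
    float **d_pointers_A = nullptr, **d_pointers_InvA = nullptr;
    cudaMalloc((void **)&d_pointers_A, sizeof(h_ptrs_A));
    cudaMalloc((void **)&d_pointers_InvA, sizeof(h_ptrs_InvA));
    cudaMemcpy(d_pointers_A, h_ptrs_A, sizeof(h_ptrs_A), cudaMemcpyHostToDevice);
    cudaMemcpy(d_pointers_InvA, h_ptrs_InvA, sizeof(h_ptrs_InvA), cudaMemcpyHostToDevice);

    // One status entry per matrix, also in device memory.
    int *d_info = nullptr;
    cudaMalloc((void **)&d_info, batch * sizeof(int));

    // Pass the device-resident pointer arrays, not &d_A / &d_InvA.
    cublasStatus_t status = cublasSmatinvBatched(handle, n, d_pointers_A, n,
                                                 d_pointers_InvA, n, d_info, batch);

    // Blocking copies back to the host; these also surface any device-side error.
    int h_info[batch] = {0};
    cudaError_t err = cudaMemcpy(h_InvA, d_InvA, sizeof(h_InvA), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_info, d_info, sizeof(h_info), cudaMemcpyDeviceToHost);

    printf("cublas status: %d, copy result: %s, info[0]: %d\n",
           (int)status, cudaGetErrorString(err), h_info[0]);
    for (int i = 0; i < n * n; i++) printf("%g ", h_InvA[i]);
    printf("\n");

    cudaFree(d_info);
    cudaFree(d_pointers_InvA);
    cudaFree(d_pointers_A);
    cudaFree(d_InvA);
    cudaFree(d_A);
    cublasDestroy(handle);
    return 0;
}
```

For a larger batch, the same pattern applies: fill the host pointer arrays with one device address per matrix and copy the whole array to the device once.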

That already makes it device accessible.

No, it is not.
d_A is a pointer variable on the host stack which points to an address in device memory.
&d_A will return the address of this stack variable, which is a host pointer, not a device pointer.
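The distinction can be checked at runtime. The following is a rough sketch that asks the CUDA runtime where each address lives, assuming CUDA 10 or newer (where cudaPointerAttributes has a type field). The helper name describe is hypothetical, and the exact classification of ordinary host memory varies by toolkit version, so both outcomes are handled.

```cuda
#include <cstdio>
#include "cuda_runtime.h"

// Returns a short label for where `ptr` lives according to the CUDA runtime.
static const char *describe(const void *ptr) {
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        // Older toolkits report plain host pointers as an error; clear it.
        cudaGetLastError();
        return "not known to CUDA (ordinary host memory)";
    }
    switch (attr.type) {
        case cudaMemoryTypeDevice:  return "device memory";
        case cudaMemoryTypeHost:    return "registered/pinned host memory";
        case cudaMemoryTypeManaged: return "managed memory";
        default:                    return "not known to CUDA (ordinary host memory)";
    }
}

int main() {
    float *d_A = nullptr;
    if (cudaMalloc((void **)&d_A, 9 * sizeof(float)) != cudaSuccess) return 1;

    // The value stored in d_A is an address in device memory.
    printf("d_A  (value of the pointer):    %s\n", describe(d_A));
    // The variable d_A itself sits on the host stack.
    printf("&d_A (address of the variable): %s\n", describe(&d_A));

    cudaFree(d_A);
    return 0;
}
```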


Have you tested it? Will float** d_pointers_A solve the problem?

Why don’t you test it? You can easily verify with compute-sanitizer and printing the cublas arguments that those host pointers cause the illegal memory access errors.
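Running the program under compute-sanitizer is the more direct check. Inside the program, a rough way to localize the fault is to synchronize right after the cuBLAS call. The sketch below deliberately repeats the mistake from the question; it assumes the cuBLAS call only enqueues device work, which would explain why it reports success and the error appears on the next runtime call. What the synchronization returns depends on the system, so no output is claimed here.

```cuda
#include <cstdio>
#include "cublas_v2.h"
#include "cuda_runtime.h"

int main() {
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return 1;

    float *d_A = nullptr, *d_InvA = nullptr;
    int *d_info = nullptr;
    cudaMalloc((void **)&d_A, 9 * sizeof(float));
    cudaMalloc((void **)&d_InvA, 9 * sizeof(float));
    cudaMalloc((void **)&d_info, sizeof(int));
    cudaMemset(d_A, 0, 9 * sizeof(float));

    // Deliberately wrong, as in the question: &d_A and &d_InvA are host addresses.
    cublasStatus_t status = cublasSmatinvBatched(handle, 3, &d_A, 3, &d_InvA, 3, d_info, 1);
    printf("cublas status: %d\n", (int)status);

    // Wait for the enqueued work; a device-side fault is reported here
    // instead of on some unrelated cudaMemcpy later in the program.
    cudaError_t err = cudaDeviceSynchronize();
    printf("after synchronize: %s\n", cudaGetErrorString(err));

    // After an illegal address fault the context is typically unusable,
    // so this sketch simply exits without further CUDA calls.
    return err == cudaSuccess ? 0 : 1;
}
```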

I suggest you give the advice given by striker159 a try. The statements about the nature of the pointers passed to that cublas function call are correct, and unless you pass a device pointer-to-pointer argument as suggested, you will not make any progress here.

This concept is true for a number of the cublas batched functions. Here is an example of the general methodology necessary for these types of arguments.
