cublasDgelsBatched help

I am using CUDA 6.5 with cuBLAS and have been trying to get the cublasDgelsBatched function to work. However, when I call it, all I get is a segmentation fault, and cuda-gdb and cuda-memcheck are not helping me resolve the issue. I suspect the problem has something to do with the array of pointers, but I can't figure it out. I have consulted other posts here about batched cuBLAS code:

https://devtalk.nvidia.com/default/topic/767806/gpu-accelerated-libraries/matrix-inversion-with-cublassgetri/

https://devtalk.nvidia.com/default/topic/761729/gpu-accelerated-libraries/cublasdgetrfbatched-and-cublasdgetribatched/

http://stackoverflow.com/questions/22887167/cublas-incorrect-inversion-for-matrix-with-zero-pivot

But none of those examples has gotten me to a working call. Could somebody provide example usage of this function? Here is the code I am attempting to use; ls_matrix->data and solutions->data are both of type double*.

double* cuda_test_batched_ls(matrix* ls_matrix, matrix* solutions, int batch_size){
    double *A[] = {ls_matrix->data};
    double** A_d;
    gpu_error_check(cudaMalloc<double*>(&A_d, sizeof(A)));
    gpu_error_check(cudaMemcpy(A_d, A, sizeof(A), cudaMemcpyHostToDevice));
    
    double *C[] = {solutions->data};
    double** C_d;
    gpu_error_check(cudaMalloc<double*>(&C_d, sizeof(C)));
    gpu_error_check(cudaMemcpy(C_d, C, sizeof(C), cudaMemcpyHostToDevice));
    
    cublasStatus_t status;
    cublasHandle_t handle;
    int* cublas_error_info = 0;
    status = cublasCreate_v2(&handle);
    if (status != CUBLAS_STATUS_SUCCESS){
        puts(cublas_get_error_string(status));
    }

    status = cublasDgelsBatched(handle, CUBLAS_OP_N, ls_matrix->rows, ls_matrix->columns, 1, A_d, 3, C_d, 3, cublas_error_info, NULL, 1);

    if (status != CUBLAS_STATUS_SUCCESS){
        puts(cublas_get_error_string(status));
    }

    gpu_error_check(cudaMalloc<double*>(&C_d, sizeof(C)));
    gpu_error_check(cudaMemcpy(C, C_d, sizeof(C), cudaMemcpyDeviceToHost));
    return C[0];
}

If you look at the code I posted in one of the links you provided:

https://devtalk.nvidia.com/default/topic/761729/gpu-accelerated-libraries/cublasdgetrfbatched-and-cublasdgetribatched/

it demonstrates how to send an array of matrices to the device for cuBLAS. There are at least two cudaMemcpy operations involved: the matrix data itself is copied in one step (from a loop, one copy per matrix), and the array of device pointers is copied in another. Your code to transfer matrices looks nothing like that. You might start by studying those examples more thoroughly.
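To make the pattern concrete, here is a minimal sketch for a single small system (batch size 1, column-major storage, m >= n, which cublasDgelsBatched requires). The dimensions, values, and variable names are invented for illustration, so treat it as an outline rather than a drop-in replacement for your function. Note also that lda and ldc must be at least m, and that the info argument is a host pointer the library writes into, so it needs to point at valid storage rather than being a NULL pointer.

#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int m = 3, n = 2, nrhs = 1, batch = 1;

    // A is m x n in column-major order, C holds the right-hand side (m x nrhs).
    // These values are made up purely for the example.
    double A_h[m * n]    = {1, 1, 1,      // column 0
                            1, 2, 3};     // column 1
    double C_h[m * nrhs] = {6, 0, 0};

    // Step 1: copy the matrix data to device buffers.
    // For batch > 1 this allocation/copy would be done in a loop, one pass per matrix.
    double *A_dev, *C_dev;
    cudaMalloc((void**)&A_dev, sizeof(A_h));
    cudaMalloc((void**)&C_dev, sizeof(C_h));
    cudaMemcpy(A_dev, A_h, sizeof(A_h), cudaMemcpyHostToDevice);
    cudaMemcpy(C_dev, C_h, sizeof(C_h), cudaMemcpyHostToDevice);

    // Step 2: build the arrays of device pointers on the host, then copy
    // those arrays to the device as well -- this is the second set of copies.
    double *Aarray_h[batch] = {A_dev};
    double *Carray_h[batch] = {C_dev};
    double **Aarray_d, **Carray_d;
    cudaMalloc((void**)&Aarray_d, sizeof(Aarray_h));
    cudaMalloc((void**)&Carray_d, sizeof(Carray_h));
    cudaMemcpy(Aarray_d, Aarray_h, sizeof(Aarray_h), cudaMemcpyHostToDevice);
    cudaMemcpy(Carray_d, Carray_h, sizeof(Carray_h), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // lda and ldc must be at least m; info is a host pointer the library
    // writes into, and devInfoArray may be passed as NULL.
    int info = 0;
    cublasStatus_t status = cublasDgelsBatched(handle, CUBLAS_OP_N, m, n, nrhs,
                                               Aarray_d, m, Carray_d, m,
                                               &info, NULL, batch);
    if (status != CUBLAS_STATUS_SUCCESS || info != 0)
        printf("cublasDgelsBatched: status = %d, info = %d\n", (int)status, info);

    // On success the first n rows of each C matrix hold the least-squares solution.
    cudaMemcpy(C_h, C_dev, sizeof(C_h), cudaMemcpyDeviceToHost);
    printf("x = (%f, %f)\n", C_h[0], C_h[1]);

    cublasDestroy(handle);
    cudaFree(A_dev); cudaFree(C_dev); cudaFree(Aarray_d); cudaFree(Carray_d);
    return 0;
}

The key difference from your code is that the matrix contents and the arrays of pointers are transferred in separate steps: the double data goes into device buffers first, and only then are the device addresses of those buffers gathered into a small pointer array and copied to the device themselves. For a batch larger than 1, the per-matrix allocation and copy would sit in a loop, as in the code in the linked thread.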