Cusolver solve sparse Ax=b wrong

foxnotflower · August 15, 2024, 7:34am

I use cusolver cusolverSpDcsrlsvqr() function solve Ax=b.
Some matrix will return wrong answer, singularity is 252, cusolverState of this function is 0.
But in MATLAB, same matrix can solve.
/*
Update1: I can only find minimal matrix size is 138*138.
Working computer can’t connect to network, So I can only give sparse form matrix.
This Bx=f can solve in MATLAB. It’s rank is 138.
x should be :{12,-66,0,66,0,0,66,-66 …} And singularity should be -1.
but cusolver give wrong answer.
*/
Here is my code. I first let matrix to be sparse CSR form, then solve it with cusolverSpDcsrlsvqr() function.

#include <iostream>
#include <cusparseSp.h>
#include <vector>
#include <cuda_runtime.h>

using namespace std;
int main(int argc, char const *argv[]) {
    int num_rows = 138;
    int nnz = 307;
    vector<int> h_csr_offsets{
        0,   2,   4,   7,   8,   11,  13,  15,  17,  20,  23,  26,  29,  31,
        34,  37,  40,  42,  45,  48,  51,  53,  56,  59,  62,  65,  67,  70,
        73,  76,  78,  81,  84,  86,  89,  92,  95,  98,  100, 103, 106, 109,
        111, 114, 116, 119, 122, 125, 127, 129, 131, 133, 135, 137, 139, 141,
        143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169,
        171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197,
        199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225,
        227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253,
        255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281,
        283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307};
    vector<int> h_csr_columns{
        0,   2,   1,     3,   5,   6,   7,   8,   10,  11,  12,  11,  13,  14,
        16,  15,  17,    19,  20,  21,  23,  24,  25,  27,  28,  29,  30,  31,
        32,  31,  33,    35,  36,  37,  39,  40,  41,  42,  43,  44,  43,  45,
        47,  48,  49,    51,  52,  53,  54,  55,  56,  55,  57,  59,  60,  61,
        63,  64,  65,    67,  68,  69,  70,  71,  72,  71,  73,  75,  76,  77,
        79,  80,  81,    82,  83,  84,  83,  85,  87,  88,  89,  90,  91,  92,
        91,  93,  95,    96,  97,  99,  100, 101, 103, 104, 105, 106, 107, 108,
        107, 109, 111,   112, 113, 115, 116, 117, 118, 119, 120, 119, 121, 122,
        123, 124, 123,   125, 128, 129, 127, 131, 132, 133, 134, 135, 136, 135,
        137, 26,  86,    29,  88,  22,  102, 25,  104, 74,  98,  75,  101, 62,
        74,  63,  77,    78,  102, 80,  105, 130, 114, 133, 115, 110, 126, 129,
        112, 94,  114,   97,  117, 86,  110, 89,  113, 130, 134, 131, 135, 34,
        46,  35,  49,    110, 120, 111, 121, 114, 118, 116, 119, 54,  58,  55,
        60,  26,  38,    28,  41,  66,  78,  68,  81,  50,  56,  51,  57,  66,
        72,  67,  73,    78,  84,  79,  85,  62,  70,  64,  71,  38,  44,  39,
        45,  42,  46,    43,  48,  26,  32,  27,  33,  14,  22,  15,  24,  30,
        34,  31,  36,    50,  66,  52,  69,  12,  22,  13,  23,  10,  18,  11,
        20,  0,   18,    1,   19,  86,  92,  87,  93,  102, 108, 103, 109, 98,
        106, 100, 107,   58,  62,  59,  65,  90,  94,  91,  96,  38,  50,  40,
        53,  18,  98,    21,  99,  46,  58,  47,  61,  74,  82,  76,  83,  34,
        94,  37,  95,    2,   4,   3,   7,   4,   8,   5,   9,   4,   16,  6,
        17,  136, 126,   128, 137, 130, 122, 132, 123,  124, 126, 125, 127};
    vector<double> h_csr_value{
        1,  -1, 1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  -1, 1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,
        1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,
        1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  -2,
        -1, 1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,
        1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1,
        1,  1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,
        1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1,
        1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,
        1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1,
        1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,
        1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,
        -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,
        1,  -1, 1,  1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1,
        1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  1,  -1, 1,
        1};
    vector<double> f(num_rows, 0);
    vector<double> x(num_rows, 0);
    f[0] = 12;
    int *d_csr_offsets, *d_csr_columns;
    double *d_csr_value;
    cudaMalloc(&d_csr_offsets, sizeof(int) * (num_rows + 1));
    cudaMalloc(&d_csr_columns, sizeof(int) * nnz);
    cudaMalloc(&d_csr_value, sizeof(double) * nnz);
    cudaMemcpy(d_csr_offsets, h_csr_offsets.data(),
               (num_rows + 1) * sizeof(int), cudaMemcpyHostToDevice); 
    cudaMemcpy(d_csr_columns, h_csr_columns.data(),
               nnz * sizeof(int), cudaMemcpyHostToDevice);   
    cudaMemcpy(d_csr_value, h_csr_value.data(),
               nnz * sizeof(double), cudaMemcpyHostToDevice);
    double *d_f;
    double *d_x;
    cudaMalloc(&d_f, sizeof(double) * num_rows);
    cudaMalloc(&d_x, sizeof(double) * num_rows);
    cudaMemcpy(d_f, f.data(), sizeof(double) * num_rows, cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), sizeof(double) * num_rows, cudaMemcpyHostToDevice);
    // 2. solve Ax=b.
    int singularity;
    cusolverSpHandle_t spHandle;
    cusparseMatDescr_t descrA = nullptr;
    cusparseCreateMatDescr(&descrA);
    cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL);
    cusparseSetIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);
    cusolverSpCreate(&spHandle);
    cusolverStatus_t state;
    state = cusolverSpDcsrlsvqr(spHandle, row, nnz, descrA, d_csr_value,
                                d_csr_offsets, d_csr_columns, d_b, 1e-15, 0,
                                d_x, &singularity);
    cout << "Solve state: " << state << endl;
    cout << "Singular? " << singularity << endl;
   cudaMemcpy(x.data(),d_x,num_rows*sizeof(double),cudaMemcpyDeviceToHost);
    for(auto e:x)
      cout << e << '\t';
    cout <<endl;
    return 0;
}

matrix is 370*370 get wrong answer, but MATLAB works fine.
state always return 0.

Curefab · August 15, 2024, 9:37am

I would recommend making your code a complete example with main function providing data (with as small matrix as possible) including your expected result.
Instead of comparing to Matlab you could also compare to the dense calculation of the same matrix.
In this way, you can include it in your example code.

antona · August 15, 2024, 4:11pm

Hi, the code looks correct. Could you please share the matrix so we can reproduce the issue on our side?
In anyway, since the matrix is square, I would recommend trying cuda Direct Sparse Solver (cuDSS):
cuDSS | NVIDIA Developer
CUDALibrarySamples/cuDSS/simple/simple.cpp at master · NVIDIA/CUDALibrarySamples · GitHub
In the example you should set:
cudssMatrixType_t mtype = CUDSS_MTYPE_GENERAL;
In addition, if the solution is not right you should try to set:
cudssAlgType_t reorder_alg = CUDSS_ALG_1;
cudssConfigSet(config, CUDSS_REORDERING_ALG, &reorder_alg, sizeof(cudssAlgType_t));
See cuDSS Data Types — NVIDIA cuDSS documentation for more details about non default reordering usage.

foxnotflower · August 16, 2024, 1:40pm

Thanks for you reply, I post my matrix here.
Dense format matrix solve Ax=b by cusolver still wrong in this matrix.
I use cuDSS shows “cudss64_0.dll” error LNK1107: invalid or corrupt file.
cuDSS is v0.3 in windows 10.
It locate at ‘bin’ directory.

antona · August 16, 2024, 3:32pm

Could you please double check h_csr_columns. It has more than 307 elements and also has “123, , 128,” and I at the end

foxnotflower · August 17, 2024, 2:12am

Thanks for reply.
Sorry about that, I updated data now and checked it, data is right now.
My working computer can’t connect network, so I input it by hand.

foxnotflower · August 17, 2024, 3:18am

I set reorder to 1, this matrix solved right.
Is there some documentation about this reorder?

antona · August 19, 2024, 3:47pm

With the new matrix I also see cuDSS can solve it with reorder algorithm 1 and 2. Reorder 2 is faster though.
Reorder 1 is blocked triangular reordering (BTF + COLAMD). Reorder 2 is just COLAMD. But the main benefit of algorithms 1 and 2 is global pivoting during factorization. So if your matrix is ill condition then that is the right way to solve it.
More info about config parameters: cuDSS Data Types — NVIDIA cuDSS documentation
Note also that for both these algorithms REFACTORIZATION step can bring significant speed-up vs FACTORIZATION in case you need to run factorization many times and only matrix values are changing.

system · September 2, 2024, 3:48pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
cuSPARSE BSR Matrix Solver GPU-Accelerated Libraries cuda , cusolver , cusparse	2	56	October 23, 2024
example of sparse LL^t or QR functionality GPU-Accelerated Libraries	2	1263	April 3, 2015
cuSolverSp_LinearSolver produces different results with different order methods CUDA Programming and Performance	6	1441	September 21, 2017
Problem with cusparseDcsrsv_solve() function GPU-Accelerated Libraries	2	1219	July 30, 2015
cuSOLVER solving a overdetermined linear equation system GPU-Accelerated Libraries cusolver	8	1261	September 19, 2021
Sparse cusolver inside loop .................. factorization at every call? GPU-Accelerated Libraries cusparse	8	1411	March 9, 2024
cuSOLVER QR decomposition solving with non-square matrices CUDA Programming and Performance	10	5596	September 6, 2017
Proper use of cusparseScsrmv - x and y sizes? GPU-Accelerated Libraries	10	2834	June 3, 2014
Example using cusparse and cusolverSpDcsrlsvchol GPU-Accelerated Libraries cusolver , cusparse	10	70	May 1, 2025
cusparseScsrilu02 breaks with large matrices GPU-Accelerated Libraries cublas , cusolver , cusparse	10	1154	June 1, 2022

Cusolver solve sparse Ax=b wrong

Related topics