Cusolver solve sparse Ax=b wrong

I use cusolver cusolverSpDcsrlsvqr() function solve Ax=b.
Some matrix will return wrong answer, singularity is 252, cusolverState of this function is 0.
But in MATLAB, same matrix can solve.
/*
Update1: I can only find minimal matrix size is 138*138.
Working computer can’t connect to network, So I can only give sparse form matrix.
This Bx=f can solve in MATLAB. It’s rank is 138.
x should be :{12,-66,0,66,0,0,66,-66 …} And singularity should be -1.
but cusolver give wrong answer.
*/
Here is my code. I first let matrix to be sparse CSR form, then solve it with cusolverSpDcsrlsvqr() function.

#include <iostream>
#include <cusparseSp.h>
#include <vector>
#include <cuda_runtime.h>

using namespace std;
int main(int argc, char const *argv[]) {
    int num_rows = 138;
    int nnz = 307;
    vector<int> h_csr_offsets{
        0,   2,   4,   7,   8,   11,  13,  15,  17,  20,  23,  26,  29,  31,
        34,  37,  40,  42,  45,  48,  51,  53,  56,  59,  62,  65,  67,  70,
        73,  76,  78,  81,  84,  86,  89,  92,  95,  98,  100, 103, 106, 109,
        111, 114, 116, 119, 122, 125, 127, 129, 131, 133, 135, 137, 139, 141,
        143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169,
        171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197,
        199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225,
        227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253,
        255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281,
        283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307};
    vector<int> h_csr_columns{
        0,   2,   1,     3,   5,   6,   7,   8,   10,  11,  12,  11,  13,  14,
        16,  15,  17,    19,  20,  21,  23,  24,  25,  27,  28,  29,  30,  31,
        32,  31,  33,    35,  36,  37,  39,  40,  41,  42,  43,  44,  43,  45,
        47,  48,  49,    51,  52,  53,  54,  55,  56,  55,  57,  59,  60,  61,
        63,  64,  65,    67,  68,  69,  70,  71,  72,  71,  73,  75,  76,  77,
        79,  80,  81,    82,  83,  84,  83,  85,  87,  88,  89,  90,  91,  92,
        91,  93,  95,    96,  97,  99,  100, 101, 103, 104, 105, 106, 107, 108,
        107, 109, 111,   112, 113, 115, 116, 117, 118, 119, 120, 119, 121, 122,
        123, 124, 123,   125, 128, 129, 127, 131, 132, 133, 134, 135, 136, 135,
        137, 26,  86,    29,  88,  22,  102, 25,  104, 74,  98,  75,  101, 62,
        74,  63,  77,    78,  102, 80,  105, 130, 114, 133, 115, 110, 126, 129,
        112, 94,  114,   97,  117, 86,  110, 89,  113, 130, 134, 131, 135, 34,
        46,  35,  49,    110, 120, 111, 121, 114, 118, 116, 119, 54,  58,  55,
        60,  26,  38,    28,  41,  66,  78,  68,  81,  50,  56,  51,  57,  66,
        72,  67,  73,    78,  84,  79,  85,  62,  70,  64,  71,  38,  44,  39,
        45,  42,  46,    43,  48,  26,  32,  27,  33,  14,  22,  15,  24,  30,
        34,  31,  36,    50,  66,  52,  69,  12,  22,  13,  23,  10,  18,  11,
        20,  0,   18,    1,   19,  86,  92,  87,  93,  102, 108, 103, 109, 98,
        106, 100, 107,   58,  62,  59,  65,  90,  94,  91,  96,  38,  50,  40,
        53,  18,  98,    21,  99,  46,  58,  47,  61,  74,  82,  76,  83,  34,
        94,  37,  95,    2,   4,   3,   7,   4,   8,   5,   9,   4,   16,  6,
        17,  136, 126,   128, 137, 130, 122, 132, 123,  124, 126, 125, 127};
    vector<double> h_csr_value{
        1,  -1, 1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  -1, 1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,
        1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,
        1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  -2,
        -1, 1,  1,  1,  -2, -1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  -2, -1, 1,
        1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1,
        1,  1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,
        1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1,
        1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,
        1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1,
        1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,  -1, 1,
        1,  -1, 1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  1,
        -1, 1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,
        1,  -1, 1,  1,  1,  1,  -1, 1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  -1,
        1,  1,  1,  1,  -1, 1,  1,  -1, 1,  1,  1,  1,  -1, 1,  1,  1,  -1, 1,
        1};
    vector<double> f(num_rows, 0);
    vector<double> x(num_rows, 0);
    f[0] = 12;
    int *d_csr_offsets, *d_csr_columns;
    double *d_csr_value;
    cudaMalloc(&d_csr_offsets, sizeof(int) * (num_rows + 1));
    cudaMalloc(&d_csr_columns, sizeof(int) * nnz);
    cudaMalloc(&d_csr_value, sizeof(double) * nnz);
    cudaMemcpy(d_csr_offsets, h_csr_offsets.data(),
               (num_rows + 1) * sizeof(int), cudaMemcpyHostToDevice); 
    cudaMemcpy(d_csr_columns, h_csr_columns.data(),
               nnz * sizeof(int), cudaMemcpyHostToDevice);   
    cudaMemcpy(d_csr_value, h_csr_value.data(),
               nnz * sizeof(double), cudaMemcpyHostToDevice);
    double *d_f;
    double *d_x;
    cudaMalloc(&d_f, sizeof(double) * num_rows);
    cudaMalloc(&d_x, sizeof(double) * num_rows);
    cudaMemcpy(d_f, f.data(), sizeof(double) * num_rows, cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), sizeof(double) * num_rows, cudaMemcpyHostToDevice);
    // 2. solve Ax=b.
    int singularity;
    cusolverSpHandle_t spHandle;
    cusparseMatDescr_t descrA = nullptr;
    cusparseCreateMatDescr(&descrA);
    cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL);
    cusparseSetIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);
    cusolverSpCreate(&spHandle);
    cusolverStatus_t state;
    state = cusolverSpDcsrlsvqr(spHandle, row, nnz, descrA, d_csr_value,
                                d_csr_offsets, d_csr_columns, d_b, 1e-15, 0,
                                d_x, &singularity);
    cout << "Solve state: " << state << endl;
    cout << "Singular? " << singularity << endl;
   cudaMemcpy(x.data(),d_x,num_rows*sizeof(double),cudaMemcpyDeviceToHost);
    for(auto e:x)
      cout << e << '\t';
    cout <<endl;
    return 0;
}

matrix is 370*370 get wrong answer, but MATLAB works fine.
state always return 0.

I would recommend making your code a complete example with main function providing data (with as small matrix as possible) including your expected result.
Instead of comparing to Matlab you could also compare to the dense calculation of the same matrix.
In this way, you can include it in your example code.

Hi, the code looks correct. Could you please share the matrix so we can reproduce the issue on our side?
In anyway, since the matrix is square, I would recommend trying cuda Direct Sparse Solver (cuDSS):
cuDSS | NVIDIA Developer
CUDALibrarySamples/cuDSS/simple/simple.cpp at master · NVIDIA/CUDALibrarySamples · GitHub
In the example you should set:
cudssMatrixType_t mtype = CUDSS_MTYPE_GENERAL;
In addition, if the solution is not right you should try to set:
cudssAlgType_t reorder_alg = CUDSS_ALG_1;
cudssConfigSet(config, CUDSS_REORDERING_ALG, &reorder_alg, sizeof(cudssAlgType_t));
See cuDSS Data Types — NVIDIA cuDSS documentation for more details about non default reordering usage.

Thanks for you reply, I post my matrix here.
Dense format matrix solve Ax=b by cusolver still wrong in this matrix.
I use cuDSS shows “cudss64_0.dll” error LNK1107: invalid or corrupt file.
cuDSS is v0.3 in windows 10.
It locate at ‘bin’ directory.

Could you please double check h_csr_columns. It has more than 307 elements and also has “123, , 128,” and I at the end

Thanks for reply.
Sorry about that, I updated data now and checked it, data is right now.
My working computer can’t connect network, so I input it by hand.

I set reorder to 1, this matrix solved right.
Is there some documentation about this reorder?

With the new matrix I also see cuDSS can solve it with reorder algorithm 1 and 2. Reorder 2 is faster though.
Reorder 1 is blocked triangular reordering (BTF + COLAMD). Reorder 2 is just COLAMD. But the main benefit of algorithms 1 and 2 is global pivoting during factorization. So if your matrix is ill condition then that is the right way to solve it.
More info about config parameters: cuDSS Data Types — NVIDIA cuDSS documentation
Note also that for both these algorithms REFACTORIZATION step can bring significant speed-up vs FACTORIZATION in case you need to run factorization many times and only matrix values are changing.

2 Likes

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.