We are currently trying to integrate cuDSS into one of our projects, but we find that it sometimes returns (spectacularly) wrong results for some systems.
Here is an example that works with the cudss simple example provided in the CUDALibraryExamples repository:
int i = 0;
int n = 17;
int nnz = 32;
int rhs = 1;
// ... mallocs ...
int csr_indptr_raw[] = {
0, 1, 3, 5, 9, 10, 11, 14, 15, 17, 19, 20, 23, 25, 29, 30, 31, 32
};
for (i = 0; i < n+1; i++) {
csr_offsets_h[i] = csr_indptr_raw[i];
}
int csr_indices_raw[] = {
2, 2, 9, 0, 1, 0, 3, 13, 14, 5, 4, 6, 7, 11, 6, 9, 10, 1, 8, 8, 6, 11, 12, 11, 13, 0, 3, 12, 13, 3, 16, 15
};
for (i = 0; i < nnz; i++) {
csr_columns_h[i] = csr_indices_raw[i];
}
double csr_values_raw[] = {
-1000000000000.0, 1000000000000.0, -1000000000000.0, -1000000.0, 1000000.0, -4.26122904e-12, -10379765.7, 10379765.7, -1000000000000.0, 1000000000000.0, 1000000.0, -1000000000.0, 1000000000000.0, 1000000000.0, 1000000.0, 1000000000000.0, 1000000000000.0, -1000000.0, 1000000.0, 1000000.0, 1000000000.0, -1000000000.0, 1000000000000.0, 1000000.0, -1000000.0, 4.26122904e-12, 10379765.7, -1000000000000.0, -10379765.7, -1000000.0, 1000000000.0, 10000.0
};
for (i = 0; i < nnz; i++) {
csr_values_h[i] = csr_values_raw[i];
}
// ...
cudssMatrixType_t mtype = CUDSS_MTYPE_GENERAL;
cudssMatrixViewType_t mview = CUDSS_MVIEW_FULL;
cudssIndexBase_t base = CUDSS_BASE_ZERO;
Anyway, if you try to solve the system above, cuDSS gives an answer that is totally wrong:
x[0] = 251979373537734098944.0000
x[1] = -0.4000
x[2] = -0.0000
x[3] = -0.0000
x[4] = 0.0000
x[5] = 0.0000
x[6] = -0.8001
x[7] = 576286421369295.5000
x[8] = -0.4000
x[9] = -0.0000
x[10] = 0.0000
x[11] = -576286421369238528.0000
x[12] = -91095040000000000000.0000
x[13] = -576286421369238528.0000
x[14] = -5981718029904.1699
x[15] = 0.0000
x[16] = 0.0000
Solving the same system using a dense solver (from numpy) gives
[-4.00000000e-01 -4.00000000e-01 -0.00000000e+00 -1.33671954e-16
0.00000000e+00 0.00000000e+00 -8.00000000e-01 -8.21850639e-06
-4.00000000e-01 -0.00000000e+00 0.00000000e+00 -7.91781494e-01
8.21850639e-06 -7.91781494e-01 -8.21850639e-06 0.00000000e+00
0.00000000e+00]
Using umfpack (I hope) via scipy with scikit-umfpack and suitesparse installed:
[-4.00000000e-01 -4.00000000e-01 -0.00000000e+00 -0.00000000e+00
0.00000000e+00 0.00000000e+00 -8.00000000e-01 -8.21850639e-06
-4.00000000e-01 -0.00000000e+00 0.00000000e+00 -7.91781494e-01
8.21850639e-06 -7.91781494e-01 -8.21850639e-06 0.00000000e+00
0.00000000e+00]
Using cusparse via jax.experimental.sparse.linalg.spsolve:
[-4.00000000e-01 -4.00000000e-01 1.74622983e-23 -1.03690234e-13
-0.00000000e+00 0.00000000e+00 -8.00000000e-01 -8.21850639e-06
-4.00000000e-01 5.82076609e-24 5.82076609e-24 -7.91781494e-01
8.21850639e-06 -7.91781494e-01 -8.21850639e-06 -0.00000000e+00
0.00000000e+00]
I can repeat these results on two separate machines with an RTX 4090 and an RTX 4070 Super respectively. I am using cuDSS 0.6

