Example using cusparse and cusolverSpDcsrlsvchol

Can anyone point me to an example using the sparse double cholesky solver on gpu using C++ and CUDA.
I can get dense solver to give correct answers but the sparse solver does not so I must be doing something wrong.
Cheers
Terry

Hi @terryhendicott, can you share more details of your problem? What are you trying to compute?

Also, you can check out cuDSS, which is a Direct Sparse Solver on GPU. It may be suitable solution to your problem.

I will check out cuDSS thsnk you for the hint.
Trying to solve a simple matrix first before launching into large sparse matrices
const int N = 3; // Matrix size
const int nnz = 5; // Number of non-zero elements
const int csrRowPtrA[N + 1] = { 0, 2, 4, 5 }; // Row pointers
const int csrColIndA[nnz] = { 0, 1, 0, 1, 2 }; // Column indices
const double csrValA[nnz] = { 4.0, 1.0, 1.0, 3.0, 1.0 }; // Non-zero values of A

// Right-hand side vector b
const double b[N] = { 1.0, 2.0, 3.0 };
double x[N];  // Solution vector

No matter what I try get wrong numeric solution?

HI terryhendicott, I am the developer of cuDSS, could you please share the whole example code?

Have resolved what the problem is and it was on my side.
kernel.zip (3.4 KB)
Have been trying for years to get a command line fast sparse solver that I can call from other software. Just need to add saving x to a binary file to be complete. This means an awful lot to me thank you.

Challenge.zip (729 Bytes)
Here is my data files for the small matrix check but I have got the correct answer with 46328x46328 matrix.
Thank you so much.

I run you kernel.cu for both matrices from message above and from your attachments and I see the correct solution.

1 Like

Thanks Antonia appreciate your work thank you greatly

Hi @antona I am having a similar issue using cudss to solve a sparse linear system of equations. I am new to cuda. I have been using Eigen libraries so far and wanted to compare cudss to Eigen linear solver for accuracy and speed. Attached is an example where I am using cudss and then solving the same equation using Eigen LU solver. Some of the numbers are matching. However, cudss gives nan for small numbers. Not sure why. The attached spread sheet shows the results comparison. I can’t figure out what I am missing. Any help is appreciated.
CudaRuntime1.zip (1.1 MB)

Hi, The matrix type is set to SPD while the matrix is not positive defined. You can check INFO by calling
cudssDataGet(handle, solverData, CUDSS_DATA_INFO, &cudss_error, sizeof(cudss_error), &sizeWritten);
after factorization.
if you set matrix type to CUDSS_MTYPE_SYMMETRIC, then the solution is correct. And if you check INERTIA by
cudssDataGet(handle, solverData, CUDSS_DATA_INERTIA, cudss_inertia, sizeof(cudss_inertia), &sizeWritten);
there will be (6314, 34) that means that 34 eigen values are negative.

Hi @antona Thank you very much for your quick response. setting the mtype as you recommended gives the exact results.

@antona I have a follow up question regarding cudssExecute. In my case of solving Ax=b, I don’t want to factorize A at every time step calculation. I would like to keep the same factors and solve x for different b values. So is it possible to extract “matrix” after the “CUDSS_PHASE_FACTORIZATION” and reuse it for different b values during “CUDSS_PHASE_SOLVE” or is there another function that can be used only to decompose the sparse matrix?

@pkuma, it is possible to reuse factorization. Here is the example

cudssExecute(A, FACTORIZATION);
//Solve with the first RHS
cudssExecute(A, b1, x1, SOLVE);
//Solve with the second RHS
cudssExecute(A, b2, x2, SOLVE);
//or just update the pointers to values fo b1, x1 and solve again
cudssMatrixSetValues(b1, new_values_ptr);
cudssMatrixSetValues(x1, new_values_ptr);
cudssExecute(A, b1, x1, SOLVE);

In the same way you can reuse FACTORIZATION step. First factorize the matrix and then update values pointer and factorize again without calling Analysis step. Basically you must call Analysis only if matrix non-zero struction has changed

@antona I believe I am missing something. My cudss function to solve the system of equations is called from another function. I tried your example but it didn’t provide the correct results for the reuse. I am attaching my function here. I would really appreciate if you could take a look and see what I am doing wrong. Basically if reEval (!= 1) then I would like to skip as much as possible and only solve for x since only the b vector is updated at every call.
EigentoCuda.zip (1.8 KB)

You need to skip both Analysis and Factorization (that is Analysis must be also under reEval == 1)
For new b and x you need to call only Solve step.

Note that you also need to use same ‘cudssData_t solverData’ and ‘cudssHandle_t handle’ through the whole process. In your example it is created for each new EigenSparseToCuSparse calls. That is solverData and handle must be a global data. Basically, it stores the internal data that will be reused.

In my example above I removed it from Execute routines for simplicity, but they all use same solverData and handle

I am not sure how to create globals for solverData and handle. Is it possible to create it within the function or should it be passed in?

cudssData_t solverData and cudssHandle_t handle are just pointers for internal cudss data. You can create it just once and then pass it all over your code. So here is simple example:

//Here you don’t need to create new hande, etc. Just create b and x with new pointers,
your_function_to_solve(cudssHandle_t handle, cudssData_t solverData, cudssConfig_t**
solverConfig, cudssMatrix_t A, double *rhs_ptr, double *sol_ptr)
{
//Create new rhs and sol
cudssMatrixCreateDn(b, rhs_ptr);
cudssMatrixCreateDn(x, sol_ptr);

//Run Solve step
cudssExecute(handle, solverData, solverConfig, A, b, x, SOLVE);
}

//Main or your original code where you calculate matrix, rhs, sol, etc.
int main () {

//Create global handle, solverData and solverConfig
cudssCreate(&handle);
cudssConfigCreate(&solverConfig);
cudssDataCreate(handle, &solverData);

//Create wrappers for matrix
cudssMatrixCreateCsr(&A));
//Create wrappers for matrix. You can set NULL for pointers until Solve step
cudssMatrixCreateDn(&b, nrows, NULL);
cudssMatrixCreateDn(&x, nrows, NULL);

//Run Analysis and Factorization
cudssExecute(A, ANALYSIS);
cudssExecute(A, FACTORIZATION);

//Generate your first rhs1
for (i=0; i<nrows;i++) rhs1[i] = 1;

//Call your function for solving
your_function_to_solve(handle, solverData, solverConfig, A, rhs1, sol1);

//Generate your second rhs2
for (i=0; i<nrows;i++) rhs2[i] = 2;

//Call your function for solving with new rhs
your_function_to_solve(handle, solverData, solverConfig, A, rhs2, sol2);

}

@antona I am still struggling to get this to work. I have followed your direction and implemented the creation of handles. In my code I have 3 functions namely,
main, Cuda1, Cuda2. All my calculations are done in Cuda2 and I have implemented it as per your instructions. When I call Cuda2 multiple times from Cuda1 everything works fine. However, if I return the control back to main and call the functions again, the handle addresses are lost. I am not sure why since I haven’t destroyed the handles. I am attaching my code here which should compile and run. I don’t want to create the handles in the main function since my code is supposed to interface with another function I don’t have access to.
CudaRuntime1.zip (1019.3 KB)

Looks like I have to create the handles in the main function. The handles are visible only to the functions below the one in which they are created. Is that accurate?

You’re running into a C++ issue. A local variable in a C++ function:

cudssHandle_t handle; 

is not guaranteed to hold its value from one function call to the next. The fact that it appears to work sometimes is just exploring UB.

If you want the handle to persist and be usable, one approach would be to pass it by reference or by pointer as a function argument/parameter, at whatever scope you would like the preservation.

This is just C++.