Example using cusparse and cusolverSpDcsrlsvchol

terryhendicott · October 4, 2024, 1:56am

Can anyone point me to an example using the sparse double cholesky solver on gpu using C++ and CUDA.
I can get dense solver to give correct answers but the sparse solver does not so I must be doing something wrong.
Cheers
Terry

qanhpham · October 4, 2024, 5:11pm

Hi @terryhendicott, can you share more details of your problem? What are you trying to compute?

Also, you can check out cuDSS, which is a Direct Sparse Solver on GPU. It may be suitable solution to your problem.

terryhendicott · October 4, 2024, 6:37pm

I will check out cuDSS thsnk you for the hint.
Trying to solve a simple matrix first before launching into large sparse matrices
const int N = 3; // Matrix size
const int nnz = 5; // Number of non-zero elements
const int csrRowPtrA[N + 1] = { 0, 2, 4, 5 }; // Row pointers
const int csrColIndA[nnz] = { 0, 1, 0, 1, 2 }; // Column indices
const double csrValA[nnz] = { 4.0, 1.0, 1.0, 3.0, 1.0 }; // Non-zero values of A

// Right-hand side vector b
const double b[N] = { 1.0, 2.0, 3.0 };
double x[N];  // Solution vector

No matter what I try get wrong numeric solution?

antona · October 4, 2024, 7:04pm

HI terryhendicott, I am the developer of cuDSS, could you please share the whole example code?

terryhendicott · October 5, 2024, 12:17am

Have resolved what the problem is and it was on my side.
kernel.zip (3.4 KB)
Have been trying for years to get a command line fast sparse solver that I can call from other software. Just need to add saving x to a binary file to be complete. This means an awful lot to me thank you.

terryhendicott · October 5, 2024, 12:23am

Challenge.zip (729 Bytes)
Here is my data files for the small matrix check but I have got the correct answer with 46328x46328 matrix.
Thank you so much.

antona · October 7, 2024, 11:34pm

I run you kernel.cu for both matrices from message above and from your attachments and I see the correct solution.

terryhendicott · October 8, 2024, 4:56am

Thanks Antonia appreciate your work thank you greatly

pkuma · April 30, 2025, 8:04pm

Hi @antona I am having a similar issue using cudss to solve a sparse linear system of equations. I am new to cuda. I have been using Eigen libraries so far and wanted to compare cudss to Eigen linear solver for accuracy and speed. Attached is an example where I am using cudss and then solving the same equation using Eigen LU solver. Some of the numbers are matching. However, cudss gives nan for small numbers. Not sure why. The attached spread sheet shows the results comparison. I can’t figure out what I am missing. Any help is appreciated.
CudaRuntime1.zip (1.1 MB)

antona · April 30, 2025, 9:10pm

Hi, The matrix type is set to SPD while the matrix is not positive defined. You can check INFO by calling
cudssDataGet(handle, solverData, CUDSS_DATA_INFO, &cudss_error, sizeof(cudss_error), &sizeWritten);
after factorization.
if you set matrix type to CUDSS_MTYPE_SYMMETRIC, then the solution is correct. And if you check INERTIA by
cudssDataGet(handle, solverData, CUDSS_DATA_INERTIA, cudss_inertia, sizeof(cudss_inertia), &sizeWritten);
there will be (6314, 34) that means that 34 eigen values are negative.

pkuma · May 1, 2025, 12:42pm

Hi @antona Thank you very much for your quick response. setting the mtype as you recommended gives the exact results.

pkuma · May 7, 2025, 8:02pm

@antona I have a follow up question regarding cudssExecute. In my case of solving Ax=b, I don’t want to factorize A at every time step calculation. I would like to keep the same factors and solve x for different b values. So is it possible to extract “matrix” after the “CUDSS_PHASE_FACTORIZATION” and reuse it for different b values during “CUDSS_PHASE_SOLVE” or is there another function that can be used only to decompose the sparse matrix?

antona · May 9, 2025, 3:29pm

@pkuma, it is possible to reuse factorization. Here is the example

cudssExecute(A, FACTORIZATION);
//Solve with the first RHS
cudssExecute(A, b1, x1, SOLVE);
//Solve with the second RHS
cudssExecute(A, b2, x2, SOLVE);
//or just update the pointers to values fo b1, x1 and solve again
cudssMatrixSetValues(b1, new_values_ptr);
cudssMatrixSetValues(x1, new_values_ptr);
cudssExecute(A, b1, x1, SOLVE);

In the same way you can reuse FACTORIZATION step. First factorize the matrix and then update values pointer and factorize again without calling Analysis step. Basically you must call Analysis only if matrix non-zero struction has changed

pkuma · May 9, 2025, 7:43pm

@antona I believe I am missing something. My cudss function to solve the system of equations is called from another function. I tried your example but it didn’t provide the correct results for the reuse. I am attaching my function here. I would really appreciate if you could take a look and see what I am doing wrong. Basically if reEval (!= 1) then I would like to skip as much as possible and only solve for x since only the b vector is updated at every call.
EigentoCuda.zip (1.8 KB)

antona · May 9, 2025, 7:58pm

You need to skip both Analysis and Factorization (that is Analysis must be also under reEval == 1)
For new b and x you need to call only Solve step.

Note that you also need to use same ‘cudssData_t solverData’ and ‘cudssHandle_t handle’ through the whole process. In your example it is created for each new EigenSparseToCuSparse calls. That is solverData and handle must be a global data. Basically, it stores the internal data that will be reused.

In my example above I removed it from Execute routines for simplicity, but they all use same solverData and handle

pkuma · May 9, 2025, 8:40pm

I am not sure how to create globals for solverData and handle. Is it possible to create it within the function or should it be passed in?

antona · May 9, 2025, 9:13pm

cudssData_t solverData and cudssHandle_t handle are just pointers for internal cudss data. You can create it just once and then pass it all over your code. So here is simple example:

//Here you don’t need to create new hande, etc. Just create b and x with new pointers,
your_function_to_solve(cudssHandle_t handle, cudssData_t solverData, cudssConfig_t**
solverConfig, cudssMatrix_t A, double *rhs_ptr, double *sol_ptr)
{
//Create new rhs and sol
cudssMatrixCreateDn(b, rhs_ptr);
cudssMatrixCreateDn(x, sol_ptr);

//Run Solve step
cudssExecute(handle, solverData, solverConfig, A, b, x, SOLVE);
}

//Main or your original code where you calculate matrix, rhs, sol, etc.
int main () {

//Create global handle, solverData and solverConfig
cudssCreate(&handle);
cudssConfigCreate(&solverConfig);
cudssDataCreate(handle, &solverData);

//Create wrappers for matrix
cudssMatrixCreateCsr(&A));
//Create wrappers for matrix. You can set NULL for pointers until Solve step
cudssMatrixCreateDn(&b, nrows, NULL);
cudssMatrixCreateDn(&x, nrows, NULL);

//Run Analysis and Factorization
cudssExecute(A, ANALYSIS);
cudssExecute(A, FACTORIZATION);

//Generate your first rhs1
for (i=0; i<nrows;i++) rhs1[i] = 1;

//Call your function for solving
your_function_to_solve(handle, solverData, solverConfig, A, rhs1, sol1);

//Generate your second rhs2
for (i=0; i<nrows;i++) rhs2[i] = 2;

//Call your function for solving with new rhs
your_function_to_solve(handle, solverData, solverConfig, A, rhs2, sol2);

}

pkuma · May 13, 2025, 2:22pm

@antona I am still struggling to get this to work. I have followed your direction and implemented the creation of handles. In my code I have 3 functions namely,
main, Cuda1, Cuda2. All my calculations are done in Cuda2 and I have implemented it as per your instructions. When I call Cuda2 multiple times from Cuda1 everything works fine. However, if I return the control back to main and call the functions again, the handle addresses are lost. I am not sure why since I haven’t destroyed the handles. I am attaching my code here which should compile and run. I don’t want to create the handles in the main function since my code is supposed to interface with another function I don’t have access to.
CudaRuntime1.zip (1019.3 KB)

pkuma · May 13, 2025, 3:57pm

Looks like I have to create the handles in the main function. The handles are visible only to the functions below the one in which they are created. Is that accurate?

Robert_Crovella · May 13, 2025, 4:39pm

You’re running into a C++ issue. A local variable in a C++ function:

cudssHandle_t handle;

is not guaranteed to hold its value from one function call to the next. The fact that it appears to work sometimes is just exploring UB.

If you want the handle to persist and be usable, one approach would be to pass it by reference or by pointer as a function argument/parameter, at whatever scope you would like the preservation.

This is just C++.

Topic		Replies	Views
Cusolver solve sparse Ax=b wrong GPU-Accelerated Libraries cusolver	8	123	August 19, 2024
Sparse cusolver inside loop .................. factorization at every call? GPU-Accelerated Libraries cusparse	8	1435	March 9, 2024
How does cuDSS solve linear equations internally? GPU-Accelerated Libraries cusolver , cudss	6	67	July 8, 2025
How to efficiently solve transposed problem using cuDSS? GPU-Accelerated Libraries cusparse , cudss	4	61	October 10, 2024
Linear solver with sparse matrix on Xavier Jetson Xavier NX	4	34	April 18, 2025
Help Improving Performance using cuSolver/cuSparse Routines GPU-Accelerated Libraries cuda , nsight , performance , python , pycuda	0	708	December 15, 2023
Some questions for the function "cusparseDcsrsv2_solve" nvc, nvc++ and nvfortran cuda	12	61	August 9, 2024
cuSOLVER sparse: cusolverSpDcsrlsvqr() error GPU-Accelerated Libraries	4	3829	June 7, 2015
Why I cannot get correct results from cudss solver for a simple matrix? Customer Stories cuda , cudss	4	39	June 10, 2025
Cusparse for solving the sparse linear equation Ax=b Legacy PGI Compilers	8	2038	August 30, 2019

Example using cusparse and cusolverSpDcsrlsvchol

Related topics