CuSPARSE: Solver with LU decomposition returns wrong answer, maybe fill-in problem?

Flecto · July 12, 2014, 11:54pm

I have a large non-symmetric sparse matrix A and I want to solve the system A * x = b for some given right-hand side b. CuSPARSE only has triangular solvers and so I figured out that I have to take the following steps:

Decompose A into A = LU with cusparseDcsrilu0
Solve the system L * y = b for y with cusparseDcsrsv_solve
Solve the system U * x = y for x with cusparseDcsrsv_solve

Analytically, we would now have L * (U * x) = b = A * x, since A = LU. However, the answer for x that CuSPARSE produces does not satisfy A * x = b. After some digging, I found that the answer does indeed satisfy L * (U * x) = b, but L*U = A only in the sparse sense, i.e. LU = A in all entries where A is non-zero, but LU has some additional non-zero entries. Thus LU != A in the dense sense, and so the solution I get from solving L * (U * x) = b does not satisfy A * x = b.

I think this may be the fill-in problem I’ve read about. How do I fix this? How can I solve the system A * x = b with CuSPARSE when A can by any sparse matrix?

Is there perhaps some way to extend the sparsity pattern of A (i.e. include some 0 entries in the sparsity pattern) to avoid the fill-in problem? Note that I need to solve A * x = b for many different A’s, but they all have the same sparsity pattern. So it would be ok if the step of extending the sparsity pattern is expensive, because it would only have to be done once and could be reused.

Thanks for the help.

Flecto · July 13, 2014, 7:46am

Ok, I figured it out. One has to use the Bi-Conjugate Gradient Stabilized (BiCGStab) method, described in this white paper (Algorithm 2): [url]https://developer.nvidia.com/content/incomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublas[/url]

blade613x · July 22, 2014, 3:05pm

I am working on the same thing, and stuck at this point exactly right now.

How did you obtain the L and U matrices separately like required in the paper? When I followed the example code to do LU decomposition to solve Ax=B, like you, it did not provide the “correct” solution for my system. So instead of steps 6 and 7 from the sample code, I need those L and U matrices. I don’t know where they are stored on the GPU while Steps 5-7 are executed.

// step 5: M = L * U
cusparseDcsrilu02(handle, m, nnz, descr_M,
    d_csrVal, d_csrRowPtr, d_csrColInd, info_M, policy_M, pBuffer);
status = cusparseXcsrilu02_zeroPivot(handle, info_M, &numerical_zero);
if (CUSPARSE_STATUS_ZERO_PIVOT == status){
   printf("U(%d,%d) is zero\n", numerical_zero, numerical_zero);
}
 
// step 6: solve L*z = x
cusparseDcsrsv2_solve(handle, trans_L, m, nnz, &alpha, descr_L,
   d_csrVal, d_csrRowPtr, d_csrColInd, info_L,
   d_x, d_z, policy_L, pBuffer);

// step 7: solve U*y = z
cusparseDcsrsv2_solve(handle, trans_U, m, nnz, &alpha, descr_U,
   d_csrVal, d_csrRowPtr, d_csrColInd, info_U,
   d_z, d_y, policy_U, pBuffer);

Also, along with L and U, was there a convenient way to obtain csRowPtr and csrColInd with them? They’re all the same “format” so figuring those out based off matrix dimensions is trivial, but I was wondering if there was a simple way like “dense2csr” to get all of that information from a single line or two. The closest I could think of are to create dummy matrices filled with placeholder 1s, then call the “dense2csr” conversion function and then replace the vals with the vals of L and U on the GPU, and then cudaFree the dummy arrays off the device memory.

blade613x · August 6, 2014, 4:05pm

For anyone who may run into this,

I came back to this on Monday, and since then I got it working with the code linked in the original post (with a small tweaks to include v2 functions). It looks like some small changes have been made in CUDA since the paper, so instead of focusing on L and U matrices, in the code above, in steps 6 and 7, the original A matrix is actually overwritten with LU. Step 6 and 7 above make this clear as you are solving a system with L in step 6, and then U in step using the same d_csrVal matrix found in step 5.

So after step 5 is where you begin with in the code in the paper. But you do need to make a copy of the original A matrix before doing anything of this, and use that everywhere where A is passed in the code.

Topic		Replies	Views
solving linear system with cusparse GPU-Accelerated Libraries	4	1635	March 13, 2019
Error in CUDALibrary exampple cuSPARSE/spsv_csr GPU-Accelerated Libraries cuda , hpc , cusparse	4	330	May 13, 2024
cuSPARSE LU decomposition GPU-Accelerated Libraries	1	1677	October 10, 2017
cuSolverSp_LinearSolver LU factorization example CUDA Developer Tools	0	735	May 4, 2020
cusparseScsrilu02 breaks with large matrices GPU-Accelerated Libraries cublas , cusolver , cusparse	10	1268	June 1, 2022
cusparseDcsrsv_solve returns CUSPARSE_STATUS_MAPPING_ERROR BiCG CUDA Programming and Performance	1	1438	April 27, 2012
Sparse triangular matrix solvers in MKL and cuSparse produce different results? GPU-Accelerated Libraries cusparse	5	760	July 24, 2023
Sparse LU decomposition got wrong results for large matrices CUDA Programming and Performance	2	711	July 20, 2018
Solve sparse system of Ax = B with LU GPU-Accelerated Libraries	0	708	May 17, 2015
cuSPARSE for solving Ax=b on matrix ~ 230400x230400 GPU-Accelerated Libraries	3	3836	December 31, 2015

CuSPARSE: Solver with LU decomposition returns wrong answer, maybe fill-in problem?

Related topics