CuSPARSE: Solver with LU decomposition returns wrong answer, maybe fill-in problem?

I have a large non-symmetric sparse matrix A and I want to solve the system A * x = b for some given right-hand side b. CuSPARSE only has triangular solvers and so I figured out that I have to take the following steps:

  1. Decompose A into A = LU with cusparseDcsrilu0
  2. Solve the system L * y = b for y with cusparseDcsrsv_solve
  3. Solve the system U * x = y for x with cusparseDcsrsv_solve

Analytically, we would now have L * (U * x) = b = A * x, since A = LU. However, the answer for x that CuSPARSE produces does not satisfy A * x = b. After some digging, I found that the answer does indeed satisfy L * (U * x) = b, but L*U = A only in the sparse sense, i.e. LU = A in all entries where A is non-zero, but LU has some additional non-zero entries. Thus LU != A in the dense sense, and so the solution I get from solving L * (U * x) = b does not satisfy A * x = b.

I think this may be the fill-in problem I’ve read about. How do I fix this? How can I solve the system A * x = b with CuSPARSE when A can by any sparse matrix?

Is there perhaps some way to extend the sparsity pattern of A (i.e. include some 0 entries in the sparsity pattern) to avoid the fill-in problem? Note that I need to solve A * x = b for many different A’s, but they all have the same sparsity pattern. So it would be ok if the step of extending the sparsity pattern is expensive, because it would only have to be done once and could be reused.

Thanks for the help.

Ok, I figured it out. One has to use the Bi-Conjugate Gradient Stabilized (BiCGStab) method, described in this white paper (Algorithm 2): [url]https://developer.nvidia.com/content/incomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublas[/url]

I am working on the same thing, and stuck at this point exactly right now.

How did you obtain the L and U matrices separately like required in the paper? When I followed the example code to do LU decomposition to solve Ax=B, like you, it did not provide the “correct” solution for my system. So instead of steps 6 and 7 from the sample code, I need those L and U matrices. I don’t know where they are stored on the GPU while Steps 5-7 are executed.

// step 5: M = L * U
cusparseDcsrilu02(handle, m, nnz, descr_M,
    d_csrVal, d_csrRowPtr, d_csrColInd, info_M, policy_M, pBuffer);
status = cusparseXcsrilu02_zeroPivot(handle, info_M, &numerical_zero);
if (CUSPARSE_STATUS_ZERO_PIVOT == status){
   printf("U(%d,%d) is zero\n", numerical_zero, numerical_zero);
}
 
// step 6: solve L*z = x
cusparseDcsrsv2_solve(handle, trans_L, m, nnz, &alpha, descr_L,
   d_csrVal, d_csrRowPtr, d_csrColInd, info_L,
   d_x, d_z, policy_L, pBuffer);

// step 7: solve U*y = z
cusparseDcsrsv2_solve(handle, trans_U, m, nnz, &alpha, descr_U,
   d_csrVal, d_csrRowPtr, d_csrColInd, info_U,
   d_z, d_y, policy_U, pBuffer);

Also, along with L and U, was there a convenient way to obtain csRowPtr and csrColInd with them? They’re all the same “format” so figuring those out based off matrix dimensions is trivial, but I was wondering if there was a simple way like “dense2csr” to get all of that information from a single line or two. The closest I could think of are to create dummy matrices filled with placeholder 1s, then call the “dense2csr” conversion function and then replace the vals with the vals of L and U on the GPU, and then cudaFree the dummy arrays off the device memory.

For anyone who may run into this,

I came back to this on Monday, and since then I got it working with the code linked in the original post (with a small tweaks to include v2 functions). It looks like some small changes have been made in CUDA since the paper, so instead of focusing on L and U matrices, in the code above, in steps 6 and 7, the original A matrix is actually overwritten with LU. Step 6 and 7 above make this clear as you are solving a system with L in step 6, and then U in step using the same d_csrVal matrix found in step 5.

So after step 5 is where you begin with in the code in the paper. But you do need to make a copy of the original A matrix before doing anything of this, and use that everywhere where A is passed in the code.

1 Like