cuSPARSE Incomplete Cholesky CG - incorrect results

Hi,

I am trying to implement the incomplete Cholesky preconditioned CG (ICCG) with the cuSPARSE library as outlined in https://developer.nvidia.com/content/incomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublas, but I can’t get correct results except for a 4x4 matrix. The incomplete Cholesky factor appears to be correct, so the problem seems to be in either the analysis or the solve phase.

If anyone with experience with this library could take a look at the code, it would be much appreciated.

#include <iostream>
#include <iomanip>
#include <cassert>  // assert
#include <cstdlib>  // malloc

#include <cusparse_v2.h>
#include <cublas_v2.h>

#include <thrust/device_ptr.h>
#include <thrust/host_vector.h>

#include <cusp/csr_matrix.h>
#include <cusp/array1d.h>
#include <cusp/gallery/poisson.h>

// where to perform the computation
typedef cusp::host_memory HostMemorySpace;
typedef cusp::device_memory MemorySpace;

const char* cublasGetErrorString(cublasStatus_t status);
const char* cusparseGetErrorString(cusparseStatus_t status);

int main(int argc, char **argv)
{
	const double TOL = 1e-3;
	const size_t MAX_ITER = 1000;

	const size_t numGrid = 2; // N = numGrid*numGrid

	cusp::csr_matrix<int, double, HostMemorySpace> h_A;
    cusp::gallery::poisson5pt(h_A, numGrid, numGrid);

    assert(h_A.num_rows == h_A.num_cols);// sanity check
    const size_t N = h_A.num_rows;//Number of rows and columns (square matrix assumed)
    const size_t NZ = h_A.num_entries;//Total number of non-zero entries in complete matrix

    //Number of non-zero entries in lower or upper + diagonals (symmetric matrix assumed)
    const size_t LOWER = (NZ + N)/2;

	double *h_val = (double *)malloc(LOWER*sizeof(double));
	int *h_col = (int *)malloc(LOWER*sizeof(int));
	int *h_row = (int *)malloc((N+1)*sizeof(int));
	double *h_b = (double *)malloc(N*sizeof(double));
	double *h_x = (double *)malloc(N*sizeof(double));

    //populate lower triangular column indices and row offsets for zero fill-in IC
    h_row[N] = LOWER;
    int k = 0;
	for (int i = 0; i < N; i++) {
		h_row[i] = k;
		int numRowElements = h_A.row_offsets[i+1] - h_A.row_offsets[i];
		int m = 0;
		for (int j = 0; j < numRowElements; j++) {
			if (h_A.column_indices[h_A.row_offsets[i] + j] <= i) {
				h_col[h_row[i] + m] = h_A.column_indices[h_A.row_offsets[i] + j];
				h_val[h_row[i] + m] = h_A.values[h_A.row_offsets[i] + j];
				k++; m++;
			}
		}
	}

	// RHS vector
	h_b[0] = 1.0; h_b[1] = 1.0; h_b[2] = 1.0; h_b[3] = 1.0;
	// solution vector
	h_x[0] = 0.0; h_x[1] = 0.0; h_x[2] = 0.0; h_x[3] = 0.0;

	double *d_val;
	int *d_col;
	int *d_row;
	double *d_b;
	double *d_x;
	double *d_valCholesky;

	cudaMalloc((void **)&d_val, LOWER*sizeof(double));
	cudaMalloc((void **)&d_col, LOWER*sizeof(int));
	cudaMalloc((void **)&d_row, (N+1)*sizeof(int));
	cudaMalloc((void **)&d_b, N*sizeof(double));
	cudaMalloc((void **)&d_x, N*sizeof(double));
	cudaMalloc(&d_valCholesky, LOWER*sizeof(double));

	cudaMemcpy(d_val, h_val, LOWER*sizeof(double), cudaMemcpyHostToDevice);
	cudaMemcpy(d_col, h_col, LOWER*sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(d_row, h_row, (N+1)*sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(d_b, h_b, N*sizeof(double), cudaMemcpyHostToDevice);
	cudaMemcpy(d_x, h_x, N*sizeof(double), cudaMemcpyHostToDevice);
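	// note: cusparseDcsric0 factors in place, so keep a separate copy of A's lower-triangular values for the factor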
	cudaMemcpy(d_valCholesky, d_val, LOWER*sizeof(double), cudaMemcpyDeviceToDevice);

	//for printing to check values
	thrust::device_ptr<double>   d_val_devptr(d_val);
	thrust::device_ptr<int>   	 d_col_devptr(d_col);
	thrust::device_ptr<int>   	 d_row_devptr(d_row);
	thrust::device_ptr<double>   d_b_devptr(d_b);
	thrust::device_ptr<double>   d_x_devptr(d_x);
    thrust::device_ptr<double>   d_valCholesky_devptr(d_valCholesky);

//	cudaError_t cudaError;
	cublasStatus_t cublasStatus;
	cusparseStatus_t cusparseStatus;

    cublasHandle_t cublasHandle0 = 0;
    cublasStatus = cublasCreate(&cublasHandle0);

    if (cublasStatus != CUBLAS_STATUS_SUCCESS) {
		std::cout << "cublasCreate(&cublasHandle0) failed, status returned: " << std::endl;
		std::cout << cublasGetErrorString(cublasStatus) << std::endl;
		return 1;
    }

    // Create CUSPARSE context
    cusparseHandle_t cusparseHandle0 = 0;
    cusparseStatus = cusparseCreate(&cusparseHandle0);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseCreate(&cusparseHandle0) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
    }

    cusparseMatDescr_t descrA = 0;
    cusparseStatus = cusparseCreateMatDescr(&descrA);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseCreateMatDescr(&descrA) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

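    // only the lower triangle of A is stored; the SYMMETRIC type with LOWER fill mode tells
    // cuSPARSE to treat the stored triangle as the full matrix in csrmv and in the analysis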
    cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_SYMMETRIC);
    cusparseSetMatFillMode(descrA, CUSPARSE_FILL_MODE_LOWER);
    cusparseSetMatDiagType(descrA, CUSPARSE_DIAG_TYPE_NON_UNIT);
    cusparseSetMatIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);

	// create the analysis info object for the A matrix
    cusparseSolveAnalysisInfo_t infoA = 0;
    cusparseStatus = cusparseCreateSolveAnalysisInfo(&infoA);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseCreateSolveAnalysisInfo(&infoA) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

    // Perform the analysis for the Non-Transpose case
    cusparseStatus = cusparseDcsrsv_analysis(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE,
    											N, LOWER, descrA, d_val, d_row, d_col, infoA);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseDcsrsv_analysis(infoA) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

    // Generate the incomplete Cholesky factor
    cusparseStatus = cusparseDcsric0(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE, N, descrA,
    									d_valCholesky, d_row, d_col, infoA);

    cusparseDestroySolveAnalysisInfo(infoA);

    // describe the lower triangular matrix of the Cholesky factor
    cusparseMatDescr_t descrL = 0;
    cusparseStatus = cusparseCreateMatDescr(&descrL);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseCreateMatDescr(&descrL) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

    cusparseSetMatType(descrL, CUSPARSE_MATRIX_TYPE_TRIANGULAR);
    cusparseSetMatFillMode(descrL, CUSPARSE_FILL_MODE_LOWER);
    cusparseSetMatDiagType(descrL, CUSPARSE_DIAG_TYPE_NON_UNIT);
    cusparseSetMatIndexBase(descrL, CUSPARSE_INDEX_BASE_ZERO);

	// create the analysis info object for the lower Cholesky factor
    cusparseSolveAnalysisInfo_t infoL = 0;
    cusparseStatus = cusparseCreateSolveAnalysisInfo(&infoL);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseCreateSolveAnalysisInfo(&infoL) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

    cusparseStatus = cusparseDcsrsv_analysis(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE,
    											N, LOWER, descrL, d_valCholesky, d_row, d_col, infoL);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseDcsrsv_analysis(infoL) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

	// create the analysis info object for the upper Cholesky factor
    cusparseSolveAnalysisInfo_t infoU = 0;
    cusparseStatus = cusparseCreateSolveAnalysisInfo(&infoU);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseCreateSolveAnalysisInfo(&infoU) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

    cusparseStatus = cusparseDcsrsv_analysis(cusparseHandle0, CUSPARSE_OPERATION_TRANSPOSE,
    											N, LOWER, descrL, d_valCholesky, d_row, d_col, infoU);

    if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseDcsrsv_analysis(infoU) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}

    double *d_r;
    double *d_z;
    double *d_t;
    double *d_p;
    double *d_q;

    double rho;
    double rhop;
    double ptAp;
    double alpha, negalpha;
    double beta;

    double normr0, normr;
    size_t iter = 0;
    double zero = 0.0;
    double one = 1.0, negone = -1.0;

    cudaMalloc((void **)&d_r, N*sizeof(double));
    cudaMalloc((void **)&d_z, N*sizeof(double));
    cudaMalloc((void **)&d_t, N*sizeof(double));
    cudaMalloc((void **)&d_p, N*sizeof(double));
    cudaMalloc((void **)&d_q, N*sizeof(double));

    cudaMemcpy(d_r, d_x, N*sizeof(double), cudaMemcpyDeviceToDevice);
    cudaMemcpy(d_z, d_x, N*sizeof(double), cudaMemcpyDeviceToDevice);
    cudaMemcpy(d_t, d_x, N*sizeof(double), cudaMemcpyDeviceToDevice);
    cudaMemcpy(d_p, d_x, N*sizeof(double), cudaMemcpyDeviceToDevice);
    cudaMemcpy(d_q, d_x, N*sizeof(double), cudaMemcpyDeviceToDevice);

    //for printing to check values
	thrust::device_ptr<double>   d_r_devptr(d_r);
	thrust::device_ptr<double>   d_z_devptr(d_z);
	thrust::device_ptr<double>   d_t_devptr(d_t);
	thrust::device_ptr<double>   d_p_devptr(d_p);
	thrust::device_ptr<double>   d_q_devptr(d_q);

    //1: r <- b - A*x0		initial residual
    //1a: r <- A*x0
	cusparseStatus = cusparseDcsrmv(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE, N, N, LOWER,
					&one, descrA, d_val, d_row, d_col, d_x, &zero, d_r);

	if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
		std::cout << "cusparseDcsrmv(r <- Ax0) failed, status returned: " << std::endl;
		std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
		return 1;
	}
	//1b: r <- -r
	cublasDscal(cublasHandle0, N, &negone, d_r, 1);
	//1c: r <- b + (-r)
	cublasDaxpy(cublasHandle0, N, &one, d_b, 1, d_r, 1);

	// rho <- <r,z>
    cublasDdot(cublasHandle0, N, d_r, 1, d_z, 1, &rho);

    // normr0 <- sqrt(<r,r>)_0
    cublasDnrm2(cublasHandle0, N, d_r, 1, &normr0);
    normr = normr0;

    // iterate until convergence or maximum iterations are reached
    while (normr/normr0 > TOL && iter < MAX_ITER) {


    	//2: Solve M*z = r via L*[transpose(L)*z] = r where transpose(L)*z = t
    	//2a: L*t <- r
		cusparseStatus = cusparseDcsrsv_solve(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE, N,
												&one, descrL, d_valCholesky, d_row, d_col, infoL, d_r, d_t);

		if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
			std::cout << "cusparseDcsrsv_solve(LOWER) failed, status returned: " << std::endl;
			std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
			return 1;
		}

	    //2b: transpose(L)*z <- t
		cusparseStatus = cusparseDcsrsv_solve(cusparseHandle0, CUSPARSE_OPERATION_TRANSPOSE, N, &one,
												descrL, d_valCholesky, d_row, d_col, infoU, d_t, d_z);

		if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
			std::cout << "cusparseDcsrsv_solve(UPPER) failed, status returned: " << std::endl;
			std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
			return 1;
		}

	    //3: rhop <- <r,z>_i
		rhop = rho;
		//4: rho <- <r,z>_{i+1}
		cublasDdot(cublasHandle0, N, d_r, 1, d_z, 1, &rho);

		if (iter == 0) {
			//5: p <- z
			cublasDcopy(cublasHandle0, N, d_z, 1, d_p, 1);
		} else {
			//5: beta <- <r,z>_{i+1} / <r,z>_i
			beta = rho/rhop;
			//6: p = z + beta*p
			//6a: z <- beta*p + z
			cublasDaxpy(cublasHandle0, N, &beta, d_p, 1, d_z, 1);
			//6b: p <- z
			cublasDcopy(cublasHandle0, N, d_z, 1, d_p, 1);
		}
		//7: q <- A*p
		cusparseStatus = cusparseDcsrmv(cusparseHandle0,CUSPARSE_OPERATION_NON_TRANSPOSE, N, N, LOWER,
						&one, descrA, d_val, d_row, d_col, d_p, &zero, d_q);

		if (cusparseStatus != CUSPARSE_STATUS_SUCCESS) {
			std::cout << "cusparseDcsrmv(q <- Ap) failed, status returned: " << std::endl;
			std::cout << cusparseGetErrorString(cusparseStatus) << std::endl;
			return 1;
		}

		//8: alpha = <r,z>_{i+1} / <p,A*p> = rho/<p,q>
		//8a: ptAp <- <p,q>
		cublasDdot(cublasHandle0, N, d_p, 1, d_q, 1, &ptAp);
		//8b: alpha <- rho/ptAp
		alpha = rho / ptAp;

		//9: x <- x + alpha*p
		cublasDaxpy(cublasHandle0, N, &alpha, d_p, 1, d_x, 1);

		//10: r <- r - alpha*q
		//10a: negalpha <- -alpha
		negalpha = -alpha;
		//10b: r <- r + negalpha*q
		cublasDaxpy(cublasHandle0, N, &negalpha, d_q, 1, d_r, 1);

		iter++;

		// normr <- sqrt(<r,r>)_iter
		cublasDnrm2(cublasHandle0, N, d_r, 1, &normr);

    }

    cublasDestroy(cublasHandle0);
    cusparseDestroy(cusparseHandle0);
    cusparseDestroyMatDescr(descrA);
    cusparseDestroyMatDescr(descrL);
    cusparseDestroySolveAnalysisInfo(infoL);
    cusparseDestroySolveAnalysisInfo(infoU);

    std::cout << std::endl << "N = " << N << std::endl;
    std::cout << "Cholesky factor values (lower - csr storage)" << std::endl;
    for (int i = 0; i < LOWER; i++) {
    	std::cout << std::fixed << std::setprecision(3) << d_valCholesky_devptr[i] << ", ";
    }
    std::cout << std::endl;
    std::cout << "Cholesky factor column indices" << std::endl;
    for (int i = 0; i < LOWER; i++) {
		std::cout << std::fixed << std::setprecision(3) << d_col_devptr[i] << ", ";
	}
	std::cout << std::endl;
    std::cout << "Cholesky factor row offsets" << std::endl;
    for (int i = 0; i < N+1; i++) {
		std::cout << std::fixed << std::setprecision(3) << d_row_devptr[i] << ", ";
	}
	std::cout << std::endl;

    std::cout << std::endl << "d_x[i] solution vector" << std::endl;
    for (int i = 0; i < N; i++) {
    	std::cout << std::fixed << std::setprecision(3) << d_x_devptr[i] << ", ";
    }
    std::cout << std::endl;

    return 0;
}

const char* cublasGetErrorString(cublasStatus_t status)
{
    switch(status)
    {
        case CUBLAS_STATUS_SUCCESS:				return "CUBLAS_STATUS_SUCCESS";
        case CUBLAS_STATUS_NOT_INITIALIZED: 	return "CUBLAS_STATUS_NOT_INITIALIZED";
        case CUBLAS_STATUS_ALLOC_FAILED: 		return "CUBLAS_STATUS_ALLOC_FAILED";
        case CUBLAS_STATUS_INVALID_VALUE: 		return "CUBLAS_STATUS_INVALID_VALUE";
        case CUBLAS_STATUS_ARCH_MISMATCH: 		return "CUBLAS_STATUS_ARCH_MISMATCH";
        case CUBLAS_STATUS_MAPPING_ERROR: 		return "CUBLAS_STATUS_MAPPING_ERROR";
        case CUBLAS_STATUS_EXECUTION_FAILED:	return "CUBLAS_STATUS_EXECUTION_FAILED";
        case CUBLAS_STATUS_INTERNAL_ERROR: 		return "CUBLAS_STATUS_INTERNAL_ERROR";
        default: 								return "UNKNOWN_ERROR";
    }

}

const char* cusparseGetErrorString(cusparseStatus_t status)
{
    switch(status)
    {
        case CUSPARSE_STATUS_SUCCESS: 			return "CUSPARSE_STATUS_SUCCESS";
        case CUSPARSE_STATUS_NOT_INITIALIZED: 	return "CUSPARSE_STATUS_NOT_INITIALIZED";
        case CUSPARSE_STATUS_ALLOC_FAILED: 		return "CUSPARSE_STATUS_ALLOC_FAILED";
        case CUSPARSE_STATUS_INVALID_VALUE: 	return "CUSPARSE_STATUS_INVALID_VALUE";
        case CUSPARSE_STATUS_ARCH_MISMATCH: 	return "CUSPARSE_STATUS_ARCH_MISMATCH";
        case CUSPARSE_STATUS_MAPPING_ERROR: 	return "CUSPARSE_STATUS_MAPPING_ERROR";
        case CUSPARSE_STATUS_EXECUTION_FAILED: 	return "CUSPARSE_STATUS_EXECUTION_FAILED";
        case CUSPARSE_STATUS_INTERNAL_ERROR: 	return "CUSPARSE_STATUS_INTERNAL_ERROR";
        case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED: return "CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED";
        default: 								return "UNKNOWN_ERROR";
    }
}

Here are the results I am getting compared to those from the Cusp CG example:

cusp cg
=======
N = 4
x[i] solution vector
0.500, 0.500, 0.500, 0.500,

N = 9
x[i] solution vector
0.688, 0.875, 0.688, 0.875, 1.125, 0.875, 0.688, 0.875, 0.688, 

N = 16
x[i] solution vector
0.833, 1.167, 1.167, 0.833, 1.167, 1.667, 1.667, 1.167, 1.167, 1.667, 1.667, 1.167, 0.833, 1.167, 1.167, 0.833,


cusparse iccg
=============
N = 4
d_x[i] solution vector
0.500, 0.500, 0.500, 0.500, 

N = 9
d_x[i] solution vector
0.607, 0.794, 0.821, 0.634, 0.749, 1.490, 0.179, 0.080, 0.392,


N = 16
d_x[i] solution vector
0.680, 1.051, 0.816, 0.475, 0.669, 1.708, 0.738, 0.084, 0.289, 0.375, 0.345, 0.124, 0.112, 0.157, 0.142, 0.066,

Thanks,

David

After reimplementing this in Matlab and comparing the output of each stage, I can see that the incomplete Cholesky factor is correct. The first incorrect values appear after the first cusparseDcsrsv_solve(). Interestingly, the first four elements of the output d_t vector are correct, which explains why the final solution is correct when the coefficient matrix is 4x4.
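For anyone trying to reproduce the comparison, something along these lines (a rough sketch using the thrust device pointers already declared in the code above, placed right after the first cusparseDcsrsv_solve() call) is enough to dump d_t for checking against the Matlab output:

// debug: copy t back to the host and print it for comparison
thrust::host_vector<double> h_t(d_t_devptr, d_t_devptr + N);
for (size_t i = 0; i < N; i++)
	std::cout << std::setprecision(12) << h_t[i] << ", ";
std::cout << std::endl;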

I have tried describing the Cholesky factor matrix in different ways to see if it’s a problem with the analysis information being passed to the solve function, but have had no success so far.

In the Preconditioned Conjugate Gradient sample provided with the CUDA SDK, a comment in the code suggests there was meant to be an IC(0) example alongside the ILU(0) one, but it has been omitted. I wonder if that’s because there is a problem with it? Has anyone had any success implementing the incomplete Cholesky with the cuSPARSE library?

Thanks,

David

  1. Are the indices within each row of the matrix sorted by column?
  2. Is the matrix base index 0 or 1?
  3. You should be able to reuse the (non-transpose) analysis of L for the csric0 and csrsv (no need to make an extra analysis call); see the sketch after this list.
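For point 3, roughly what I mean (a sketch only, not tested against your code; the idea is to make one non-transpose analysis call and hand the same info object to csric0 and to the L solve):

cusparseDcsrsv_analysis(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE,
                        N, LOWER, descrA, d_val, d_row, d_col, infoA);
cusparseDcsric0(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE, N, descrA,
                d_valCholesky, d_row, d_col, infoA);
// ... inside the CG loop, reuse infoA for the L*t = r solve (no separate infoL/analysis needed)
cusparseDcsrsv_solve(cusparseHandle0, CUSPARSE_OPERATION_NON_TRANSPOSE, N,
                     &one, descrL, d_valCholesky, d_row, d_col, infoA, d_r, d_t);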

Thanks for taking the time to reply.

The matrix is generated from the Poisson equation and a 5-point stencil via a routine from the Cusp library. It is stored in CSR format as required by the cuSPARSE functions, and the matrix index base is zero.
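For reference (worked out by hand from the stencil rather than printed from the program), the smallest case is a 2x2 grid, for which the lower-triangular CSR arrays built above come out as:

// N = 4, NZ = 12, LOWER = 8 (lower triangle incl. the diagonal of the 5-point Laplacian)
// h_val = {  4, -1,  4, -1,  4, -1, -1,  4 }
// h_col = {  0,  0,  1,  0,  2,  1,  2,  3 }
// h_row = {  0,  1,  3,  5,  8 }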

Yes thanks, I was sure I could reuse it, but regenerated the analysis information in an attempt to figure out what was wrong. I have also tried describing the matrix as ‘general’ instead of triangular, as well as generating separate lower and upper factors in order to eliminate the transpose operation. Nothing I have tried has worked so far.

I have submitted a bug report, just in case, although I am new to GPGPU so I am probably missing something.

David

How large (in rows x columns, and number of non-zeros) is the matrix you want to take the Cholesky factor of?

I have some CUDA code that computes the Cholesky factor of a dense positive-definite matrix, and it handles matrices of up to 10,000 x 10,000 rather quickly.

At this point the only time I use cuSPARSE is to convert any csc/csr format matrix into dense so I can use cuBLAS.

The cuBLAS documentation is way better anyway. For example, if you have computed the Cholesky factor and are storing the result in the lower triangle, you can use cuBLAS Strsm() to solve for the inverse of L very quickly.
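Roughly like this (a minimal sketch, not my actual code: handle is an initialised cublasHandle_t, d_L holds the dense lower-triangular factor in column-major order, and d_I starts out as the N x N identity; after the call d_I holds inv(L)). I'm showing the double-precision Dtrsm variant to match the rest of this thread:

const double one = 1.0;
// solve L * X = I for X = inv(L); the result overwrites d_I
cublasDtrsm(handle, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_LOWER,
            CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT,
            N, N, &one, d_L, N, d_I, N);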

To give a rough timing example for a dense 4096 x 4096 matrix L which is the Cholesky factor:

Using the CPU (Gauss-Seidel method) it took 294.4 seconds to calculate inv(L), while on the GPU using the cuBLAS solver it took 0.13 seconds, including all memory transfers, allocations, copies, etc.

The results were identical (within 1e-6).

The code is eventually destined for CFD (Computational Fluid Dynamics), so I would expect the matrices to start at around 0.5M x 0.5M and quickly get much larger. I don’t think dense storage is an option here. In my previous posts, when referring to the Cholesky factor, I meant the incomplete Cholesky factor.

The matrices that I am generating in the above code are just for testing purposes, although the reason for using Cusp as a starting point, in terms of storage, is the existing code I am working with is based on Cusp.

That’s an impressive difference in time for calculating the inverse. I don’t think it will be an option here, but I will keep it in mind.

If you get the cusparseDcsrsv_solve() to work correctly please let us know.

Naive question:

If I were to compare the results of the straight Cholesky factor on a dense positive-definite matrix to the incomplete Cholesky factor of the same matrix (converted to sparse first, get result, then convert back to dense), how would they be different?

Would it just be a matter of the zero fill-in, or would there also be other differences?

Okay, I’ve found the problem, and it wasn’t cusparseDcsrsv_solve(). It was due to a stupid oversight on my part. :-) In the beginning I hard-coded a 4x4 matrix element by element just to get started. Stupidly, I also coded the solution and RHS (Right Hand Side) vectors the same way, as can be seen in the above code where h_b and h_x are initialised element by element instead of with a loop. Therefore, when I replaced the original matrix with poisson5pt() and increased the system of equations beyond four unknowns, I was no longer solving a system where the RHS contained all ones. Hence, the incorrect results.

Anyway, the above code works and produces the correct results if you initialise the RHS vector with the intended values, e.g. with a simple loop as sketched below.
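Something like this instead of the hard-coded four entries (trivial, but for completeness):

	// initialise the RHS to all ones and the initial guess to zero for every row
	for (size_t i = 0; i < N; i++) {
		h_b[i] = 1.0;
		h_x[i] = 0.0;
	}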

CudaaduC,

If you perform a complete Cholesky factorization, the fill-ins contribute to other values. Therefore, performing a complete factorization and then zeroing the fill-ins to maintain the sparsity pattern will leave some values different from those produced by an incomplete factorization.
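A small example worked by hand: for A = [4 -1 -1; -1 4 0; -1 0 4], the complete factor picks up a fill-in L(3,2) ~ -0.129, which feeds into L(3,3) ~ 1.932, whereas IC(0) drops that entry and keeps L(3,3) = sqrt(3.75) ~ 1.936. So zeroing the fill-ins of the complete factor afterwards does not reproduce the incomplete factor.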

David,

Thanks for the update and explanation!

Since the result of the incomplete Cholesky factorization would be different from the full dense version, would the incomplete factor still be useful in the same way as the actual one?

Which does Matlab’s ‘chol()’ use?

It is very easy to make a mistake using cuSPARSE, so glad you figured it out!

CudaaduC,

For a symmetric positive-definite matrix A, Cholesky factorization provides a unique factor L such that L*L^{T} == A, and it is used directly to obtain a solution to Ax = b.

Incomplete Cholesky factorization gives a factor L such that L*L^{T} ~= A, and it is used to precondition the system Ax = b in order to increase the rate of convergence of an iterative method.

Ultimately, it depends on the type of system you’re trying to solve.

As for Matlab, I have very little experience with it, but ichol(A) calculates the incomplete Cholesky factor, so I assume chol(A) calculates the complete Cholesky factor of A.

Ironically, I had the cuSPARSE components correct.