Parallel Preconditioners for CG calculating the "inverse" in parallel

xlro · August 14, 2007, 11:03am

Hi,

i´m currently implementing a conjugate gradient solver for sparse linear systems on the gpu. And i´m quite happy with it´s performance (2.8 gflop / s - 20 times faster then CPU). The only problem is that i need many more iterations (on GPU with Jacobi Preconditioner) then on cpu (with ILU Preconditioner).

The problem is the following: for preconditioned cg i have to solve something like Cx=b in each step.
In case of a Jacobi preconditioner i can calculate x = C^-1 b easily.
In case of a ILU Preconditioner i could calculate LUx=b by performing a forward and a backward substitution.
But my Matrix only contains the main diagonal + 3 upper and 3 lower diagonals. Consequently for an optimal parallelisation i could get a speedup of factor 3. And that would destroy the performance of my whole algorithm (furthermore GPU without parallelisation is slower then cpu). So ILU on the GPU doesn’t make too much sense.

Did anybody implement anything like this or has good experiences with special parallel preconditioners (the calculation of the Preconditioner Matrix doesn’t have to be in parallel. Only the solving of Cx = b has to be in parallel).

Thanks for any ideas

bart_k · September 18, 2007, 10:14am

Hi,

I’m sorry to ask you a question and not being able to answer yours. I’m also trying to implement a CG solver with CUDA and I was wondering if you had some references that I could use to get me started. I understand how CG works, so I’m just looking for articles on the parallelization of things.

And which sparse storage scheme do you use?

Thanks, Bart

CUDAwithTBB · April 7, 2010, 4:19pm

Hi,

iï¿½m currently implementing a conjugate gradient solver for sparse linear systems on the gpu. And iï¿½m quite happy with itï¿½s performance (2.8 gflop / s - 20 times faster then CPU). The only problem is that i need many more iterations (on GPU with Jacobi Preconditioner) then on cpu (with ILU Preconditioner).

The problem is the following: for preconditioned cg i have to solve something like Cx=b in each step.

In case of a Jacobi preconditioner i can calculate x = C^-1 b easily.

In case of a ILU Preconditioner i could calculate LUx=b by performing a forward and a backward substitution.

But my Matrix only contains the main diagonal + 3 upper and 3 lower diagonals. Consequently for an optimal parallelisation i could get a speedup of factor 3. And that would destroy the performance of my whole algorithm (furthermore GPU without parallelisation is slower then cpu). So ILU on the GPU doesn’t make too much sense.

Did anybody implement anything like this or has good experiences with special parallel preconditioners (the calculation of the Preconditioner Matrix doesn’t have to be in parallel. Only the solving of Cx = b has to be in parallel).

Thanks for any ideas

I haven’t worked with parallel preconditioners but seeing that you’re working with multi-diagonals, here is an idea.

If you have access to Matlab:

Load your matrix to the Matlab workspace.

In Matlab command line, type:

spy( your_matrix_name );

Enter. You’ll see the map of your matrix.

There is a command called symrcm (for Symmetric Reverse Cuthill-McKee).

Type in command line:

p = symrcm( your_matrix_name );

Press enter. Then type:

spy( p );

You’ll see that the resulting permutation vecotr is divided into two parts, which might be helpful for parallelizing.

Topic		Replies	Views
Parallel preconditioning for CG algorithm ILU(0) CUDA Programming and Performance	1	2039	June 14, 2011
conjugate gradient CUDA Programming and Performance	16	9899	June 18, 2008
Incomplete-LU and Cholesky Preconditioned CUDA Programming and Performance	4	5353	November 8, 2012
Conjugate Gradient With Preconditioner Sample CUDA Programming and Performance	0	1454	November 28, 2013
Implement CG method in CUDA CUDA Programming and Performance	0	2793	April 9, 2009
Conjugate Gradient Solver CUDA Programming and Performance	1	4095	April 23, 2013
Linear Algebra Solvers CUDA Programming and Performance	20	19225	February 7, 2009
CULA Sparse 1.0 now available Sparse Iterative Solver Package CUDA Programming and Performance	3	2252	November 7, 2011
Is there library or code to solve Large Sparse Linear system or do Large sparse Matrix Multiplicatio CUDA Programming and Performance	4	3325	April 28, 2009
Conjugate Gradient CUDA Fortran Legacy PGI Compilers	5	10386	September 11, 2015

Parallel Preconditioners for CG calculating the "inverse" in parallel

Related topics