I’m trying to solve a system of linear equations Ax = b where A is a sparse matrix made of 3x3 blocks (at most 1% of the blocks are non-zero). I store it in BSR format. The matrix is symmetric and positive definite, and it has between 5k and 30k rows.
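To make the storage concrete, here is a minimal CPU sketch of a matrix-vector product for a BSR matrix with 3x3 blocks. The `Bsr3` struct and `bsrMatVec` function are hypothetical names used only for illustration, not taken from the attached code, and the sketch assumes the 9 values of each block are stored row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a BSR matrix with 3x3 blocks (illustrative only).
struct Bsr3 {
    int mb = 0;               // number of block rows
    std::vector<int> rowPtr;  // size mb + 1, offsets into colInd / val
    std::vector<int> colInd;  // block-column index of each stored block
    std::vector<double> val;  // 9 values per block, row-major inside a block (assumption)
};

// Computes y = A * x for a BSR matrix with 3x3 blocks.
std::vector<double> bsrMatVec(const Bsr3& A, const std::vector<double>& x) {
    assert(A.rowPtr.size() == static_cast<std::size_t>(A.mb) + 1);
    assert(A.val.size() == 9 * A.colInd.size());
    std::vector<double> y(3 * static_cast<std::size_t>(A.mb), 0.0);
    for (int ib = 0; ib < A.mb; ++ib) {
        const std::size_t rowBase = 3 * static_cast<std::size_t>(ib);
        // Loop over the stored blocks of block row ib.
        for (int k = A.rowPtr[ib]; k < A.rowPtr[ib + 1]; ++k) {
            const double* blk = &A.val[9 * static_cast<std::size_t>(k)];
            const double* xb = &x[3 * static_cast<std::size_t>(A.colInd[k])];
            // Dense 3x3 block times the matching 3 entries of x.
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 3; ++c)
                    y[rowBase + r] += blk[3 * r + c] * xb[c];
        }
    }
    return y;
}
```

If the blocks are stored column-major instead, the inner index would be `blk[3 * c + r]`.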
I solve it with the conjugate gradient method from Saad’s book Iterative Methods for Sparse Linear Systems http://www.lmn.pub.ro/~daniel/ElectromagneticModelingDoctoral/Books/Numerical%20Methods/SaadIterativeMethods.pdf (ALGORITHM 7.6, page 263), combined with an ILU(2) preconditioner (10.3.3, page 331 from the same book).
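For reference, this is roughly the preconditioned CG loop as I understand it, written as a small CPU sketch. The names `pcg`, `applyA` and `applyMinv` are hypothetical; `applyMinv` stands for applying the preconditioner (with ILU(2), the two triangular solves), and the relative-residual stopping test is an assumption, not necessarily the criterion used in the attached code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op = std::function<Vec(const Vec&)>;  // a linear operator v -> Op(v)

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// y += alpha * x
void axpy(double alpha, const Vec& x, Vec& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// Preconditioned CG. x holds the initial guess on entry and the solution on exit.
// Returns the number of iterations performed.
int pcg(const Op& applyA, const Op& applyMinv, const Vec& b, Vec& x,
        double relTol, int maxIter) {
    Vec r = b;
    axpy(-1.0, applyA(x), r);  // r = b - A x
    Vec z = applyMinv(r);      // z = M^{-1} r
    Vec p = z;
    double rz = dot(r, z);
    const double bNorm = std::sqrt(dot(b, b));
    for (int it = 0; it < maxIter; ++it) {
        // Assumed stopping test: ||r|| <= relTol * ||b||.
        if (std::sqrt(dot(r, r)) <= relTol * bNorm) return it;
        const Vec Ap = applyA(p);
        const double alpha = rz / dot(Ap, p);
        axpy(alpha, p, x);    // x = x + alpha p
        axpy(-alpha, Ap, r);  // r = r - alpha A p
        z = applyMinv(r);
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = z[i] + beta * p[i];
        rz = rzNew;
    }
    return maxIter;
}
```

Passing the operators as callbacks keeps the loop identical whether the matrix product and the preconditioner run on the CPU or are backed by GPU calls, which makes iteration counts easier to compare.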
On the CPU, both the solver and the preconditioner work well thanks to parallel computing. The bottleneck is applying the preconditioner, that is, solving the two systems of linear equations with the lower and upper triangular matrices.
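To show what these two triangular solves look like, here is a minimal dense sketch of applying a preconditioner M = LU to a vector. The function name `applyLuPreconditioner` and the packed dense storage (strict lower part holds L with an implied unit diagonal, diagonal and upper part hold U) are assumptions made for illustration; the real factors are sparse and stored in BSR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Solves (L U) z = r and returns z.
// LU is a dense row-major n x n array: strict lower part = L (unit diagonal
// implied), diagonal and upper part = U. Illustrative storage only.
Vec applyLuPreconditioner(const Vec& LU, std::size_t n, const Vec& r) {
    assert(LU.size() == n * n);
    assert(r.size() == n);
    Vec z = r;
    // Forward substitution: L y = r. Row i needs all earlier rows j < i.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j)
            z[i] -= LU[i * n + j] * z[j];
    // Backward substitution: U z = y. Row i needs all later rows j > i.
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j)
            z[i] -= LU[i * n + j] * z[j];
        z[i] /= LU[i * n + i];
    }
    return z;
}
```

Each row depends on rows that were already solved, which is why this step is much harder to parallelize than the matrix-vector product.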
To speed up the calculation I tried the cuSPARSE library. Since ILU(2) isn’t implemented in cuSPARSE, I implemented the PCG method for the GPU using cusparseDbsrsv2_solve, cublasDdot, cublasDaxpy, etc. But this solution requires many more iterations than the CPU-based implementation to reach the same accuracy, which seems pretty strange to me since it uses the same algorithm. I think this is why the GPU solution runs slower.
So guys, my questions are:
Please find my examples attached. https://gist.github.com/maiorpa/d2b853f87a0c4d58163ba4de8e06bac8
Thank you, I appreciate you taking the time to help me. Feel free to refer me to a source.
P.S. Sorry for any inconvenience or mistakes. It’s my first post.