I just ventured into Solver acceleration. I have implemented the LDM^T factorizer in GPU (only the factorization).
I understand the importance of factorization and the algorithm that goes bhind it. And, thats about it. No practical application experience.
The sequential algorithm for LDM^T can be found in “The Matrix computations” book by Van Loan & Golub
Page 137, Algorithm 4.1.1
I am getting pretty good speedups (8x for 1000x1000 (~340 ms) to 133x for 5000x5000 (13 secs) on a TESLA C1060 compared to 2.41GHz AMD Athlon)
But my problem is precision. The diagonal elements get pumped up pretty soon and they cumulatively amplify small errors into big ones - well, Thats my reasoning (I hope it is not a software bug - I did some verification - see next line).
I checked my application with a minimal 20x20 matrix and the max error was 2.xxx… The original matrix contained random elements from 1 to 100. I dumped the entire matrix just before calculating the max-error element and found that entires were pretty accurate… except that the big diagonal elements are amplifying miniscule errors for subsequent calculations. I actually wrote a C program to validate this. Please find it in attachment. The “par” array is the GPU based array just before calculating the  diagonal and “desi” array is the CPU based array just before doing the  one. “par” and “desi” array values are very close to each other and yet  calculation differs by 2. I also print the diagonal elements out.
So, my questions are:
What kind of pre-conditioners need to be applied before doing a LDM^T?
At the momment I am randomly generating a matrix with entries between 1 and 2. (1 to 100 for the test application in the attachment)
Are there huge matrices available in the internet for benchmarking? I remember some1 posting something about it sometime back.
I think all solvers need to address this floating point precision problem. So, What is the industry standard for it?
Is that why Iterative solvers are in place?
By question 3, Are most direct solvers accompanied by an iterative companion to find right values?
Thanks in Advance,
test.cpp (9.73 KB)