I’m trying to solve a least-squares problem via a linear system of the form
Ax = b
where I have a diagonal weight matrix W, which changes the problem to the form
WAx = Wb
I developed the code using Intel IPP and a CUDA implementation (CC 2.0, no NPP in use). [I’m certain there is no bug in the CUDA code.]
The IPP version works great and is extremely accurate, no doubt about it.
Regarding the CUDA code: it works well for small matrices. However, when I work with matrices of size n > 1e3, I get unacceptable numerical differences of roughly ±0.5, which may cause significant distortion later on.
I’d like to know how to avoid such numerical differences. How can IPP be so much more accurate?
I’m using double precision in both applications.
Are there any CUDA flags relevant to high-accuracy computation?
I use a lot of multiply-add operations; I tried setting the -fmad flag to false, but it didn’t change anything.
** I can’t post the code. **
Any help would be very much appreciated.