The LU factorization runs fine, but the numerical results are wrong for me. If gpu_sgetrf behaves like netlib's sgetrf, we obtain A = P*L*U. But how can I compute the permutation matrix P?
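As far as I understand netlib's convention, ipiv is 1-based and records that row i was interchanged with row ipiv(i) during the factorization. Here is a minimal sketch of how I would reconstruct a dense P from ipiv (build_permutation is my own helper, not a library routine; please correct me if gpu_sgetrf uses a different convention):

    #include <stdlib.h>
    #include <string.h>

    /* Sketch of a helper: build the dense n x n permutation matrix P
     * (row-major) from the ipiv array returned by sgetrf, assuming the
     * LAPACK convention that ipiv is 1-based and row i was interchanged
     * with row ipiv[i] during factorization. The resulting P satisfies
     * A = P * L * U. Returns 0 on success, -1 on allocation failure. */
    static int build_permutation(int n, const int *ipiv, float *P)
    {
        int *perm = malloc((size_t)n * sizeof(int));
        if (!perm) return -1;
        for (int i = 0; i < n; ++i)
            perm[i] = i;

        /* Replay the row interchanges, in order, on an identity
         * permutation: afterwards, row i of L*U corresponds to
         * row perm[i] of the original A. */
        for (int i = 0; i < n; ++i) {
            int j = ipiv[i] - 1;          /* 1-based pivot -> 0-based */
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }

        /* P(perm[i], i) = 1, so P * (L*U) restores the original
         * row order of A. */
        memset(P, 0, (size_t)n * n * sizeof(float));
        for (int i = 0; i < n; ++i)
            P[perm[i] * n + i] = 1.0f;

        free(perm);
        return 0;
    }

With P built this way, P*L*U should reproduce A up to rounding, which would be a quick way to check the factors below.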
Here are my inputs with a random 4x4 matrix:
Device: GeForce 9600M GT, 1250 MHz clock, 511 MB memory.
Matrix A:
3.00 6.00 7.00 5.00
3.00 5.00 6.00 2.00
9.00 1.00 2.00 7.00
0.00 9.00 3.00 6.00
Matrix B:
1.00
3.00
1.00
2.00
Start sgetrf
End sgetrf successfully
Matrix LU (packed factors returned by sgetrf):
7.00 0.71 0.43 0.86
2.00 4.57 -0.19 0.94
9.00 0.57 -1.75 0.14
3.00 3.86 4.44 -6.82
ipiv : 3 4 3 4
Matrix L:
1.00 0.00 0.00 0.00
2.00 1.00 0.00 0.00
9.00 0.57 1.00 0.00
3.00 3.86 4.44 1.00
Matrix U:
7.00 0.71 0.43 0.86
0.00 4.57 -0.19 0.94
0.00 0.00 -1.75 0.14
0.00 0.00 0.00 -6.82
Matrix L*U:
7.00 0.71 0.43 0.86
14.00 6.00 0.67 2.65
63.00 9.04 2.00 8.39
21.00 19.78 -7.20 0.00
Matrix X:
1.05
0.48
-0.77
-0.12
n    Gflop/s      time LU       info    time solver
4    0.000350     0.000122 s    0       0.000008 s
Success exit
And vector X is totally wrong when I pass the output matrix A from gpu_sgetrf to sgetrs: sgetrs('N',n,1,A,lda,ipiv,B,n,&info);
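For comparison, this is the plain CPU sequence I would expect to be equivalent, using only standard LAPACKE calls (column-major storage assumed on my side):

    #include <stdio.h>
    #include <lapacke.h>

    int main(void)
    {
        /* Column-major 4x4 A and right-hand side b from the post. */
        lapack_int n = 4, nrhs = 1, info;
        lapack_int ipiv[4];
        float A[16] = { 3.f, 3.f, 9.f, 0.f,    /* column 1 */
                        6.f, 5.f, 1.f, 9.f,    /* column 2 */
                        7.f, 6.f, 2.f, 3.f,    /* column 3 */
                        5.f, 2.f, 7.f, 6.f };  /* column 4 */
        float b[4] = { 1.f, 3.f, 1.f, 2.f };

        /* Factor A = P*L*U in place; ipiv records the row interchanges. */
        info = LAPACKE_sgetrf(LAPACK_COL_MAJOR, n, n, A, n, ipiv);
        if (info != 0) { printf("sgetrf failed: %d\n", (int)info); return 1; }

        /* Solve A*x = b using the factors and the SAME ipiv array. */
        info = LAPACKE_sgetrs(LAPACK_COL_MAJOR, 'N', n, nrhs, A, n, ipiv, b, n);
        if (info != 0) { printf("sgetrs failed: %d\n", (int)info); return 1; }

        for (int i = 0; i < 4; ++i)
            printf("x[%d] = %f\n", i, b[i]);
        return 0;
    }

If the GPU factors and ipiv follow the same convention, feeding them into sgetrs like this should give the same X.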
Thanks for your answers!
PS: I've benchmarked the sgetrf subroutine on my CPU (3.06 GHz Intel Core 2 Duo) and on my GeForce 9600M. I am going to do it again with an NVIDIA GTX 295 at work!
benchmark_LU_Decomposition_Coppey.pdf (356 KB)