MKL versus CULA issues solving systems of linear equations

I have written code that solves a system of linear equations both on the CPU (using MKL 10.3.7 sgesv) and on the GPU (using CULA Dense R13 culaSgesv). I then calculate the error between the coefficient (LHS) matrix as filled on the CPU and as filled on the GPU; in a perfect world this error should be zero. I also determine the RMS error of the solution from the CPU and from the GPU.

The RMS errors for the CPU and GPU are approximately the same up to 100 unknowns. Beyond 100 unknowns the errors differ from each other significantly, and instead of getting smaller, the error jumps all over the place. I have attached my code. I am using Visual Studio 2008 x64 and CUDA 4.0 with a GeForce GT 525M.

This is the result I am getting, which is also attached (error.txt). I am not sure what is causing the error, since I have similar code in MATLAB (minus the GPU code) in which the error reduces to about 2.0E-003 and levels off as I increase Numnodes. The error calculation from the C program also agrees with the MATLAB results up to 100 unknowns; I am not sure what is happening after 100 unknowns. Could it be a configuration problem (MKL and CULA running at the same time)? Any help in the right direction will be greatly appreciated. In case someone wants to see my MATLAB code, which works, I will be glad to post that too.

laplace2D.cpp (14 KB)

Gaussian_matrixFill.h (416 Bytes)

main.cpp (302 Bytes)

error.txt (859 Bytes)

Numnodes	Alpha		CPU Error	GPU Error
16		1.170000	7.101676E-002	7.101548E-002
36		3.250000	8.908086E-003	8.904453E-003
64		6.369999	2.478148E-003	2.479677E-003
100		10.530000	1.537363E-003	1.497382E-003
<b>144		15.729999	2.007023E-003	6.758709E-003</b>
<b>196		21.969997	5.164282E-003	2.743981E-003</b>
<b>256		29.249996	6.399789E-003	3.003306E-003</b>
<b>324		37.570000	6.843160E-002	5.168200E-003</b>
<b>400		46.929996	8.977773E-003	1.528272E-002</b>
<b>484		57.329998	4.600475E-002	5.412453E-003</b>
<b>576		68.769997	2.689121E-002	9.519931E-003</b>
<b>676		81.250000	8.117091E-002	1.732562E-002</b>
<b>784		94.769997	1.850877E-001	4.871070E-002</b>
<b>900		109.330002	6.828754E-002	4.783689E-002</b>
<b>1024		124.930008	2.539281E-002	8.547682E-002</b>
<b>1156		141.569992	3.487872E-001	1.783528E-001</b>
<b>1296		159.250000	2.891807E-001	8.317184E-002</b>
<b>1444		177.969986	1.070141E-001	8.043955E-002</b>
<b>1600		197.729996	3.038914E-001	2.393741E-001</b>

Is your error computed relatively, or absolutely?

The error is the root mean square deviation (RMSD) = sqrt(sum((exact - num)^2) / n)
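For reference, a minimal sketch of that RMSD formula in C++ (the function name and signature are illustrative, not taken from the attached code):

```cpp
#include <cmath>
#include <cstddef>

// Root mean square deviation between an exact and a numerical solution:
// rmsd = sqrt( sum_i (exact[i] - num[i])^2 / n )
double rmsd(const double* exact, const double* num, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double d = exact[i] - num[i];
        sum += d * d;  // accumulate squared deviations
    }
    return std::sqrt(sum / static_cast<double>(n));
}
```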

Suppose you have only one value to compare, with exact = 10000001 and num = 10000002. Then the RMSD would be 1.0, even though those two values agree to many, many decimal places.

If you instead compare with a relative error, say abs((num - exact) / exact), you get about 1.0E-07, which makes a lot more sense.
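In code, the relative error suggested above is a one-liner (this sketch assumes exact is nonzero; near-zero exact values need a guard or a mixed absolute/relative criterion):

```cpp
#include <cmath>

// Relative error of a numerical value against an exact reference.
// Assumes exact != 0; divide-by-zero must be guarded in real use.
double relative_error(double exact, double num)
{
    return std::fabs((num - exact) / exact);
}
```

With exact = 10000001 and num = 10000002 this gives roughly 1.0E-07, matching the point made above.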