I have written code that solves a system of linear equations both on the CPU (using MKL 10.3.7 sgesv) and on the GPU (using CULA Dense R13 culaSgesv). I then calculate the error between the coefficient (LHS) matrix as filled on the CPU and as filled on the GPU; in a perfect world this error should be zero. I also determined the RMS error of the solution from the CPU and from the GPU.

The RMS errors for the CPU and GPU are approximately the same up to 100 unknowns. Beyond 100 unknowns the errors diverge significantly, and instead of getting smaller, the error is jumping all over the place. I have attached my code. I am using Visual Studio 2008 x64 and CUDA 4.0 with a GeForce GT 525M.

This is the result I am getting, which is also attached (error.txt). I am not sure what is causing the error, since I have similar code in Matlab (minus the GPU part) in which the error reduces to about 2.0E-003 and levels off as I increase "Numnodes". The error calculation from the C program also matches the Matlab result up to 100 unknowns; I am not sure what is happening after 100 unknowns. Could it be a configuration problem (MKL and CULA running at the same time)? Any help in the right direction will be greatly appreciated. In case someone wants to see my Matlab code, which works, I will be glad to post that too.

laplace2D.cpp (14 KB)

laplace2D_kernel.cu (4.3 KB)

Gaussian_matrixFill.h (416 Bytes)

main.cpp (302 Bytes)

error.txt (859 Bytes)

```
Numnodes Alpha CPU Error GPU Error
16 1.170000 7.101676E-002 7.101548E-002
36 3.250000 8.908086E-003 8.904453E-003
64 6.369999 2.478148E-003 2.479677E-003
100 10.530000 1.537363E-003 1.497382E-003
144 15.729999 2.007023E-003 6.758709E-003
196 21.969997 5.164282E-003 2.743981E-003
256 29.249996 6.399789E-003 3.003306E-003
324 37.570000 6.843160E-002 5.168200E-003
400 46.929996 8.977773E-003 1.528272E-002
484 57.329998 4.600475E-002 5.412453E-003
576 68.769997 2.689121E-002 9.519931E-003
676 81.250000 8.117091E-002 1.732562E-002
784 94.769997 1.850877E-001 4.871070E-002
900 109.330002 6.828754E-002 4.783689E-002
1024 124.930008 2.539281E-002 8.547682E-002
1156 141.569992 3.487872E-001 1.783528E-001
1296 159.250000 2.891807E-001 8.317184E-002
1444 177.969986 1.070141E-001 8.043955E-002
1600 197.729996 3.038914E-001 2.393741E-001
```