seibert I can show a partial output I would show the full output but I don’t think you want to look through 400 lines :)
Here are the correct results…
0.250,2138.611762554759
0.255,2092.868567279131
0.260,2058.309226301871
0.265,2016.717948844871
0.270,1955.39850723784
0.275,1869.225747429558
0.280,1760.6376794998
0.285,1637.854444507106
0.290,1511.987133687082
.
.
.
0.960,92.78303227132187
0.965,93.91643349781558
0.970,95.70534037574494
0.975,97.71544292934036
0.980,99.57838484108686
0.985,100.9984177869715
0.990,101.752963146927
0.995,101.6896222963883
1.000,100.7226676580754
1.005,98.83100363363878
.
.
.
1.200,114.0212254126888
1.205,111.2882085894538
1.210,108.9470922716362
1.215,107.0689548403182
1.220,105.6783678809785
1.225,104.7472445272642
1.230,104.1876497380735
1.235,103.8479534887385
1.240,103.5182667650906
1.245,102.9498676679108
Here are the incorrect results for that same domain…
0.250,86288.1562499999854481
0.255,80877.5937499999854481
0.260,75868.3437499999854481
0.265,71216.4687499999854481
0.270,66883.4062499999854481
0.275,62835.7656249999927240
0.280,59044.5156249999927240
0.285,55484.5156249999927240
0.290,52133.9843749999927240
.
.
.
0.960,-760.0603027343748863
0.965,-747.7209472656248863
0.970,-736.4636230468748863
0.975,-726.1794433593748863
0.980,-716.6721191406248863
0.985,-707.6682128906248863
0.990,-698.8312988281248863
0.995,-689.7785644531248863
1.000,-680.1022949218748863
1.005,-669.3859863281248863
.
.
.
1.200,313.1156005859374432
1.205,301.6966552734374432
1.210,288.4442138671874432
1.215,273.6068115234374432
1.220,257.4373779296874432
1.225,240.1913452148437216
1.230,222.1231079101562216
1.235,203.4838256835937216
1.240,184.5186157226562216
1.245,165.4644165039062216
So as you can see there is a major difference in the two…
The graph of the correct results should look like a roller-coaster and the results should never be negative.
Also Sylvain Collange I believe this should answer why I am looking for them to be the same :) its not that I need them to be exactly the same I mean a 10e-8 error would be fine by me but the numbers aren’t even close you know. Also It comes down to this is my first CUDA application and since I already had it written for MPI turning it into a CUDA application seemed like a simple way to start grasping some of the concepts even though I should probably be completely redesign it to a more CUDA like style program design but I think you get what I was trying to do, fast and simple to understand how CUDA works before I start diving into a program where I don’t know what the results should be.
Also I just tried using __fsqrt_rn and still that answer does not come out correctly…I feel as though this cannot be correct…
This is the code that works…
double p[ATOM_LENGTH][3];
.
.
.
double *p1=p[i];
double *p2=p[j];
.
.
.
216 double sq=(
217 (
218 (p1[0]-p2[0]) *
219 (p1[0]-p2[0])
220 ) +
221 (
222 (p1[1]-p2[1]) *
223 (p1[1]-p2[1])
224 ) +
225 (
226 (p1[2]-p2[2]) *
227 (p1[2]-p2[2])
228 )
229 );
230 return sqrt(sq);
All of these return the incorrect result…
double p[ATOM_LENGTH *3];
.
.
.
74 double sq=(
75 (
76 (p[ATOM_LENGTH * 0 + i]-p[ATOM_LENGTH * 0 + j]) *
77 (p[ATOM_LENGTH * 0 + i]-p[ATOM_LENGTH * 0 + j])
78 ) +
79 (
80 (p[ATOM_LENGTH * 1 + j]-p[ATOM_LENGTH * 1 + j]) *
81 (p[ATOM_LENGTH * 1 + i]-p[ATOM_LENGTH * 1 + j])
82 ) +
83 (
84 (p[ATOM_LENGTH * 2 + i]-p[ATOM_LENGTH * 2 + j]) *
85 (p[ATOM_LENGTH * 2 + i]-p[ATOM_LENGTH * 2 + j])
86 )
87 );
88 return __fsqrt_rn(sq);
89 // return sqrt(sq);
90 // return rsqrt(sq) * sq;
Sorry for the length of this post… and for all the numbers in my code just copied it straight from vim…if you guys have any thoughts please let me know and if I am doing something really stupid please let me down easy :)
Thanks.