more iterations = worse outcome Pi

grzesio1912 · February 19, 2011, 9:10am

hello,

i’m new in CUDA, i wrote a program to determine the number of PI

__global__ void Pi(float* out, long N, int nthreads, int nblocks)

{	

	float i;

	float x;

	float licznik = 4.0;

	int idx = blockIdx.x*blockDim.x+threadIdx.x; 

	if (idx%2!=0)

		licznik = -4.0;

	for (i = idx; i <= N; i += nthreads * nblocks) 

	{		

		out[idx] +=  licznik/(2 * i + 1);

	}

}

...

gets strange results:

N = 1000 Pi = 3.1425914764404297

N = 1000000 Pi = 3.1415917873382568

N = 1000000000 Pi = 3.1412153244018555

N = 10000000000 Pi = 3.1409137248992920

anyone know why?

avidday · February 19, 2011, 10:27am

You are exceeding the useable precision of 32 bit floating point arithmetic, and the way you are doing summation in the kernel is not helping either, especially if you are using a lot of threads.

grzesio1912 · February 20, 2011, 7:44pm

If anybody can write better program? I would be thankful

avidday · February 20, 2011, 7:53pm

__global__ void Pi(double* val, const long N0, const long N1)

{

    long idx = (blockIdx.x * blockDim.x) + threadIdx.x;

    double counter = (((idx+N0) % 2l) == 0) ? 4.0 : -4.0;

double sum = (N0 > 0) ? val[idx] : 0.;

    for (long i = (idx+N0); i <= N1; i +=  (blockDim.x * gridDim.x)) {

        sum += counter / double(2 * i + 1);

    }

val[idx] = sum;

}

will certainly be an improvement, but you will need a GPU with double precision support to use it.

cricri1 · February 20, 2011, 9:30pm

for 100 1000 10000 decimal if you want all in integer

i dont know if it s easy to do in cuda

Sub a()

Dim ta1(50) As Long

For i = 1 To 50

   ta1(i) = 0

Next i

ta1(1) = 1

For i = 500 To 1 Step -1

   For j = 50 To 1 Step -1

      ta1(j) = ta1(j) * i + reste

      reste = 0

      If ta1(j) > 10000 Then

         reste = Int(ta1(j) / 10000)

         ta1(j) = ta1(j) - reste * 10000

      End If

   Next j

reste = 0

   For j = 1 To 50

      reste2 = (ta1(j) + reste * 10000) Mod (2 * i + 1)

      ta1(j) = Int((reste * 10000 + ta1(j)) / (2 * i + 1))

      reste = reste2

   Next j

ta1(1) = ta1(1) + 1

reste = 0

Next i

End Sub