Calculate pi with 4 GPUs

Hi. I’m working on a project at school. We have a PC at school with four 9600GT cards, and the teacher asked us to write a CUDA program that calculates pi using all 4 cards.
I found on the net that someone was going to port SuperPi to CUDA, but that never got finished.
Does anyone know the best way to start?

I heard the following idea in a probability session, so any credit belongs to a phantom person.

One could use Monte Carlo simulation to estimate the value of PI.

Consider a square of side 2 units.

Consider a circle of radius 1 unit with its centre in the middle of the square. The circle fits exactly inside the square (it is inscribed in the square).

Consider throwing a dot randomly inside this square. It could be anywhere inside the square. The probability that the dot falls inside the circle is:

Area of circle / Area of square == (Pi * 1 * 1) / (2 * 2) == Pi/4

You can estimate this probability by performing a Monte Carlo simulation in the following way.

  1. Generate 2 random numbers in the range [-1, +1] and use them as the x and y coordinates of a point.

  2. Taking the centre of the square (and circle) as [0,0], this randomly generated point is guaranteed to fall inside the square.

  3. The same point may or may not fall inside the circle, depending on whether its distance from [0,0] is <= 1, i.e. whether x*x + y*y <= 1.

Now repeat the experiment “N” times. Out of these, the point will fall inside the circle “K” times.

K/N ≈ probability of the point falling inside the circle = Pi/4

Thus PI ≈ 4*K/N

The estimate gets better as “N” grows, which is the essence of Monte Carlo simulation, but it converges slowly: the error shrinks roughly as 1/sqrt(N), so 10 million samples give only about 3 correct decimal places.

Do it 10 million times and find “K”. You can do it 2.5 million times on each of your 4 GPUs, combine the results and get the value of PI.

For generating random numbers, check the Mersenne Twister algorithm. It’s in the Monte Carlo SDK sample.

For adding up the counts produced by each of your threads, check the reduction SDK sample.
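
The steps above can be sketched on the CPU in plain C++ before porting anything to CUDA. This is only a rough sketch, not GPU code: the names `count_hits` and `estimate_pi` and the seed values are made up for illustration, and each call to `count_hits` stands in for the share of work one GPU would do. It uses the standard library's Mersenne Twister (`std::mt19937`) rather than the SDK sample.

```cpp
#include <cstdint>
#include <random>

// Count how many of `samples` random points in the square [-1, 1] x [-1, 1]
// land inside the unit circle. One call stands in for one GPU's share.
std::uint64_t count_hits(std::uint64_t samples, std::uint32_t seed) {
    std::mt19937 rng(seed);  // Mersenne Twister from the C++ standard library
    std::uniform_real_distribution<double> coord(-1.0, 1.0);
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        double x = coord(rng);
        double y = coord(rng);
        if (x * x + y * y <= 1.0) ++hits;  // inside the circle
    }
    return hits;
}

// Split the samples evenly over `devices` workers, give each its own seed,
// then add up the counts (the "reduction" step) and return 4*K/N.
double estimate_pi(std::uint64_t total_samples, int devices) {
    std::uint64_t per_device = total_samples / devices;
    std::uint64_t hits = 0;
    for (int d = 0; d < devices; ++d) {
        hits += count_hits(per_device, 1234u + d);  // arbitrary distinct seeds
    }
    double n = static_cast<double>(per_device * devices);
    return 4.0 * static_cast<double>(hits) / n;
}
```

Calling `estimate_pi(10000000, 4)` mirrors the 2.5 million samples per card suggested above. The important design point is that each worker needs a different seed (or a different part of the random stream); otherwise all 4 cards would repeat the same experiment.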


Best Regards,


How does one calculate pi? If you let us know something about the algorithm, we could tell you where to start. Btw, using 4 GPUs at once is tricky.

Right, correct me if I am wrong, but:

pi = 4 * Arctan(1)

We can calculate Arctan(x) using

arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + …

If we calculate lots of terms, split them across the GPUs and then reduce (sum) the partial results, we get a value for Arctan(1).

How many decimal places of pi were you wanting to calculate?

The tricky thing about all this PI calculation might be precision. You are only going to get something out of your cards if the problem size is large enough, and for PI that means a high-precision result.

The biggest trick is to make an algorithm that is parallelizable to thousands of threads.

There are some, but here’s one that’d be interesting… it’s a formula that lets you compute the Nth digit of Pi without calculating the others. The advantage is that the computation doesn’t require intercommunication, simplifying the design.
It still will NOT be the FASTEST way to proceed, but especially if you have multiple GPUs, it’s a great way to break interdependence and use them independently.

I think the second method using cuFFT might be a good idea, though not easy (and can cuFFT work across 4 GPUs?). I get the feeling that the others, except for the incredible pick-a-digit-any-digit method, are pretty serial. Anyway, I think this would be an awesome CUDA contest.

Btw, can someone find actual code that computes the nth digit? If you look at the raw formula, it’s actually just integer division, and if you want more than a billion or so digits it’ll have to be int64 division.
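
Not a finished answer, but here is a small CPU-side C++ sketch of digit extraction. The thread does not name the formula, so this assumes it is the Bailey-Borwein-Plouffe (BBP) formula, which gives hexadecimal digits of pi rather than decimal ones. The function names are made up for illustration, and because the modular multiply uses 64-bit integers without a wider intermediate, this version is only safe while 8n + 6 stays below 2^32; double precision also limits how far out the result can be trusted.

```cpp
#include <cmath>
#include <cstdint>

// (16^exp) mod m by square-and-multiply, using integers only.
std::uint64_t pow16_mod(std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    std::uint64_t base = 16 % m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Fractional part of the sum over k of 16^(n-k) / (8k + j).
double bbp_series(std::uint64_t n, std::uint64_t j) {
    double s = 0.0;
    // Terms with k <= n: reduce 16^(n-k) modulo the denominator first.
    for (std::uint64_t k = 0; k <= n; ++k) {
        std::uint64_t m = 8 * k + j;
        s += static_cast<double>(pow16_mod(n - k, m)) / static_cast<double>(m);
        s -= std::floor(s);  // keep only the fractional part
    }
    // A few terms with k > n; they shrink by a factor of 16 each step.
    double scale = 1.0 / 16.0;
    for (std::uint64_t k = n + 1; k <= n + 16; ++k) {
        s += scale / static_cast<double>(8 * k + j);
        scale /= 16.0;
    }
    return s - std::floor(s);
}

// Hex digit of pi at 0-based position n after the hexadecimal point
// (pi = 3.243F6A88... in base 16).
int pi_hex_digit(std::uint64_t n) {
    double x = 4.0 * bbp_series(n, 1) - 2.0 * bbp_series(n, 4)
             - bbp_series(n, 5) - bbp_series(n, 6);
    x -= std::floor(x);
    return static_cast<int>(16.0 * x);
}
```

Since each call to `pi_hex_digit` depends only on `n`, different positions could be handed to different threads or different GPUs with no communication between them, which is the property described above.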