 # Simple pi-algorithm

Hello.

As an introduction project I was thinking of developing a pi-evaluator. The algorithm I’m intending to use is very simple, yet suitable for threaded execution.

1. Generate random floating values between 0-1 for (X,Y)
2. Determine whether the point is inside the circle-arc by evaluating if absolute distance is less or equal to one.
3. Report the result. If the point was inside, increase a “hit_count” variable, and a “point_count” variable by one. If it was outside, just increase the “point_count”

Now, in the host program (outside the kernel) take the hit_count:point_count ratio (which should be ~0.78) and multiply this by 4 (as we are only doing this test for one quadrant) to get your approximation of PI. Ofcourse there are several numerical drawbacks with this approach, for instance the precision of the random numbers will limit the amount of correct decimals you will get, and more things.

However, what would be the best solution to do this in a practical sense. I have a few questions regarding this. Where should I generate the random numbers? If I do it in the kernel by say rand(), all threads get the value. Making a RNG in the kernel of some seed will be a bit too slow, right? How should I store the hits/points variables? All suggestions in general would be appriciated!

Many thanks!

This isn’t as simple an introductory project as you might think!

There is no rand() built-in function in CUDA. However, the MersenneTwister example in the SDK demonstrates how to generate random numbers in parallel - essentially you run multiple generators in parallel with different initial seeds.

It’s not possible to increment a global counter from parallel threads as you suggest without atomic operations, which are only available on devices with compute capability 1.1. Instead you could write out 1s where the points are inside the circle, and then perform a parallel sum reduction to get the final count.

Hope this helps.

Hi Simon.