Random number generator on GPU

I am currently writing my own GPU tracer.

And I now need random numbers, or quasi-random numbers, for sampling.

But how do I get random numbers with CUDA? What do people usually do?

Generate a bunch of random numbers on the CPU and pass them to the kernel? Use some library to generate numbers in the kernel directly?
Or implement their own RNG in the kernel?

Have you tried CURAND? It is part of CUDA. Check http://docs.nvidia.com/cuda/pdf/CURAND_Library.pdf.
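Since the question asks about generating numbers in the kernel directly, here is a minimal sketch of CURAND's device API (my own illustration, not code from the manual; error checking and kernel launches omitted for brevity):

```cuda
// Each thread initializes its own generator state once, then draws
// numbers directly inside the kernel on later launches.
#include <curand_kernel.h>
#include <cuda_runtime.h>

__global__ void init_states(curandState *states, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Same seed, different subsequence per thread: CURAND guarantees
    // statistically independent streams this way.
    curand_init(seed, tid, 0, &states[tid]);
}

__global__ void sample(curandState *states, float *out, int n_per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState local = states[tid];      // copy state to registers
    for (int i = 0; i < n_per_thread; ++i) {
        float u = curand_uniform(&local); // uniform float in (0, 1]
        out[tid * n_per_thread + i] = u;
    }
    states[tid] = local;                  // save state for the next launch
}
```

Keeping the state in a local variable inside the kernel and writing it back once at the end avoids repeated global-memory traffic on the generator state.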

I tend to use the Park-Miller pseudo-random number generator.
It has the advantage of well-defined “randomness”, is fast, and can be run on the GPU.
Code: http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/random-numbers/cuda_park-miller.tar.gz
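For reference, the Park-Miller “minimal standard” recurrence is x_{k+1} = 16807 · x_k mod (2^31 − 1). A generic sketch of it as a per-thread GPU generator might look like the following (this is my own illustration of the recurrence, not the code from the linked tarball):

```cuda
#include <cuda_runtime.h>

// One Park-Miller step. The 64-bit product lets us skip Schrage's
// decomposition; the fold works because 2^31 ≡ 1 (mod 2^31 - 1).
__host__ __device__ inline unsigned int park_miller(unsigned int x)
{
    unsigned long long p = 16807ULL * x;
    p = (p >> 31) + (p & 0x7fffffffULL);
    if (p >= 0x7fffffffULL) p -= 0x7fffffffULL;
    return (unsigned int)p;
}

// Each thread advances its own stream; seeds must be in [1, 2^31 - 2].
__global__ void fill_uniform(float *out, unsigned int *state, int per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int x = state[tid];
    for (int i = 0; i < per_thread; ++i) {
        x = park_miller(x);
        out[tid * per_thread + i] = x * (1.0f / 2147483647.0f);
    }
    state[tid] = x;
}

// Park and Miller's published self-test: starting from seed 1,
// the 10,000th value of the sequence is 1043618065.
```

Note that distinct per-thread seeds do not make the streams independent in any provable sense; they are just different starting points in the same short cycle, which is part of the quality concern raised below.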

I would suggest using CURAND. I am no expert on the matter, but I am aware that making high-quality parallel PRNGs is hard. Many older generators suffer from periods that are far too short for today's fast computers, where Monte Carlo applications can consume several billion random numbers per second; they cannot guarantee the independence of the random number stream generated for each thread; and they deliver random numbers with enough structure that they fail many tests implemented by comprehensive modern test frameworks like TestU01.

As the CURAND manual points out, not all of CURAND's generators pass all of the tests in TestU01, so programmers will need to consider trade-offs between quality and performance. For something that needs to be extremely robust, the Mersenne Twister generator (CURAND_RNG_PSEUDO_MTGP32) appears to be indicated.
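Selecting MTGP32 through the CURAND host API is just a matter of the generator-type enum; a minimal sketch (error checking omitted, seed chosen arbitrarily):

```cuda
#include <curand.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t n = 1 << 20;
    float *d_rand;
    cudaMalloc(&d_rand, n * sizeof(float));

    curandGenerator_t gen;
    // CURAND_RNG_PSEUDO_MTGP32 selects the Mersenne Twister variant;
    // the library default would be CURAND_RNG_PSEUDO_XORWOW.
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MTGP32);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);

    // Fill the device buffer with uniform floats in (0, 1].
    curandGenerateUniform(gen, d_rand, n);

    // ... launch kernels that consume d_rand ...

    curandDestroyGenerator(gen);
    cudaFree(d_rand);
    return 0;
}
```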

If the Mersenne Twister is so good, why is the default generator XORWOW?

I have no first-hand knowledge. It is probably for consistency across versions. If I understand the timeline correctly, CURAND was introduced with CUDA 3.2, while MTGP32 was added to CURAND for CUDA 4.1. Changing library defaults underneath existing applications would be undesirable.

There is no single best PRNG, that is why random number libraries typically offer several good ones, so programmers can make trade-offs, e.g. between performance and very long period versus extremely long period.

I have used CURAND on the host in my project, and it is actually slower. So I suggest you go through the CURAND manual and look at the device API for random number generation, which is very fast.

On a C2050, Park-Miller produces more than 25 billion pseudo-random numbers per second.
(Fig 15.7 in Massively Parallel Evolutionary Computation on GPGPUs)

How does Park-Miller fare with TestU01? The Park-Miller PRNG I remember from 25 years ago was the so-called “minimal standard”, a congruential generator with a period of about 2**31, and with the output falling into planes as is typical for congruential generators.

Does the GPU implementation use a different Park-Miller generator (there may well be several ones for all I know)?

Dear njuffa,
Yes, you are correct. It is the same one.
The CUDA implementation has been shown to produce exactly the same numbers
as Park and Miller defined.
(The implementation is different so as to take advantage of the GPU hardware.)