CURAND: Independence of RND numbers Are Host API generated RND numbers independent?

devkec · May 19, 2011, 4:41pm

Hello,

Regarding the CURAND Host API, it is not clear to me how the numbers are generated on the device. It is OK if NVIDIA keeps that secret; that is just common business. But:

A common problem when generating random numbers in parallel is that the more generators run in parallel, the more likely their sequences are to overlap. That is why I don’t use the Device API.

The Host API is still a black box to me, and I don’t want the source to be published, but my question is:

Are the random numbers from the Host API independent?

NathanW · May 25, 2011, 10:47pm

It’s not a secret, don’t worry!

In the CUDA Toolkit 4.0 CURAND Guide, page 6, there is a formula that shows how pseudorandom results are arranged for the ordering CURAND_ORDERING_PSEUDO_DEFAULT.

“The result at offset n in global memory is from position (n mod 4096) * 2^67 + floor(n / 4096) in the XORWOW sequence.”

What’s happening is that the library first allocates space for the state of 4096 threads. Then it precomputes the starting state for each of the 4096 threads. All the threads start from a common state that is computed from the seed, then advanced by 2^67 steps times the thread number. So thread 0 is advanced 0 steps, thread 1 is advanced 2^67 steps, thread 2 is advanced 2*2^67 steps, … To generate 4096 output values, each thread uses its state to get a single output value, then advances one step.
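As a rough illustration of the formula quoted above, here is a small Python sketch (Python only because its integers handle 2^67 exactly; this is not CURAND code). The names NUM_THREADS, STRIDE and sequence_position are made up for this example, and the sketch assumes the layout is exactly as the guide describes it.

```python
NUM_THREADS = 4096   # number of precomputed generator states, per the guide
STRIDE = 2 ** 67     # distance between the starting points of adjacent threads


def sequence_position(n: int) -> int:
    """Position in the XORWOW sequence of the value stored at output offset n."""
    thread = n % NUM_THREADS    # which thread wrote this output
    step = n // NUM_THREADS     # how many values that thread had produced before
    return thread * STRIDE + step


if __name__ == "__main__":
    # Offsets 0..4095 come from 4096 different subsequences; offset 4096
    # returns to thread 0, one step further along.
    for n in (0, 1, 2, 4095, 4096, 4097):
        print(n, sequence_position(n))
```

So neighbouring outputs in memory come from widely separated parts of the sequence, while outputs 4096 apart are consecutive values from the same thread.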

Each thread can generate 2^67 values before it starts to overlap with the sequence from any other thread.
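To put that limit in terms of total output, here is a minimal Python sketch. It assumes 4096 threads spaced 2^67 steps apart, and that the full sequence is long enough that the last thread does not wrap around into the first. The function names are invented for this example.

```python
NUM_THREADS = 4096
STRIDE = 2 ** 67   # values a thread can emit before reaching its neighbour's start


def max_outputs_without_overlap() -> int:
    """Total values that can be generated before any two subsequences meet."""
    return NUM_THREADS * STRIDE   # 4096 * 2**67 == 2**79


def subsequences_overlap(total_outputs: int) -> bool:
    """True if generating total_outputs values makes some thread overrun its slot."""
    # Thread 0 always produces the most values: ceil(total_outputs / NUM_THREADS).
    per_thread = -(-total_outputs // NUM_THREADS)
    return per_thread > STRIDE


if __name__ == "__main__":
    print(max_outputs_without_overlap() == 2 ** 79)
    print(subsequences_overlap(10 ** 12))
```

In other words, overlap only starts after about 2^79 values have been drawn from a single generator.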

If you choose the CURAND_ORDERING_PSEUDO_SEEDED ordering, then the states are set up slightly differently. Each of the 4096 threads gets an initial state based on the seed and on the thread number. This is much faster than advancing by steps of 2^67, but it doesn’t give you a guarantee that the subsequences won’t overlap.
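To get a feel for what "no guarantee" can mean in practice, here is a back-of-the-envelope Python sketch of a union bound on the overlap probability. It rests on a loud assumption that is not taken from the CURAND documentation: each thread's starting point is treated as an independent, uniformly random position on one cycle of the generator. The function name and the period value used in the demo are placeholders, so substitute the documented period of the generator you use.

```python
from fractions import Fraction


def overlap_upper_bound(num_streams: int, stream_length: int, period: int) -> Fraction:
    """Union bound on the chance that any two streams overlap.

    Assumption (hypothetical): every stream starts at an independent,
    uniformly random point on a single cycle of length `period`.
    """
    pairs = num_streams * (num_streams - 1) // 2
    # Two streams of length L can only overlap if one starts within L steps
    # of the other, which happens with probability at most 2L / period.
    bound = Fraction(pairs * 2 * stream_length, period)
    return min(bound, Fraction(1))


if __name__ == "__main__":
    # Placeholder figures: 4096 streams, 2**67 values each, period 2**190.
    print(float(overlap_upper_bound(4096, 2 ** 67, 2 ** 190)))
```

Under that assumption the chance of overlap is tiny, but it is only a probabilistic statement, unlike the hard guarantee of the default ordering.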

Hopefully this answers your question. Let me know if I misinterpreted it or if you want more details about anything.

seibert · May 26, 2011, 1:41am

One nice thing about the lack of a device-side linker is that the source code for all device functions has to be written out in the CUDA headers. You can see the implementation of the XORWOW algorithm in curand_kernel.h. (There are several generators in that file, so you have to read the comments to make sure you are looking at the right one.)

devkec · May 26, 2011, 4:42am

Thank you both.

Looks like I misunderstood the CURAND guide on that page. Thanks for making that clear, Nathan! It was the exact answer to my question.

And yes, seibert, I’ll have a look at curand_kernel.h.
