How do I test kernels that use random numbers against the CPU?

I wish to run the CPU and GPU side by side to verify kernel results, but my kernels use random numbers generated on the GPU. How do I get the CPU and GPU to generate the same random numbers? Will there be a CPU-equivalent version of NVIDIA's CURAND library?

What is the standard random number algorithm that most people use as ‘good enough’ for most general cases? CURAND has lots of them, and it's difficult to know which to use. Why does the GPU have many random number algorithms, while C basically has just rand()? What is the CURAND equivalent of rand()?

Generate the random values ahead of time and put them into a buffer. Then you can use the same sequence for the GPU and the CPU for test purposes. Some of the generators in CURAND come from specified/documented sources, but even then it is rather hard to find a CPU generator that exactly matches a GPU one. These generators all have parameters that can change the results even for the exact same “sequence”.
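As a rough sketch of the buffer approach, something like the following could work on the host side. The names `lcg_next`, `fill_random` and `scale_reference` are made up for illustration, and the little LCG is only a placeholder for whatever host-side generator you prefer. The idea is that the buffer is filled once with a fixed seed, the CPU reference reads `rnd[i]` for element i, and the same buffer is copied to the device (for example with cudaMemcpy) so that GPU thread i reads the same `rnd[i]`.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder generator (a plain 32-bit LCG), an assumption for this sketch.
   Any host-side source works as long as the seed is fixed for the test. */
uint32_t lcg_next(uint32_t *state)
{
    *state = 69069u * (*state) + 12345u;
    return *state;
}

/* Fill buf with n floats in [0, 1), generated once on the host. */
void fill_random(float *buf, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < n; i++) {
        /* keep the top 24 bits so the result is exactly representable */
        buf[i] = (float)(lcg_next(&state) >> 8) * (1.0f / 16777216.0f);
    }
}

/* CPU reference for a hypothetical kernel: element i consumes rnd[i],
   just as GPU thread i would after the buffer is copied to the device. */
void scale_reference(const float *in, const float *rnd, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * rnd[i];
    }
}
```

Because both sides read from the same buffer, any mismatch points at the kernel arithmetic rather than at the random number generation.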

I wouldn’t be able to say what is “good enough”.

C as a language doesn’t really have (IMO) “industrial strength” RNGs that are built-in. People who are serious about the statistics of their data would probably use a library of some sort. C++ has some better “built-in” stuff in <random>, but there you will find that there is enough variety that it is approximately as complicated as CURAND.

I don’t believe there is a CURAND routine equivalent to rand() in C. I think rand() in C would be looked at by most as not useful except for the most trivial uses. The rand48 family (specified by POSIX rather than the C standard, but available in most C libraries) is probably a little bit better:

rand48(3)

but again, for serious use, I think people would not rely on built-in C functionality but instead use a library.

As txbob says, rand() is only good enough for, say, a small toy app that simulates the rolling of dice. Every programmer seems to have their own idea about what constitutes an acceptable “minimal standard” PRNG. Mine happens to be Professor George Marsaglia’s very portable KISS generator, which works best if you deal with 32-bit data. It is good for unit tests, even quite extensive ones. As with any simple generator, care needs to be taken when moving to a parallel execution environment to make sure the random numbers are used in the same order as in the serial code, to ensure an apples-to-apples comparison (here: between host and device implementations of the app).

// Fixes via: Greg Rose, KISS: A Bit Too Simple. http://eprint.iacr.org/2011/007
static unsigned int z=362436069,w=521288629,jsr=362436069,jcong=123456789;
#define znew (z=36969*(z&0xffff)+(z>>16))
#define wnew (w=18000*(w&0xffff)+(w>>16))
#define MWC  ((znew<<16)+wnew)
#define SHR3 (jsr^=(jsr<<13),jsr^=(jsr>>17),jsr^=(jsr<<5)) /* 2^32-1 */
#define CONG (jcong=69069*jcong+13579)                     /* 2^32 */
#define KISS ((MWC^CONG)+SHR3)
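
The macros above keep their state in file-scope globals, which is awkward once host and device code both need the same stream. One possible arrangement, sketched here with made-up names (`kiss_state`, `kiss_default`, `kiss_next`, `kiss_fill`), is to move the state into a struct and draw the values serially into a buffer, so that element i is the value thread i consumes on both sides. The arithmetic is meant to be the same as in the macros. This is a sketch, not a tested drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

/* All generator state in one struct, so each test owns its own stream. */
typedef struct {
    uint32_t z, w, jsr, jcong;
} kiss_state;

/* Same default seeds as the macro version above. */
kiss_state kiss_default(void)
{
    kiss_state s = { 362436069u, 521288629u, 362436069u, 123456789u };
    return s;
}

/* One KISS step: two multiply-with-carry halves, a congruential
   generator and a 3-shift register, combined as (MWC ^ CONG) + SHR3. */
uint32_t kiss_next(kiss_state *s)
{
    s->z = 36969u * (s->z & 0xffffu) + (s->z >> 16);
    s->w = 18000u * (s->w & 0xffffu) + (s->w >> 16);
    uint32_t mwc = (s->z << 16) + s->w;

    s->jcong = 69069u * s->jcong + 13579u;

    s->jsr ^= s->jsr << 13;
    s->jsr ^= s->jsr >> 17;
    s->jsr ^= s->jsr << 5;

    return (mwc ^ s->jcong) + s->jsr;
}

/* Draw n values serially; element i is the value thread i should consume. */
void kiss_fill(uint32_t *buf, size_t n, kiss_state *s)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] = kiss_next(s);
    }
}
```

Filling the buffer serially on the host fixes the order in which values are assigned to threads, which is what keeps the host and device runs comparable.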