Sorry to pepper with Yet Another RNG Question, but…

We’re doing a simulation involving simple additive noise with Brownian motion. To get something up and running, I used the hybrid Tausworthe/LCG/Box-Muller code from GPU Gems, and it looked like it was working great: I’d set up an array of N*3 seeds (where N is our number of particles, seeded with rand()), and I’d get N random numbers with the right mean and standard deviation. Looked pretty good.

Only, with enough time, the simulations seem to “creep”: the mean of our particle position along a coordinate has a definite drift to it (in our case, 1024 test particles over about 4.5 million steps). The standard deviation looks about right; just the mean seems to be drifting.

So now I’m trying to figure out if the problem is the RNG, the seeding of the RNG, the routine which propagates the random numbers through our simulation, or what. So if anyone has any suggestions…

Is the hybrid Tausworthe/LCG routine from GPU Gems an acceptable choice, or has it been revealed to have serious problems?

Is good old rand() a Really Bad Choice for seeding, particularly for certain values of srand(x)? I need the same sequence of random numbers per run (for some comparisons), controllable by a single seed, so for now I call srand(seed), then rand() once for each particle’s components.

Are there standard “goodness tests” for an RNG?

Thanks in advance; this one is vexing me as I’m not sure where to pin the blame (RNG, simulation, stochastic integration, etc).

I can’t help you with the first two, but for 3), check out TestU01 by L’Ecuyer; it is the standard. You can also check out Diehard by Marsaglia, which is a whole lot less stringent but runs in about 10 seconds.

Thanks, Ailleur–I found a GPL version of Diehard (named, cheekily, “dieharder”) and ran it against a CPU implementation of a single stream of the GPU code. It passed everything with two “weak” notes, but it really tests whether a single stream is random, not so much a bunch of streams (“stream” here doesn’t mean a GPU stream, but rather the sequence of numbers produced from a particular RNG seed state). Still, that makes me feel better about the RNG itself.

I think my problem was my initialization; each seed state has four UINTs, and I was needlessly constraining their range by taking them mod 2048 (I have NO IDEA why I did this; the only constraint on the seeds is that they be above 128). If I restrict a single stream to seeds mod 2048, dieharder gives the RNG five “weak” ratings; and given that my simulation had 3072 states, the possibility of some sort of correlation between streams is probably fairly high. I fixed the seeding and am running a simulation now, but I may still code up a test for the multistream version just to satisfy myself.

As a side note: you may wish to consider using so-called quasi-random sequences instead, along the lines of the SobolQRNG example from the SDK. I was also into implementing random number generation with CUDA some time ago, and the whole story behind these is indeed rather complicated; but for this particular application I found Sobol sequences to work well (and fast enough), and the underlying theory behind QRNGs seems for some reason more compelling to me than that of pseudo-RNGs…

Quasi-random (or low discrepancy) sequences are useful for things like integration problems where you want to sample uniformly, but might not know how many points you will need. They have better convergence than a purely random sequence and are much easier to compute. I’ve used the Hammersley sequence in CUDA code to avoid aliasing in the integral of a 2D function that had some uniform grid structure embedded in it, and it is very convenient. Very underappreciated bit of mathematics!

(Just don’t use them for your Monte Carlo simulations :) )

To extend this thread further, I am reviewing Politis & Romano’s stationary bootstrap method (1994) for porting/vectorizing. It requires a uniform draw whose result, subject to a conditional test, is used either to 1) choose the next observation in an array, or 2) jump to another random position in the array. It is this branching that I’m having trouble wrapping my head around, particularly the effect of the two RNG draws. I’d appreciate any trailheads for someone new to CUDA and parallelizing code. TIA, V.