Another Random Number Thread

Hi everybody,

The other threads on this topic didn't help me, so I'm sorry, but I have to open another thread…
I need a

__device__ float RandomNumber(…)
{

}

function, callable from the device. The problem is that some of my threads have to call the function more often than others, so I can't use the examples from the other topic, because they use __syncthreads() (am I right about that?).
I don't need perfect randomness; speed is more important to me, as long as the numbers aren't totally bad.

Anyone who can help me?
Philipp.

I forgot something: I have to call the kernel multiple times (very often, like a million times), so the "initialization" procedure for each kernel shouldn't take too much time.

Here is a link to an lrand48 implementation that was posted on the forums some time ago: http://forums.nvidia.com/index.php?act=attach&type=post&id=9512

If you can live with the GPL license, this would fit your problem description.

Mea culpa if this comes across as a hijack but just trying to clarify some points for a noob…

Assuming I have an array that is subject to a conditional - either 1) walk through the array item by item, or 2) jump to a random point within the array - can I use the lrand48 kernel referenced above as my C-like rand() function after loading it onto the device? Trying to wrap my head around loop unrolling and conditionals while still parallelizing efficiently.

TIA, Vince

Hi,

Thanks for your answer! I already tried that one, but I didn't get it running. Maybe you can help me. I copied all the new functions and the struct into my project and it still compiles. But the "Usage" part doesn't work. It says I should call

rng = new Rand48();
rng->init(GRID_DIM*BLOCK_DIM, SEED);
cudaFunction1 <<<GRID_DIM, BLOCK_DIM, sharedMem>>> (*rng);
cudaFunction2 <<<GRID_DIM, BLOCK_DIM, sharedMem>>> (*rng);
rng->destroy();
delete rng;

from a host function. But I get compiler errors for all of those lines ("error: identifier "rng" is undefined" for the first line, for example).
Can you show me how to do that right? What is the SEED variable?
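
My guess is that the example assumes rng is declared somewhere above, maybe something like this (but I'm not sure, and I still don't know what to use for SEED):

Rand48 *rng;                              // my guess: declare the pointer before the calls above
int SEED = 12345;                         // just some integer I pick myself?
rng = new Rand48();
rng->init(GRID_DIM * BLOCK_DIM, SEED);    // rest as in the usage example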

Philipp.

If you are doing Monte Carlo you need at least a lagged Fibonacci generator - I implemented a 17-5 on a SIMD machine some time ago; I can look into porting it, if useful.

If randomness is really not an issue, just use a simple LCG:

http://en.wikipedia.org/wiki/Linear_congruential_generator

http://www.koders.com/c/fidCA79AA537E666C3…1F54CA9ACA.aspx

It will fit in a couple of registers - just seed it differently on each thread.
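
Just to show how small it is, something along these lines (a quick sketch of mine, untested - the function and kernel names are made up, the constants are the usual Numerical Recipes LCG values):

// Per-thread LCG: state lives in a register, no __syncthreads() needed.
__device__ unsigned int lcg_step(unsigned int &state)
{
    state = 1664525u * state + 1013904223u;       // x_{n+1} = a*x_n + c (mod 2^32)
    return state;
}

__device__ float lcg_uniform(unsigned int &state)
{
    // use the top 24 bits so the result stays strictly in [0, 1)
    return (lcg_step(state) >> 8) * (1.0f / 16777216.0f);
}

__global__ void kernel_using_lcg(float *out, unsigned int seed)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int state = seed + tid;              // seed differently on each thread
    out[tid] = lcg_uniform(state);                // call as often as a thread needs
}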

Seeding an LCG is actually fast :D - a little less so for the LFG: you need to call an LCG to initialize 17 numbers - the lag register of the generator.

(I didn't notice jjp had already given you a pointer… sorry for the double answer on this part)

I indeed need that for MC code. But I don't think I can program that lagged Fibonacci by myself…
Well, can anyone help me with the problem above?

I will look into how to port it this evening. I used it in "Zz"… the programming language of the APE100 SIMD architectures. I probably have a C port somewhere for checking… I did it a long time ago. However, it is not very difficult - you only need to know about its normal form, in order to generate independent sequences on different threads and perform the initialization correctly.

If I find them, I will post some references. Actually, I should have a presentation I gave at a workshop…

17-5 is the easiest LFG with good statistical properties. Nowadays much longer lags are in use (thousands and more, see SPRNG for more info). 17-5 was already identified by D. Knuth as one of the good ones. I did not observe correlations with my specific problems (that, of course, does not mean you won't see any with yours. Let's hope you won't! :) ).

Reading the posts, I think you need a different sequence per thread. It can be done either in registers (about 17-18 per thread) or in shared memory (each thread accessing its own generator). I would use shared memory, but which do you think would be better?
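
To give an idea of the shared-memory variant, something like this (a rough sketch written here, not the tested code I will post - the names and the interleaving are my own choices):

#define LAG_P 17
#define LAG_Q 5

// One call produces the next number for the calling thread. "state" points at the
// thread's own 17-word lag register, strided so threads interleave in shared memory,
// and "pos" is the thread's position in its circular buffer. No __syncthreads() needed.
__device__ float lfg_uniform(unsigned int *state, int stride, int &pos)
{
    int ip = pos;                               // index of x[n-17] (the oldest value)
    int iq = (pos + LAG_P - LAG_Q) % LAG_P;     // index of x[n-5]
    unsigned int x = state[ip * stride] + state[iq * stride];   // x[n] = x[n-17] + x[n-5] mod 2^32
    state[ip * stride] = x;                     // overwrite the oldest value with x[n]
    pos = (pos + 1) % LAG_P;
    return (x >> 8) * (1.0f / 16777216.0f);     // map to [0, 1)
}

__global__ void kernel_using_lfg(float *out, unsigned int seed)
{
    extern __shared__ unsigned int lag[];       // LAG_P * blockDim.x words, size passed at launch
    int tid = threadIdx.x;

    // fill this thread's lag register with an LCG, seeded per thread
    unsigned int s = seed + blockIdx.x * blockDim.x + tid;
    for (int i = 0; i < LAG_P; ++i) {
        s = 1664525u * s + 1013904223u;
        lag[i * blockDim.x + tid] = s;
    }

    int pos = 0;
    out[blockIdx.x * blockDim.x + tid] = lfg_uniform(&lag[tid], blockDim.x, pos);
}

The launch would have to pass LAG_P * BLOCK_DIM * sizeof(unsigned int) bytes as the shared memory size.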

Sequence splitting with LCGs is usually bad for MC, because different sequences can have strong correlations, giving the runs wrong statistical properties. At the very least, when doing it for MC, there are special rules to follow when choosing the different entry seeds; but today CPUs are so fast (and GPUs have so many processing units) that splitting is not really feasible at all - the sub-sequences will start overlapping even with relatively short runs.

Hey!

That would be pretty cool! I'm not a computer scientist (I'm an electrical engineer), so it would take me weeks to do that… ;)
One general question: why can't I just take the thread ID as a seed for my RNG? I mean, all threads have different numbers, so it can't be too difficult to use that for a good RNG… That confuses me a bit…
A function like
__device__ float giveRandomNumber(int threadID)
{
blabla
}
should be perfect for so many applications. Curiously, nobody has posted that before…
And another question: an RNG like the one I just wrote… say I "initialize" it at the beginning of my code (from the host) and then use the thread ID as some kind of seed in my kernels. When I start the same kernel a second time, my random numbers will give the same results as in the first run, because using the thread ID will always give the same result for every thread. This means I have to store some value / change the "initialization" to get different results for the same thread ID?! Or something like that!?

So now to your questions:
My Monte Carlo code runs for a huge number of timesteps. That means every timestep all my particles do MC collisions. In my opinion it shouldn't be too bad if the random numbers within one timestep are "a bit" correlated, as long as they aren't correlated too much with the random numbers of the next timestep (and thereby with the next call of my kernel).
If that can be done without using shared memory, even better for me, because I may need the shared memory for the rest of my code… But I don't know the difference between the two approaches (registers / shared memory) you wrote about.

Philipp.

The danger here is that you are assuming that random number sequences with different seeds are uncorrelated in the same way that numbers within one sequence are. Depending on your RNG, this may or may not be true. This is why random number generation for parallel devices is an open area of research. As the previous poster mentioned, there are prescriptions for specific RNG algorithms on how to pick different seeds to minimize correlation.

Ok, I think I understand what you mean, that sounds logical.

It is for the same reason that, after so many years, "built-in" random number generators (C rand(), Fortran random()) still need to be avoided when doing Monte Carlo. If you are simulating white noise in a signal, typically nobody will care about the quality of your generator. Likewise for the initial positions of an N-body simulation. But with Monte Carlo, built-in generators are strongly discouraged… ehm, no, they must not be used!

Look up the Metropolis and Markov chain bibliography. Remember that with Monte Carlo you are performing a real multidimensional integration to evaluate your statistical observables. If your PSEUDO-random engine is not random enough, your averages will be wrong.

A typical approach is to parametrize the seed with the thread ID and the current time. A more serious approach is to save the state of the generator at the end of each run and restart from it.

On the other hand, reseeding a lagged Fibonacci is fine because you get independent sequences from any seed - no risk of overlap (you can easily generate 2^32 independent sequences, and that is only a subset of all the ones available). So for this generator you can also avoid saving to disk.

However, save/restart is also used because sometimes you need to check against a previous run after having corrected a bug: using the same sequence allows you to double-check.

Actually, restarting from one kernel launch to the next is easy: just save the state of the generators to global memory at the end, and read it back at the beginning. The host can do the same to disk.
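
For one word of state per thread (e.g. an LCG) it could look roughly like this - just a sketch, kernel and array names are made up; an LFG would read and write back its 17 words the same way:

// d_state holds one state word per thread; it is allocated and seeded once on the
// host and is NOT freed between launches, so the sequence continues across launches.
__global__ void mc_timestep(float *out, unsigned int *d_state)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int state = d_state[tid];            // read the generator state at the beginning

    state = 1664525u * state + 1013904223u;       // draw as many numbers as this thread needs
    out[tid] = (state >> 8) * (1.0f / 16777216.0f);

    d_state[tid] = state;                         // write the state back at the end
}

On the host you would cudaMalloc d_state once, fill it with the initial seeds, and then leave it alone for the million launches.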

I do not know if shared memory is persistent from launch to launch. Global memory is for sure, as long as it is not reallocated in between.

I think I do not understand - saying that the random numbers within a timestep are a bit correlated sounds like heresy for a Monte Carlo simulation… the positions will be correlated because you are performing Metropolis moves, but the moves must be uncorrelated, or your result will be wrong… I hope you quickly realize that even if you are working with fuzzy things (random numbers), the underlying math is tight.

“Random numbers can not be chosen at random”.

Implementing it one way or the other is not much different… actually you can switch later. However, you are telling me that you are short on shared memory - and not on registers - so I will try that way first.

Wow, thanks for that explanation.

Do you have some kind of lecture notes (PDF), or can you recommend a good book on that topic? Best would be one with examples in C…

In general I'm very interested in all that stuff, but as you have seen, I know nothing at the moment ;)

I mean, in principle it should be OK if all random values of one timestep were the same, because the "randomness" already comes from the huge number of timesteps (and thereby consecutive random numbers), if you know what I mean. Only the noise gets too high then, so they should not really be the same, just "a bit" correlated. If the values from different timesteps are truly independent, that would be OK I guess…

Philipp.

This would all be so much easier if the GTX 285 also came with a lava lamp, or perhaps a lump of uranium. I should go put this on the suggestion list. :)

(Seriously, I’m kidding. Although it would be great to see “Now, with 2 grams of uranium in every box!” on the product packaging.)

Just a Fortran 90 source… I totally removed Zz from my brain - it is unreadable! OK, I started with it :D

Googling, I found this page, which explains the logic of the LFG:
http://www.ipp.mpg.de/~rfs/comas/Helsinki/helsinki04/CompScience/csep/csep1.phy.ornl.gov/rn/node20.html

About Monte Carlo: I have some nice references, but tied to… molecular mechanics :D The logic is always the same, however.
Allen & Tildesley is really dated by now - even if Monte Carlo is the same. Sure, I have more, but I suggest you find a good book related to your field… the tricks will be the same, and making you read a computational chemistry book… I don't think that would be the easiest way!

I remember that I liked this Coddington review…
P.D. Coddington, “Monte Carlo tests of random number generators”,
NPAC technical report SCCS-526.

I am speaking of 10+ years ago! You can probably find more recent stuff.
Look at SPRNG - lots of material in it, and references too, I think.

And here is my paper…
http://www2.fci.unibo.it/~bebo/z/bbzcp%20quadrics_97.pdf
Look at the year of the Metropolis publication… the last one in the references. No references about random numbers in there, though - just as a curiosity.

No, I really do not understand what you are saying here. Really, not clear.

But in principle it is only OK when the random numbers are random both in time (no correlations within the same sequence) and in space (no correlations among different sequences). If your guess does not have a mathematical proof, just forget it (or prove it!).

To help you understand why it is important to be so rigorous: I worked with "cluster MC moves", i.e. moving more than one particle per step. I would have guessed that you can just do it - it is random all the same, isn't it? No way. The acceptance rule needed to be changed accordingly, because of the statistical mechanics math. Because of the math, making cluster moves was not useful for that specific case - but better not to do it at all than to do it wrong.

I was testing the LFG code last night - some more tests this evening and I will post it - I do not want to post it wrong!

Done it!

I have attached lfrng.zip with the sources.
I have implemented it in shared memory - I need to understand some more of CUDA to do it with registers!
lfrng.zip (2.83 KB)