Random Number Simulated Annealing

I alredy found some topic about this, but none of them helped. They are really confusing me.

That’s the problem: I need that each thread generate two random numbers before the real work. There is a device function for this? Like:

__device__ int random (){

...

return n;

}

I saw the MersenneTwister but I don’t think it is the answer for the case.

Thank you in advance for any response and sorry for the bad english!

Have you looked at this one?

[url=“http://forums.nvidia.com/index.php?showtopic=101860”]http://forums.nvidia.com/index.php?showtopic=101860[/url]

I thought that it was only for fortran. Thanks for the code, I have two more questions:

It is necessary to use the fuction “LFG175gmInit” every time I execute the kernel for random numbers, or just once?

and

How can I limit the range of the random numbers inside the kernel or device functions? For example: I need random numbers in that range: [1…50]

One more time: thanks for the code and for the answers, it helped a lot!

No, you need to call it just the very first time - it will generate the starting configuration in the rng status global memory.

Then, in every kernel you need to call LFG175gmLoadToShm() to copy the status from global to shared memory at the beginning and then

LFG175gmSaveFromShm() just before exiting the kernel to save the status reached by all the generators to global memory again: shared memory does not conserve its contents among kernel calls, global memory does.

However jjp pointed out an lrand48 generator, lighter than this one in terms of shared memory usage. For annealing it should be appropriate.

actually 1…64 would be better, guess why!

The generator generates positive integers in the range [0,0x7FFFFFFF], so to have it in the needed range [a,b]:

x = LFG175shm() * (b-a+1)/0x80000000 + a;

well, this as theory: the first multiplication will overflow integer capacity, and 0x80000000 is negative. To avoid overflow, let 2^m be the lowest 2 power, greater than (b-a+1), so, 2^6=64 in your example, 2^6.

It becomes:

x = (LFG175shm()>>m) * (b-a+1)/(0x80000000>>m) + a

working with the shifted numbers you won’t have overflow now, and negative numbers neither working with signed ints.

Having to do 1…64 (or 0…63: b-a+1=64 as well) you are done with:

x = LFG175shm() /(0x8000000>>6) + a

In case, if you or any get tempted to do x = LFG175shm() % 50 + a, NEVER DO IT. In fact in this way you would be taking the low bits and discarding the high ones, and low bits are generally much more correlated. The correct way is to take the high bits, that is, with division, not with modulo.

Well…I did it External Media ,

but now with your instructions I will fix the code. In the simulated annealing code I’m gonna make every thread run a single instance of the code (Don’t know if it is the best method, and I have some other ideas), and that means: every thread will have to generate two random numbers per “while”. So in each one is necessary to run the LFG175gmSaveFromShm() and the LFG175gmLoadToShm()?

Again, thanks for the answer.

I would suggest - generate random numbers on CPU and use it in GPU. – This way, you can compare CPU and GPU results (especially when random numbers are involved) first to confirm that your GPU code is sane.

This is what we intend to do for a similar ongoing project.

In the source I have posted there is both the GPU implementation and the CPU one: they are consistent. Only pity, I have not performed extensive testing on any of two for validating - validating one, will validate the other, since they give exactly the same sequences for the given seed.

I have noticed there are some problems in downloading attachments from the forum… I can not download my own sources from it! I am not the first to report this problem. If needed I will repost them. I would be very happy if you want to give a try to it.

External Media

not the first - look the point is that rng works fine expecially thanks to carries. In fact with LFG the lowest bit will be always the same sequence (all the sequences start from the “canonical form”, that has the lowest bits set the same), not being influenced by other lower bits carries. And in fact it is removed (I did it in the code, as recommended). The second bit, that is the lowest you are observing is a bit better and so on.

However, again, for annealing you do not need high quality generators - but this remain a real bad practice. Better a simple rand generator with high bits than an LFG100000 with low bits.

Uhm. I was trying to download my code to see my example, but I could not… some problem in the forum. This evening I will double check it at home, but if I remember fine, in the test kernel you can find the order of the things:

    [*]initialize the global memory with kernel “Init”

    [*]run computing kernel that will:

      [*] load RNG state to shared memory with “LoadToShm”

      [*] cycle over the ocmputing loop getting more and more numbers, that will update the shared memory registers

      [*] save RNG state to global memory with “SaveFromShm” and exit

Sigismondo,

The attachments are not working fine in the forum. I raise a topic to grab the attention of moderators… but still this has not been fixd.

btw,
RNGs usually use a “state” information – which makes the RNGeneration a perfect serial operation. (cumulative… one after another)…
So, I wonder how you circumvent this problem in your code. Can you explain? Thanks,

Yeah, i know… I already tryed to send a message to “Admin”. This problem is really annoying.

If they do not fix it soon, I will repost the code “inlined”.

Sure, I have set up the code so every thread has its own generator. The nice thing with lagged fibonacci RNG is that it is possibile to generate many independent sequences, because the number of bit combinations in the state is much larger than the length of the generator. I have been more detailed in the other thread:

http://forums.nvidia.com/index.php?showtopic=101860

Good thing about lgf17-5 is also the fact that it does need just 17 integers (per generator) to save the state. This allows to place many generators in 16KB. This is much better than Mersenne Twister - that needs about 4KB per generator. However Mersenne Twister is considered high standard - I think for its cryptographic characteristics. For Monte Carlo I used lgf17-5 successfully.

On the other side for annealing it should be enought any Linear Congruential Generator (awful for Monte Carlo, because of the hyperplanes, and even more awful for cryptography). It’s state fits just one integer - just seeding it differently on any thread will be sufficiently random for this class of problems.

In the code there is also a Park Miller (copied from a fortran code I found don’t know where… It can be written much more concisley - see wikipedia) generator (that is a particular LCG), that is used to generate the initial state of the generator (in the Init routine). For this task it can be used with any seed (I need just 16 numbers from it), but if I remember fine, it needs to be seeded correctly to avoid shorter subsequences, if used “as is”. Other LCGs do not have this limit.

Find some references on wikipedia:

http://en.wikipedia.org/wiki/Linear_congruential_generator

http://en.wikipedia.org/wiki/Park%E2%80%93Miller_RNG

And I report this again with the logic of Lagged Fibonacci initialization of independent sequences:

http://www.ipp.mpg.de/~rfs/comas/Helsinki/…/rn/node20.html

So, this was my problem. WHen you have independent generators, you really cannot compare your GPU implementation (that uses these random numbers) with the CPU implementation – for correctness…

For the initial portion of our project, we intend to generate random numbers in the CPU and use them in the correct order in the GPU kernel (i.e. mapping them to threads) – Once we know that our algorithm works fine, I think we can move on to GPU generated random numbers…

I am bookmarkng the link. I will revisit when I need this. Thanks!

I’m trying to make the Simulated Annealing spend most of it’s time on GPU and random numbers are necessary in each iteration, so I don’t think it’s a good idea to pass the random numbers from CPU.

I took some time to test the code you suggested, but finally I did. Indeed when using the: x = LFG175shm() % 50 + a, in each iteration, the numbers generated are the same:

LFG175gmLoadToShm (gfibreg);

while (T > 0.0001){

	while ((pos1 == 0) || (pos2 == 0) || (pos1 == *municipios) || (pos2 == *municipios) || (pos1 == pos2)){

		pos1 = LFG175shm() % 50 + 1;

		pos2 = LFG175shm() % 50 + 1;

	}

	LFG175gmSaveFromShm(gfibreg);

	

	int temp;

	temp = solutionM[pos1];

	solutionM[pos1] = solutionM[pos2];

	solutionM[pos2] = temp;

	(...)

	T = T * 0.9999;

}

LFG175gmSaveFromShm(gfibreg);

The way it is, the code simply change the same two positions when it had to change two random positions. So I put the code:

while ((pos1 == 0) || (pos2 == 0) || (pos1 == *municipios) || (pos2 == *municipios) || (pos1 == pos2)){

	pos1 = LFG175shm()/(0x8000000>>6) + 1;

	pos2 = LFG175shm()/(0x8000000>>6) + 1;

}

but the numbers are greater than 64, and are always the same. There’s a way to fix it, sigismondo?

Again, thanks for the answers!

Uhhh… :wacko:

Are you sure you have correctly initialized the generator as in the test? Obviously I am going to double check it this evening.

…I forgot to initialize the pos1 and pos2 in each iteration! >.<

Well, it’s working now, but I need to find a mutex. I know that it will slow down my program, but I think it’s the only choice.

fiuuuuuuh :rolleyes:

Actually yestarday I had not checked it… this is a good new.

For the mutex, I can not be of great help: maybe you can start a new thread in the forum.