Random Number Simulated Annealing

clockmaster · September 11, 2009, 1:46am

I alredy found some topic about this, but none of them helped. They are really confusing me.

That’s the problem: I need that each thread generate two random numbers before the real work. There is a device function for this? Like:

__device__ int random (){

...

return n;

}

I saw the MersenneTwister but I don’t think it is the answer for the case.

Thank you in advance for any response and sorry for the bad english!

sigismondo · September 11, 2009, 12:17pm

Have you looked at this one?

[url=“http://forums.nvidia.com/index.php?showtopic=101860”]http://forums.nvidia.com/index.php?showtopic=101860[/url]

clockmaster · September 13, 2009, 2:26am

I thought that it was only for fortran. Thanks for the code, I have two more questions:

It is necessary to use the fuction “LFG175gmInit” every time I execute the kernel for random numbers, or just once?

and

How can I limit the range of the random numbers inside the kernel or device functions? For example: I need random numbers in that range: [1…50]

One more time: thanks for the code and for the answers, it helped a lot!

sigismondo · September 13, 2009, 9:48am

No, you need to call it just the very first time - it will generate the starting configuration in the rng status global memory.

Then, in every kernel you need to call LFG175gmLoadToShm() to copy the status from global to shared memory at the beginning and then

LFG175gmSaveFromShm() just before exiting the kernel to save the status reached by all the generators to global memory again: shared memory does not conserve its contents among kernel calls, global memory does.

However jjp pointed out an lrand48 generator, lighter than this one in terms of shared memory usage. For annealing it should be appropriate.

actually 1…64 would be better, guess why!

The generator generates positive integers in the range [0,0x7FFFFFFF], so to have it in the needed range [a,b]:

x = LFG175shm() * (b-a+1)/0x80000000 + a;

well, this as theory: the first multiplication will overflow integer capacity, and 0x80000000 is negative. To avoid overflow, let 2^m be the lowest 2 power, greater than (b-a+1), so, 2^6=64 in your example, 2^6.

It becomes:

x = (LFG175shm()>>m) * (b-a+1)/(0x80000000>>m) + a

working with the shifted numbers you won’t have overflow now, and negative numbers neither working with signed ints.

Having to do 1…64 (or 0…63: b-a+1=64 as well) you are done with:

x = LFG175shm() /(0x8000000>>6) + a

In case, if you or any get tempted to do x = LFG175shm() % 50 + a, NEVER DO IT. In fact in this way you would be taking the low bits and discarding the high ones, and low bits are generally much more correlated. The correct way is to take the high bits, that is, with division, not with modulo.

clockmaster · September 16, 2009, 2:02am

Well…I did it External Media ,

but now with your instructions I will fix the code. In the simulated annealing code I’m gonna make every thread run a single instance of the code (Don’t know if it is the best method, and I have some other ideas), and that means: every thread will have to generate two random numbers per “while”. So in each one is necessary to run the LFG175gmSaveFromShm() and the LFG175gmLoadToShm()?

Again, thanks for the answer.

Sarnath · September 16, 2009, 5:32am

I would suggest - generate random numbers on CPU and use it in GPU. – This way, you can compare CPU and GPU results (especially when random numbers are involved) first to confirm that your GPU code is sane.

This is what we intend to do for a similar ongoing project.

sigismondo · September 16, 2009, 9:01am

In the source I have posted there is both the GPU implementation and the CPU one: they are consistent. Only pity, I have not performed extensive testing on any of two for validating - validating one, will validate the other, since they give exactly the same sequences for the given seed.

I have noticed there are some problems in downloading attachments from the forum… I can not download my own sources from it! I am not the first to report this problem. If needed I will repost them. I would be very happy if you want to give a try to it.

sigismondo · September 16, 2009, 9:32am

External Media

not the first - look the point is that rng works fine expecially thanks to carries. In fact with LFG the lowest bit will be always the same sequence (all the sequences start from the “canonical form”, that has the lowest bits set the same), not being influenced by other lower bits carries. And in fact it is removed (I did it in the code, as recommended). The second bit, that is the lowest you are observing is a bit better and so on.

However, again, for annealing you do not need high quality generators - but this remain a real bad practice. Better a simple rand generator with high bits than an LFG100000 with low bits.

Uhm. I was trying to download my code to see my example, but I could not… some problem in the forum. This evening I will double check it at home, but if I remember fine, in the test kernel you can find the order of the things:

[*]initialize the global memory with kernel “Init”

[*]run computing kernel that will:

[*] load RNG state to shared memory with “LoadToShm”

[*] cycle over the ocmputing loop getting more and more numbers, that will update the shared memory registers

[*] save RNG state to global memory with “SaveFromShm” and exit

Sarnath · September 17, 2009, 4:52am

Sigismondo,

The attachments are not working fine in the forum. I raise a topic to grab the attention of moderators… but still this has not been fixd.

btw,
RNGs usually use a “state” information – which makes the RNGeneration a perfect serial operation. (cumulative… one after another)…
So, I wonder how you circumvent this problem in your code. Can you explain? Thanks,

sigismondo · September 17, 2009, 7:57am

Yeah, i know… I already tryed to send a message to “Admin”. This problem is really annoying.

If they do not fix it soon, I will repost the code “inlined”.

Sure, I have set up the code so every thread has its own generator. The nice thing with lagged fibonacci RNG is that it is possibile to generate many independent sequences, because the number of bit combinations in the state is much larger than the length of the generator. I have been more detailed in the other thread:

http://forums.nvidia.com/index.php?showtopic=101860

Good thing about lgf17-5 is also the fact that it does need just 17 integers (per generator) to save the state. This allows to place many generators in 16KB. This is much better than Mersenne Twister - that needs about 4KB per generator. However Mersenne Twister is considered high standard - I think for its cryptographic characteristics. For Monte Carlo I used lgf17-5 successfully.

On the other side for annealing it should be enought any Linear Congruential Generator (awful for Monte Carlo, because of the hyperplanes, and even more awful for cryptography). It’s state fits just one integer - just seeding it differently on any thread will be sufficiently random for this class of problems.

In the code there is also a Park Miller (copied from a fortran code I found don’t know where… It can be written much more concisley - see wikipedia) generator (that is a particular LCG), that is used to generate the initial state of the generator (in the Init routine). For this task it can be used with any seed (I need just 16 numbers from it), but if I remember fine, it needs to be seeded correctly to avoid shorter subsequences, if used “as is”. Other LCGs do not have this limit.

Find some references on wikipedia:

http://en.wikipedia.org/wiki/Linear_congruential_generator

http://en.wikipedia.org/wiki/Park%E2%80%93Miller_RNG

And I report this again with the logic of Lagged Fibonacci initialization of independent sequences:

http://www.ipp.mpg.de/~rfs/comas/Helsinki/…/rn/node20.html

Sarnath · September 17, 2009, 8:14am

So, this was my problem. WHen you have independent generators, you really cannot compare your GPU implementation (that uses these random numbers) with the CPU implementation – for correctness…

For the initial portion of our project, we intend to generate random numbers in the CPU and use them in the correct order in the GPU kernel (i.e. mapping them to threads) – Once we know that our algorithm works fine, I think we can move on to GPU generated random numbers…

I am bookmarkng the link. I will revisit when I need this. Thanks!

clockmaster · September 17, 2009, 1:22pm

I’m trying to make the Simulated Annealing spend most of it’s time on GPU and random numbers are necessary in each iteration, so I don’t think it’s a good idea to pass the random numbers from CPU.

clockmaster · September 21, 2009, 1:38am

I took some time to test the code you suggested, but finally I did. Indeed when using the: x = LFG175shm() % 50 + a, in each iteration, the numbers generated are the same:

LFG175gmLoadToShm (gfibreg);

while (T > 0.0001){

	while ((pos1 == 0) || (pos2 == 0) || (pos1 == *municipios) || (pos2 == *municipios) || (pos1 == pos2)){

		pos1 = LFG175shm() % 50 + 1;

		pos2 = LFG175shm() % 50 + 1;

	}

	LFG175gmSaveFromShm(gfibreg);

	

	int temp;

	temp = solutionM[pos1];

	solutionM[pos1] = solutionM[pos2];

	solutionM[pos2] = temp;

	(...)

	T = T * 0.9999;

}

LFG175gmSaveFromShm(gfibreg);

The way it is, the code simply change the same two positions when it had to change two random positions. So I put the code:

while ((pos1 == 0) || (pos2 == 0) || (pos1 == *municipios) || (pos2 == *municipios) || (pos1 == pos2)){

	pos1 = LFG175shm()/(0x8000000>>6) + 1;

	pos2 = LFG175shm()/(0x8000000>>6) + 1;

}

but the numbers are greater than 64, and are always the same. There’s a way to fix it, sigismondo?

Again, thanks for the answers!

sigismondo · September 21, 2009, 12:56pm

Uhhh… :wacko:

Are you sure you have correctly initialized the generator as in the test? Obviously I am going to double check it this evening.

clockmaster · September 22, 2009, 12:55am

…I forgot to initialize the pos1 and pos2 in each iteration! >.<

Well, it’s working now, but I need to find a mutex. I know that it will slow down my program, but I think it’s the only choice.

sigismondo · September 22, 2009, 10:20am

fiuuuuuuh :rolleyes:

Actually yestarday I had not checked it… this is a good new.

For the mutex, I can not be of great help: maybe you can start a new thread in the forum.

Topic		Replies	Views
Another Random Number Thread CUDA Programming and Performance	15	3450	August 13, 2009
random number generation generating random numbers in CUDA CUDA Programming and Performance	64	109955	January 25, 2011
random numbers inside the Kernel CUDA Programming and Performance	31	61058	November 26, 2011
What's a good random number generator? CUDA Programming and Performance	21	13288	May 6, 2009
random number generators CUDA Programming and Performance	12	7293	August 7, 2008
random number generation on each thread? random number generation rng CUDA Programming and Performance	3	6288	August 26, 2008
simple rand() a simple rand function on gpu exist? CUDA Programming and Performance	9	5902	June 1, 2009
Monte Carlo Transport CUDA Programming and Performance	14	15852	August 17, 2007
Mersenne Twister SDK: what's going on? CUDA Programming and Performance	10	16974	January 19, 2011
MersenneTwister application one more question MersenneTwister application one more.... CUDA Programming and Performance	0	2513	May 30, 2008

Random Number Simulated Annealing

Related topics