No, you need to call it only the very first time: it generates the initial state of the generators in the RNG status array in *global* memory.

Then, in every kernel, call LFG175gmLoadToShm() at the beginning to copy the status from global to shared memory, and LFG175gmSaveFromShm() just before exiting the kernel to write the state reached by all the generators back to global memory. Shared memory does not keep its contents between kernel calls; global memory does.

However, jjp pointed out an lrand48 generator that uses less shared memory than this one. For annealing it should be adequate.
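For reference, the lrand48 family (POSIX drand48) is a 48-bit linear congruential generator, X(n+1) = (0x5DEECE66D * X(n) + 0xB) mod 2^48, returning the top 31 bits of the state. The whole state fits in one 64-bit word per generator, which is why it is so light on memory. A minimal host-side sketch of one step follows; the function names are illustrative and not part of any library, and srand48-style seeding (seed in the high 32 bits, 0x330E in the low 16) is assumed.

```c
#include <stdint.h>

/* Constants of the POSIX drand48 family: multiplier, increment, 48-bit mask. */
#define LRAND48_A    UINT64_C(0x5DEECE66D)
#define LRAND48_C    UINT64_C(0xB)
#define LRAND48_MASK ((UINT64_C(1) << 48) - 1)

/* Seed like srand48(): high 32 bits from the seed, low 16 bits set to 0x330E. */
static uint64_t lrand48_like_seed(uint32_t seed) {
    return ((uint64_t)seed << 16) | UINT64_C(0x330E);
}

/* Advance the 48-bit state and return its top 31 bits, in [0, 0x7FFFFFFF].
   The 64-bit product wraps mod 2^64; masking then reduces it mod 2^48. */
static int32_t lrand48_like_next(uint64_t *state) {
    *state = (LRAND48_A * *state + LRAND48_C) & LRAND48_MASK;
    return (int32_t)(*state >> 17);
}
```

With seed 0 the first value is 366850414. On the GPU each thread would keep its own 48-bit state in a 64-bit integer; the step uses only 64-bit integer multiply, add and mask.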

Actually, 1…64 would be better. Guess why!

The generator generates positive integers in the range [0,0x7FFFFFFF], so to have it in the needed range [a,b]:

x = LFG175shm() * (b-a+1)/0x80000000 + a;
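If 64-bit integer arithmetic is acceptable, the formula can be evaluated exactly as written, because the product then cannot overflow. This host-side sketch is only a reference for checking results (the helper name is hypothetical); on the GPU, 64-bit integer multiply and divide are slower than their 32-bit counterparts.

```c
#include <stdint.h>

/* Map r in [0, 0x7FFFFFFF] to [a, b] as r * (b - a + 1) / 2^31 + a,
   forming the product in 64 bits so it cannot overflow. */
static int32_t map_range_64(int32_t r, int32_t a, int32_t b) {
    int64_t n = (int64_t)b - a + 1;                  /* size of the target range */
    int64_t scaled = (int64_t)r * n / INT64_C(0x80000000);
    return (int32_t)scaled + a;                      /* scaled is in [0, n-1] */
}
```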

Well, that is the theory: the first multiplication will overflow the integer range, and 0x80000000 does not fit in a signed 32-bit int (read as a signed int, it is negative). To avoid the overflow, let 2^m be the smallest power of 2 greater than or equal to (b-a+1): in your example that is 2^6=64, so m=6.
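Finding m is a short loop. A sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Smallest m such that 2^m >= n, for n in [1, 2^30]. */
static int pow2_exponent(uint32_t n) {
    int m = 0;
    while ((UINT32_C(1) << m) < n)
        m++;
    return m;
}
```

For a range of 50 or 64 values this gives m = 6; for 65 values it gives m = 7.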

It becomes:

x = (LFG175shm()>>m) * (b-a+1)/(0x80000000>>m) + a;

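Working with the shifted numbers, the product can no longer overflow: LFG175shm()>>m is below 2^(31-m) and b-a+1 is at most 2^m, so their product stays below 2^31. Nor will you get negative numbers, even with signed ints.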
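The shifted formula as a 32-bit-only C sketch. The function name is hypothetical, r stands for one output of the generator, and it assumes 1 <= b-a+1 <= 2^30. m starts at 1 so that 0x80000000 >> m always fits in a signed int; a larger m still satisfies 2^m >= b-a+1, so the overflow bound holds.

```c
#include <stdint.h>

/* Map r in [0, 0x7FFFFFFF] to [a, b] using only 32-bit signed ints:
   x = (r >> m) * (b - a + 1) / (0x80000000 >> m) + a. */
static int32_t map_range_shifted(int32_t r, int32_t a, int32_t b) {
    int32_t n = b - a + 1;                  /* size of the target range */
    int m = 1;
    while (((int32_t)1 << m) < n)           /* smallest m >= 1 with 2^m >= n */
        m++;
    int32_t divisor = (int32_t)(UINT32_C(0x80000000) >> m);   /* 2^(31-m) */
    return (r >> m) * n / divisor + a;      /* product < 2^(31-m) * 2^m = 2^31 */
}
```

Shifting first throws away the low m bits of r, which is harmless here because the result depends only on the high bits anyway.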

Since you need 1…64 (or 0…63, where b-a+1=64 as well), you are done with:

x = LFG175shm() / (0x80000000>>6) + a;
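When b-a+1 is exactly 2^k, the multiplication disappears and the division by 0x80000000>>k is just a right shift by 31-k: for 64 values the result is the generator's top 6 bits. A sketch, where k plays the role of m and the function name is hypothetical:

```c
#include <stdint.h>

/* For a range of exactly 2^k values starting at a (k = 6 for 1..64 or 0..63),
   keep the top k of r's 31 bits: r / (0x80000000 >> k) == r >> (31 - k). */
static int32_t map_pow2_range(int32_t r, int32_t a, int k) {
    return (r >> (31 - k)) + a;
}
```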

If you (or anyone) are ever tempted to write x = LFG175shm() % 50 + a, NEVER DO IT. That keeps the low bits and discards the high ones, and the low bits are generally much more strongly correlated. The correct way is to take the high bits, that is, to use division, not modulo.
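The weakness of low bits is easy to see with a plain 32-bit linear congruential generator. It is used here only as an illustration and is not the LFG175 generator, although additive lagged Fibonacci generators modulo 2^32 also have low bits with much shorter periods than the high ones. With an odd multiplier and an odd increment, the lowest bit (all that r % 2 keeps) simply alternates 0, 1, 0, 1, ... A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Plain 32-bit LCG (Numerical Recipes constants), illustration only. */
static uint32_t lcg32_next(uint32_t *state) {
    *state = *state * UINT32_C(1664525) + UINT32_C(1013904223);
    return *state;
}

/* Count how often bit `shift` of successive draws differs from the previous draw. */
static int bit_flips(uint32_t seed, int draws, int shift) {
    uint32_t s = seed;
    uint32_t prev = (lcg32_next(&s) >> shift) & 1u;
    int flips = 0;
    for (int i = 1; i < draws; i++) {
        uint32_t bit = (lcg32_next(&s) >> shift) & 1u;
        flips += (bit != prev);
        prev = bit;
    }
    return flips;
}
```

bit_flips(seed, 1000, 0) is always 999: the low bit is perfectly predictable. In this kind of generator bit j has period 2^(j+1), so only the top bits, which division keeps, have the full period.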