CURAND question

hi all!

Toolkit 3.2 is still quite new, so there isn't much out there about some of its new features, especially the CURAND library.

Here is my problem: I am implementing an ant colony optimization (ACO) algorithm for detecting image edges. If you are familiar with ACO, you know that you have to generate random numbers quite often. At each step, each ant decides where it will move based on a probability and some other info, and this 'probability' is a random number.

For example, I have a colony of 1024 ants walking 512 steps, so I have to call the random number generator 512 * 1024 = 524,288 times.

OK, CURAND provides functions for setting up and generating random numbers. I am not clear on how it works, and the paragraph in the CURAND docs that describes usage confuses me even more.

Here it is:

"For optimal parallel pseudorandom number generation, each experiment should be assigned a unique seed. Within an experiment, each thread of computation should be assigned a unique sequence number. If an experiment spans multiple kernel launches, it is recommended that threads between kernel launches be given the same seed, and sequence numbers be assigned in a monotonically increasing way. If the same configuration of threads is launched, random state can be preserved in global memory between launches to avoid state setup time."

The last sentence is what troubles me. Does it mean that the only thing I need to do is set up the state once (with curand_init()), and then store the state in global memory after each kernel launch?

E.g.:

#include <curand_kernel.h>

// Setup kernel; curand_init parameters taken from some thread here.
// seed == (unsigned)time(0)
__global__ void setupKernel(curandState *state, unsigned long seed){
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    // Arguments: seed, sequence number, offset, state to initialize.
    curand_init((seed << 20) + id, id, 0, &state[id]);
}

// Example kernel which is invoked many times (500).
__global__ void randKernel(curandState *state){
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    curandState localState = state[id];   // load this thread's state
    curand_uniform(&localState);          // dummy call, just to be here
    state[id] = localState;               // store the advanced state back
}

// Something like this is in the docs, but I don't want to just use it, I want to understand it.
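
For context, here is a minimal host-side sketch of how two kernels like these might be driven: the state array is allocated once in global memory, the setup kernel runs once, and the generating kernel is launched repeatedly against the same array. The helper name runColony, the launch layout (4 blocks of 256 threads for 1024 ants), the fixed seed, and the extra output buffer are all assumptions for illustration. To stay self-contained it repeats simplified versions of the kernels, and it uses one seed with the thread id as the sequence number (as the quoted doc paragraph recommends) instead of the per-thread seed above.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One generator state per thread: same seed, sequence number = thread id.
__global__ void setupKernel(curandState *state, unsigned long long seed) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    curand_init(seed, id, 0, &state[id]);
}

// Draws one uniform number per thread and writes the advanced state back.
__global__ void randKernel(curandState *state, float *out) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    curandState localState = state[id];     // work on a local copy
    out[id] = curand_uniform(&localState);  // value in (0, 1]
    state[id] = localState;                 // persist for the next launch
}

// Hypothetical helper: sets up 1024 states once, launches randKernel `steps`
// times, and copies the values from the last launch into hostLast (which must
// hold 1024 floats). Returns 0 on success, 1 on a CUDA error.
int runColony(int steps, float *hostLast) {
    const int threads = 256, blocks = 4;    // 1024 ants (assumed layout)
    const int n = threads * blocks;
    curandState *devStates = nullptr;
    float *devOut = nullptr;

    if (cudaMalloc(&devStates, n * sizeof(curandState)) != cudaSuccess) return 1;
    if (cudaMalloc(&devOut, n * sizeof(float)) != cudaSuccess) {
        cudaFree(devStates);
        return 1;
    }

    setupKernel<<<blocks, threads>>>(devStates, 1234ULL);   // setup happens once
    for (int s = 0; s < steps; ++s) {
        // Every launch reads the states the previous launch stored.
        randKernel<<<blocks, threads>>>(devStates, devOut);
    }

    // cudaMemcpy waits for the kernels above to finish before copying.
    cudaError_t err = cudaMemcpy(hostLast, devOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devOut);
    cudaFree(devStates);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling runColony(500, last) with a 1024-element float array would mirror the 500 launches mentioned above. The states stay on the device the whole time, so nothing is copied back to the host between launches.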

So, is this good practice? What happens to the states after each randKernel() call? What does a 'state' actually represent? Is it enough to set up with curand_init() only once, with each curand_x call then updating the state, so that the next curand_x call uses that updated state and updates it again itself?

thanks

sry for bad english, not my first language

"If an experiment spans multiple kernel launches, it is recommended that threads between kernel launches be given the same seed, and sequence numbers be assigned in a monotonically increasing way. If the same configuration of threads is launched, random state can be preserved in global memory between launches to avoid state setup time."

This means that if, for example, each thread generates 1000 random numbers in a launch, you can save each thread's final state in global memory so that your next kernel launch continues from that state rather than setting up the CURAND state again. Setup is slow, and it gets slower as the offset you pass to curand_init() increases. Without saved state, that offset would have to keep growing from launch to launch, by however many numbers each thread generated in the launches before.
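
To make that concrete, here is a small sketch (not taken from the CURAND docs) that checks whether the two approaches land at the same point in each thread's sequence: continuing from a state saved in global memory, versus calling curand_init() again with the offset set to the count of numbers already drawn. The names initStates, draw and countMismatches, the seed, and the launch layout are assumptions for illustration. It relies on the default generator behind curandState and on curand(), where each call consumes one element of the sequence.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Same seed for all threads, sequence number = thread id, caller-chosen offset.
__global__ void initStates(curandState *state, unsigned long long seed,
                           unsigned long long offset) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    curand_init(seed, id, offset, &state[id]);
}

// Draws `count` values, keeps the last one, and saves the advanced state.
__global__ void draw(curandState *state, unsigned int *last, int count) {
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    curandState local = state[id];
    unsigned int x = 0;
    for (int i = 0; i < count; ++i) x = curand(&local);
    last[id] = x;
    state[id] = local;
}

// Hypothetical check: returns how many threads disagree between the two
// paths, or -1 on a CUDA error.
int countMismatches(int perLaunch) {
    constexpr int threads = 64, blocks = 2, n = threads * blocks;
    const unsigned long long seed = 1234ULL;   // arbitrary fixed seed
    curandState *kept = nullptr, *fresh = nullptr;
    unsigned int *dA = nullptr, *dB = nullptr;
    unsigned int hA[n], hB[n];

    bool ok = cudaMalloc(&kept, n * sizeof(curandState)) == cudaSuccess
           && cudaMalloc(&fresh, n * sizeof(curandState)) == cudaSuccess
           && cudaMalloc(&dA, n * sizeof(unsigned int)) == cudaSuccess
           && cudaMalloc(&dB, n * sizeof(unsigned int)) == cudaSuccess;

    if (ok) {
        // Path A: set up once, draw perLaunch values, then continue from
        // the state that the first launch stored in global memory.
        initStates<<<blocks, threads>>>(kept, seed, 0);
        draw<<<blocks, threads>>>(kept, dA, perLaunch);   // launch 1
        draw<<<blocks, threads>>>(kept, dA, 1);           // launch 2

        // Path B: set up again, skipping the values already consumed.
        initStates<<<blocks, threads>>>(fresh, seed, perLaunch);
        draw<<<blocks, threads>>>(fresh, dB, 1);

        ok = cudaMemcpy(hA, dA, sizeof(hA), cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(hB, dB, sizeof(hB), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    int mismatches = 0;
    if (ok) {
        for (int i = 0; i < n; ++i) {
            if (hA[i] != hB[i]) ++mismatches;
        }
    }

    cudaFree(kept);
    cudaFree(fresh);
    cudaFree(dA);
    cudaFree(dB);
    return ok ? mismatches : -1;
}
```

If the offset works the way described, countMismatches(1000) should report zero mismatches. Path B has to repeat the setup with a growing offset before every launch, while path A pays for setup only once, which is the saving the doc paragraph is pointing at.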