I am an absolute beginner in CUDA. I plan to write a CUDA version of a stochastic optimization problem, for which I need to generate a large number of floats in the interval (0,1). Sadly, I have no idea about random number generation algorithms and can't understand how to implement the CUDA SDK MT (Mersenne Twister) example. All I need is a device function that generates a large number of random floats in (0,1) on the GPU. I would be very grateful if someone could give me a function that generates random floats.
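
For what it's worth, here is a minimal sketch of such a device function using the cuRAND device API (`curand_kernel.h`) rather than the SDK Mersenne Twister sample, so it is a swapped-in technique, not that example. The names `uniform_open01`, `fill_uniform` and `count_outside_open01` are made up for illustration, and the seed, block size and choice of the Philox generator are assumptions. Note that `curand_uniform` returns values in (0,1], so the helper redraws when it gets exactly 1.0 to stay inside (0,1).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical helper (name is mine): one float strictly inside (0, 1).
// curand_uniform returns values in (0, 1], so redraw in the rare case of 1.0f.
__device__ float uniform_open01(curandStatePhilox4_32_10_t *state)
{
    float u;
    do {
        u = curand_uniform(state);
    } while (u == 1.0f);
    return u;
}

// Each thread writes one random float. The generator state is a local
// variable, so no per-thread state array is allocated in global memory.
__global__ void fill_uniform(float *out, int n, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    curandStatePhilox4_32_10_t state;
    curand_init(seed, tid, 0, &state);  // same seed, one subsequence per thread
    out[tid] = uniform_open01(&state);
}

// Host driver: fills n floats on the GPU and returns how many fall outside (0, 1).
// Error checking of the CUDA calls is omitted for brevity.
int count_outside_open01(int n, unsigned long long seed)
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;
    fill_uniform<<<blocks, threads>>>(d_out, n, seed);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    int bad = 0;
    for (float u : h_out)
        if (!(u > 0.0f && u < 1.0f)) ++bad;
    return bad;
}

int main()
{
    int bad = count_outside_open01(1 << 22, 1234ULL);  // about 4 million draws
    printf("values outside (0,1): %d\n", bad);
    return 0;
}
```

This should build with plain `nvcc`, since the device API is header-only. In a real optimization kernel you would probably call `uniform_open01` several times per thread on the same state instead of writing the values out to memory.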

Most of the examples I see are generating a state for each thread.
That seems impractical for many applications, like mine, where I typically spawn around 4 million threads.
Most of us just want some (mostly) independent random draws.
I would think that if we could attach one random number generator state to each core,
it might work just fine, as long as threads do not race on the RNG state
when each core has its own state vector.

I.e., if my hardware has 512 CUDA cores, can I not just set up 512 random number generator state
vectors instead of 4 million?
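
A rough sketch of how a small, fixed pool of states could look in practice, under some assumptions. As far as I know, a kernel has no way to ask which core it is running on, so the sketch does not keep one state per core. Instead it launches a small fixed grid (32 blocks of 256 threads here, an arbitrary choice), gives each launched thread its own state, and lets each thread loop over many of the 4 million work items with a grid-stride loop. Each state is only ever touched by its owning thread, so there is no race. The names `init_states`, `draw` and `count_outside_unit` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// One state per launched thread (not per work item).
__global__ void init_states(curandState *states, int num_states,
                            unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < num_states)
        curand_init(seed, tid, 0, &states[tid]);  // distinct subsequence per state
}

// Grid-stride loop: a fixed number of threads covers all n outputs.
// Must be launched with exactly as many threads as there are states.
__global__ void draw(curandState *states, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    curandState local = states[tid];      // work on a private copy
    for (int i = tid; i < n; i += stride)
        out[i] = curand_uniform(&local);  // value in (0, 1]
    states[tid] = local;                  // save so a later launch continues
}

// Host driver: returns how many of the n draws fall outside (0, 1].
// Error checking of the CUDA calls is omitted for brevity.
int count_outside_unit(int n, unsigned long long seed)
{
    const int threads = 256, blocks = 32;  // assumed pool: 8192 states
    const int num_states = threads * blocks;

    curandState *d_states = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_states, num_states * sizeof(curandState));
    cudaMalloc(&d_out, n * sizeof(float));

    init_states<<<blocks, threads>>>(d_states, num_states, seed);
    draw<<<blocks, threads>>>(d_states, d_out, n);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_states);

    int bad = 0;
    for (float u : h_out)
        if (!(u > 0.0f && u <= 1.0f)) ++bad;
    return bad;
}

int main()
{
    int bad = count_outside_unit(1 << 22, 1234ULL);  // about 4 million draws
    printf("values outside (0,1]: %d\n", bad);
    return 0;
}
```

The pool here is larger than 512 because a GPU keeps many more threads resident than it has cores, and a grid of only 512 threads would likely leave the device underused. The right pool size is something to tune for the specific card.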