Random Walk

First of all, thanks for helping a beginner.

I have been trying for a while to generate a random walk on the GPU using CUDA C,
but so far I haven't been able to get good performance.

If anybody knows this problem, please help me.
My main difficulty is seeing how to parallelize it without using atomic functions.

Thanks for reading.

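A prefix sum (scan) may be the way to go. The position after k steps is just the sum of the first k steps, so you can generate all the steps independently in parallel and then scan them to get every position, with no atomics needed.
Intro: http://en.wikipedia.org/wiki/Prefix_sum
Comprehensive guide with specific reference to Fermi GPUs: http://www.moderngpu.com/intro/scan.html#ScanOnGPU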
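
To make the prefix-sum suggestion concrete, here is a minimal sketch of a 1D random walk done as "generate steps, then scan". It assumes a simple walk with steps of +1 or -1, which the original question does not specify. It uses Thrust (shipped with the CUDA toolkit) for the scan instead of a hand-written kernel. The names StepFromIndex and random_walk are made up for this example, and the integer hash is only a stand-in for a real generator such as cuRAND.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/scan.h>
#include <thrust/transform.h>

// Hypothetical helper: maps a step index to +1 or -1 using a stateless
// integer hash. Each step depends only on its own index and the seed, so
// all steps can be generated independently. Illustration only; use a
// proper generator (e.g. cuRAND) for real simulations.
struct StepFromIndex {
    unsigned int seed;

    __host__ __device__ explicit StepFromIndex(unsigned int s) : seed(s) {}

    __host__ __device__ int operator()(unsigned int i) const {
        unsigned int x = i ^ seed;
        x ^= x >> 16; x *= 0x7feb352dU;
        x ^= x >> 15; x *= 0x846ca68bU;
        x ^= x >> 16;
        return (x & 1u) ? 1 : -1;   // lowest bit picks the direction
    }
};

// Hypothetical helper: returns the n positions of a walk that starts at 0.
// Element k holds the position after k + 1 steps.
thrust::host_vector<int> random_walk(unsigned int n, unsigned int seed) {
    thrust::device_vector<int> steps(n);

    // 1) Generate every step in parallel, one per index.
    thrust::transform(thrust::counting_iterator<unsigned int>(0),
                      thrust::counting_iterator<unsigned int>(n),
                      steps.begin(), StepFromIndex(seed));

    // 2) In-place inclusive prefix sum turns steps into positions.
    //    This replaces the sequential accumulation and needs no atomics.
    thrust::inclusive_scan(steps.begin(), steps.end(), steps.begin());

    // 3) Copy the positions back to the host.
    return thrust::host_vector<int>(steps);
}
```

Calling random_walk(1000000, 42) from host code would return one million positions. For many independent walks, the same idea should carry over by storing the walks back to back and using a segmented scan (for example thrust::inclusive_scan_by_key with the walk number as the key).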