Hi, I am learning CUDA and trying to convert an existing program to run on the GPU. The program is basically a Monte Carlo program, but each thread needs to update the result during the computation.
My question is: is there a potential conflict if different threads try to accumulate values into the same global variable? If so, how can I avoid the problem? I am thinking of setting up a separate result variable for each thread, but that would consume a lot of memory. Is there a better solution?
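Yes, a plain `+=` on the same global variable from many threads is a race: the read-modify-write steps of different threads can interleave and updates get lost. One common way around both the race and the memory cost of a per-thread result array is to have each thread accumulate in a register, combine the per-thread values inside each block through shared memory, and let one thread per block do a single `atomicAdd` on the global result. The sketch below assumes a float result; `sampleValue`, `accumulate` and `runAccumulate` are hypothetical names, and `sampleValue` just returns `1.0f` as a placeholder for the real Monte Carlo work. The float overload of `atomicAdd` needs compute capability 2.x or later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed power of two so the tree reduction below is valid

// Hypothetical placeholder for one Monte Carlo sample; replace with real work.
__device__ float sampleValue(int i) {
    (void)i;       // unused in this placeholder
    return 1.0f;
}

__global__ void accumulate(float* result, int n) {
    __shared__ float partial[BLOCK_SIZE];
    int tid = threadIdx.x;

    // Grid-stride loop: each thread sums its own samples in a register (no sharing).
    float local = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        local += sampleValue(i);
    partial[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }

    // Only one atomic per block touches the global result.
    if (tid == 0) atomicAdd(result, partial[0]);
}

// Host wrapper: launches the kernel and returns the accumulated sum of n samples.
float runAccumulate(int n) {
    float* d_result;
    float h_result = 0.0f;
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_result, &h_result, sizeof(float), cudaMemcpyHostToDevice);
    accumulate<<<64, BLOCK_SIZE>>>(d_result, n);
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}

int main() {
    printf("%f\n", runAccumulate(1 << 20));
    return 0;
}
```

With 64 blocks of 256 threads this issues 64 atomic operations in total instead of one per sample, and the extra memory is one float per thread in shared memory rather than a global array. Because the order of floating-point additions can change between runs, the last bits of a non-integer result may differ slightly from run to run.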
Thanks, that makes a lot of sense. One practical question: I just checked the atomic functions, and none of them seem to handle floating-point numbers. How should I get around that? Thanks a lot.
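On compute capability 2.x and later, `atomicAdd` has a native `float` overload, and compute capability 6.x adds a native `double` overload. For older hardware (or for other operations with no built-in atomic), a compare-and-swap loop over `atomicCAS` works; the sketch below follows the pattern shown in the CUDA C Programming Guide. The name `atomicAddDouble` is hypothetical, chosen so it does not clash with the built-in `double` overload on newer GPUs, and `addHalves` and `sumHalves` are illustrative helpers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CAS-based atomic add for double: retry until no other thread changed the
// value between our read and our write.
__device__ double atomicAddDouble(double* address, double val) {
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Reinterpret bits as double, add, reinterpret back, and try to swap.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // another thread won the race; retry with new value
    return __longlong_as_double(old);
}

// Illustrative kernel: every thread adds 0.5 to the same global double.
__global__ void addHalves(double* result, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddDouble(result, 0.5);
}

// Host wrapper: returns the total after n threads each added 0.5.
double sumHalves(int n) {
    double* d_result;
    double h_result = 0.0;
    cudaMalloc(&d_result, sizeof(double));
    cudaMemcpy(d_result, &h_result, sizeof(double), cudaMemcpyHostToDevice);
    addHalves<<<(n + 255) / 256, 256>>>(d_result, n);
    cudaMemcpy(&h_result, d_result, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}

int main() {
    printf("%f\n", sumHalves(1000));
    return 0;
}
```

Heavy contention on a single address makes the CAS loop retry often, so combining values per block first (and only then doing one atomic per block) is usually much cheaper than having every thread hit the global variable.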