Problem with GPU threads adding to the same variable

Hello, everyone.
I have a problem with GPU threads.

After each GPU thread finishes its computation, I need to add its result (operator +=) to a single variable in device memory.
As we all know, without thread synchronization, parallel threads that write to the same address at the same time will not produce a correct result.

Is there a way to make the GPU threads add their data to the same device memory address one at a time (serialized)?

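For integers, you can use the atomicAdd() function. There is no corresponding atomic function for floats on devices of compute capability 1.x (atomicAdd() on float requires compute capability 2.0 or higher), but you can also do a parallel reduction, which can be faster than using atomics: …c/reduction.pdf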
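Here is a rough sketch of both suggestions, in case it helps. It is only an illustration under a few assumptions: the names sumAtomic, sumReduce, sumIntsOnGpu, sumFloatsOnGpu and BLOCK are made up for this post, the input is non-empty, the block size is a power of two, and error checking is left out to keep it short. The reduction here is the simple shared-memory tree version, not a tuned one.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size for this sketch; must be a power of two.
const int BLOCK = 256;

// Option 1: every thread adds its integer result to one device variable.
// atomicAdd() serializes the conflicting read-modify-write operations.
__global__ void sumAtomic(const int *in, int n, int *total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(total, in[i]);
}

// Option 2: tree reduction in shared memory, one partial sum per block.
// Assumes the kernel is launched with blockDim.x == BLOCK.
__global__ void sumReduce(const float *in, int n, float *partial)
{
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
    __syncthreads();

    // Halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        partial[blockIdx.x] = s[0];    // one write per block, no conflict
}

// Host wrapper for option 1 (assumes h is non-empty).
int sumIntsOnGpu(const std::vector<int> &h)
{
    int n = (int)h.size();
    int *dIn = 0, *dTotal = 0, total = 0;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));
    sumAtomic<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(dIn, n, dTotal);
    cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTotal);
    return total;
}

// Host wrapper for option 2 (assumes h is non-empty).
// The per-block partial sums are added up on the host.
float sumFloatsOnGpu(const std::vector<float> &h)
{
    int n = (int)h.size();
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *dIn = 0, *dPartial = 0;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartial, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    sumReduce<<<blocks, BLOCK>>>(dIn, n, dPartial);
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dPartial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += partial[b];
    return total;
}
```

With atomics, all threads contend for one address, so the additions really are serialized. With the reduction, each block writes only one value, which is why it can end up faster for large inputs. Note that the floating-point sum from the reduction may differ slightly from a sequential sum because the additions happen in a different order.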

Thank you very much.