Hello, everyone,
I have a problem with GPU threads.
After each GPU thread finishes its computation, I need to add its result (operator +=) to a single variable in device memory.
As we all know, when parallel threads write to the same address without synchronization, the result will not be correct.
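
The pattern described above might look like the following minimal sketch, assuming CUDA and an int accumulator. The kernel name `accumulate_unsynchronized`, the host helper `racy_sum`, and the stand-in per-thread result of 1 are all hypothetical, made up only to illustrate the problem. Error checking of the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread adds its own result into one shared total.
__global__ void accumulate_unsynchronized(int *total, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int my_result = 1;    // stand-in for the real per-thread computation
        // Plain += is a separate load, add, and store. Threads that overlap
        // can read the same old value, so some updates are typically lost.
        *total += my_result;
    }
}

// Hypothetical host helper: launches n threads and returns the total they produced.
int racy_sum(int n)
{
    assert(n > 0);

    int *d_total = nullptr;
    int h_total = 0;
    cudaMalloc(&d_total, sizeof(int));
    cudaMemset(d_total, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up to cover all n threads
    accumulate_unsynchronized<<<blocks, threads>>>(d_total, n);

    // cudaMemcpy waits for the kernel to finish before copying the result back.
    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_total);
    return h_total;
}
```

Calling `racy_sum(1 << 16)` would ideally return 65536, but because the updates are not serialized the returned value is usually smaller and can differ from run to run.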
Is there a way to make the GPU threads add their data to the same device memory address one at a time (serialized)?