In CUDA Programming Guide 0.8, it was stated that:
But, in the new version (0.8.1), this was replaced by:
I am confused (and frankly very concerned) about this change, and I am hoping someone from NVIDIA can shed some light on it. My code relies heavily on more than one thread writing to the same location in global memory, and so far it works as I want it to. In my case, when more than one thread writes to the same memory location, they all write the same value. The problem is that I do not know in advance which thread will write it. So I let all eligible threads write the value and, based on the old documentation, I was guaranteed that at least one of those writes would succeed.
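To make the pattern concrete, here is a minimal sketch of what I mean. It is not my actual code: the kernel name `flagIfAnyNegative`, the host helper `anyNegative`, and the "is any element negative" condition are all made up for illustration. Every eligible thread stores the same value to the same global address.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread that sees a negative element writes
// the SAME value (1) to the SAME global memory location.
__global__ void flagIfAnyNegative(const int *data, int n, int *flag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0) {
        *flag = 1;  // many threads may execute this store concurrently
    }
}

// Hypothetical host helper: returns the flag value read back after the
// kernel has run, or -1 if any CUDA call fails.
int anyNegative(const int *hostData, int n)
{
    int *devData = 0;
    int *devFlag = 0;
    int flag = 0;  // initial value, overwritten only by eligible threads

    if (cudaMalloc((void **)&devData, n * sizeof(int)) != cudaSuccess) {
        return -1;
    }
    if (cudaMalloc((void **)&devFlag, sizeof(int)) != cudaSuccess) {
        cudaFree(devData);
        return -1;
    }

    cudaMemcpy(devData, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(devFlag, &flag, sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    flagIfAnyNegative<<<blocks, threads>>>(devData, n, devFlag);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&flag, devFlag, sizeof(int),
                                 cudaMemcpyDeviceToHost);

    cudaFree(devData);
    cudaFree(devFlag);
    return (err == cudaSuccess) ? flag : -1;
}
```

Calling `anyNegative(hostArray, n)` from the host is all there is to it. My whole approach assumes that at least one of those identical stores lands, so that the flag ends up set whenever any thread was eligible. That is exactly the guarantee I am asking about.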
Specifically, I have the following questions:
What is the meaning of “one or more of the threads” in the new doc? How can that be applied to one thread only?
Does this change in the documentation mean that NVIDIA will stop supporting this feature, or was it found to be malfunctioning? If it is the latter, is there any plan to make this work in subsequent releases?
Do you have any recommendations for working around this problem in the situation explained above (when all threads write the same value)?
Thanks in advance!