Usually it is better to use separate buffers for input and output than to protect in-place updates with atomic operations, as atomics are expensive. For floating-point data, atomics add a further nuisance: the order in which the atomic additions land varies from run to run, so the rounding errors (and therefore the result) suddenly depend on the specific timing of each execution.