Hi, I have a question. I have a data structure (containing float values) that numerous threads read from and write to.
Sometimes several threads will need to read and write this data structure at the same time. The threads will, of course, interfere with each other, as they have no idea when the other threads will be reading from or writing to it.
How can I go about preventing these race conditions? I know that any lock will be slow because, of course, you are potentially blocking a large number of threads. However, the order in which each thread reads from and writes to the data structure is highly structured, and these kinds of conflicts should actually happen fairly rarely.
Atomic operations (like addition) are available in CUDA. Unfortunately, you mention the data structure you are operating on includes floats, and the original atomic functions covered only integer types. (My suspicion is that atomic operations are implemented in the memory controller, and putting a full FPU there was too much.) Devices of compute capability 2.0 and later do provide a float version of atomicAdd, but float min, max, and multiply still have no direct atomic.
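One common workaround, when the float operation you need has no atomic of its own, is to emulate it with the integer compare-and-swap atomicCAS. The following is only a minimal sketch of that idea for a float max: atomicMaxFloat, maxKernel, and runMaxExample are hypothetical names made up for illustration, the test data is arbitrary, and NaN handling and error checking are left out.

```cuda
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA built-in): atomic max on a float,
// emulated with the integer compare-and-swap atomicCAS.
__device__ float atomicMaxFloat(float *addr, float value)
{
    int *addrAsInt = reinterpret_cast<int *>(addr);
    int old = *addrAsInt;
    int assumed;
    do {
        assumed = old;
        // Nothing to do if the stored value is already at least as large.
        if (__int_as_float(assumed) >= value) break;
        // Store the new value only if nobody changed the word since we read it.
        old = atomicCAS(addrAsInt, assumed, __float_as_int(value));
    } while (assumed != old);  // another thread got in first: retry
    return __int_as_float(old);
}

// Every thread offers its element as a candidate for the maximum.
__global__ void maxKernel(const float *in, int n, float *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicMaxFloat(result, in[i]);
}

// Host driver for the sketch; returns the maximum computed on the device.
float runMaxExample()
{
    const int n = 1024;
    float hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>((i * 37) % 1000);
    float hResult = -FLT_MAX;

    float *dIn, *dResult;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dResult, sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &hResult, sizeof(float), cudaMemcpyHostToDevice);

    maxKernel<<<(n + 255) / 256, 256>>>(dIn, n, dResult);
    cudaMemcpy(&hResult, dResult, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dResult);
    return hResult;
}

int main()
{
    printf("max = %f\n", runMaxExample());
    return 0;
}
```

The loop only retries when another thread changed the value between the read and the swap, so if conflicts really are rare it should usually finish in one iteration.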
Many people have used the atomic operations to construct mutexes and semaphores (which is where it sounds like you are going), but those tend to be complicated and error-prone if you are not careful.
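To give an idea of what such a mutex might look like, here is a rough sketch of a spin lock built from atomicCAS and atomicExch. The names acquire, release, lockedSum, and runLockedSumExample are hypothetical, and the sketch assumes 256 threads per block and that all blocks are resident on the device at once. Only one thread per block ever takes the lock; letting several threads of the same warp compete for it can deadlock on some hardware, which is one of the ways this approach goes wrong.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical spin lock: 0 = free, 1 = held.
__device__ void acquire(int *mutex)
{
    while (atomicCAS(mutex, 0, 1) != 0) { /* spin until the lock is free */ }
    __threadfence();  // order the protected reads after taking the lock
}

__device__ void release(int *mutex)
{
    __threadfence();  // publish our writes before the lock is released
    atomicExch(mutex, 0);
}

// Assumes blockDim.x == 256. Each block reduces its slice in shared memory,
// then thread 0 alone updates the shared total under the lock.
__global__ void lockedSum(const float *in, int n, volatile float *sum, int *mutex)
{
    __shared__ float partial[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        acquire(mutex);
        *sum += partial[0];  // critical section: one block at a time
        release(mutex);
    }
}

// Host driver for the sketch; sums 4096 ones on the device.
float runLockedSumExample()
{
    const int n = 4096;
    static float hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = 1.0f;

    float *dIn, *dSum;
    int *dMutex;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMalloc(&dMutex, sizeof(int));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dSum, 0, sizeof(float));   // all-zero bits are 0.0f
    cudaMemset(dMutex, 0, sizeof(int));   // lock starts free

    lockedSum<<<n / 256, 256>>>(dIn, n, dSum, dMutex);

    float hSum = 0.0f;
    cudaMemcpy(&hSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSum);
    cudaFree(dMutex);
    return hSum;
}

int main()
{
    printf("sum = %f\n", runLockedSumExample());
    return 0;
}
```

Even in this small case the lock serializes every block at the final update, so it is worth measuring before committing to the design.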
The best approach is to see if you can design the algorithm to avoid the need to have multiple threads write to the same location (this might take multiple passes). This isn’t always possible, of course, but it is worth thinking about.
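As an illustration of the multiple-pass idea, the sketch below sums an array without any atomics or locks: in the first pass every block writes its partial result to its own slot, and in the second pass a single block combines those slots. It is only a sketch under assumptions: blockSum and runTwoPassSumExample are hypothetical names, blocks are assumed to have 256 threads, and whether your own data structure can be split up this way depends on its access pattern.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumes blockDim.x == 256. Each block sums up to 256 inputs and writes the
// result to out[blockIdx.x], so no two blocks ever write the same location.
__global__ void blockSum(const float *in, int n, float *out)
{
    __shared__ float partial[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = partial[0];
}

// Host driver for the sketch; sums 4096 ones in two kernel launches.
float runTwoPassSumExample()
{
    const int n = 4096;
    const int threads = 256;
    const int blocks = n / threads;  // 16 partial sums after pass 1
    static float hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = 1.0f;

    float *dIn, *dPartials, *dResult;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dPartials, blocks * sizeof(float));
    cudaMalloc(&dResult, sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    // Pass 1: one private output slot per block.
    blockSum<<<blocks, threads>>>(dIn, n, dPartials);
    // Pass 2: a single block combines the per-block results.
    blockSum<<<1, threads>>>(dPartials, blocks, dResult);

    float hResult = 0.0f;
    cudaMemcpy(&hResult, dResult, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartials);
    cudaFree(dResult);
    return hResult;
}

int main()
{
    printf("sum = %f\n", runTwoPassSumExample());
    return 0;
}
```

The kernel boundary between the two launches acts as the synchronization point, which is why no lock is needed.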