I am running into a problem where all threads in a warp write the same value to the same location in shared memory. A kernel containing just this illustrates the problem:
sharedValue = 0;
This ends up being serialized, which is wasteful (especially in a loop). Since every thread writes the same value, it should be possible to avoid serializing the writes at all. Obviously I could use a simple if-statement so that only one thread in the warp writes the value, but then I must insert a sync or fence so that the other threads in the warp see the resulting write.
if (thread == 0) sharedValue = 0;
The sync/fence should not be necessary either: if all threads could write the same value, each of them could use it “immediately” rather than waiting for a sync/fence to complete.
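To make the two variants concrete, here is a minimal sketch of what I am comparing. The names `writeAll`, `writeOne`, `out`, and the host helper `allLanesReadZero` are made up for illustration, the block is assumed to be a single warp of 32 threads, and `__syncwarp()` assumes CUDA 9 or later (on older toolkits I would use `__syncthreads()` instead).

```cuda
#include <cuda_runtime.h>

// Variant 1: every thread of the warp stores the same value to the same
// shared-memory location and reads it back right away, with no sync.
__global__ void writeAll(int *out)
{
    __shared__ int sharedValue;
    sharedValue = 0;                  // same address, same value, all lanes
    out[threadIdx.x] = sharedValue;   // each lane can only ever observe 0
}

// Variant 2: only lane 0 stores; the other lanes must wait for the store
// to become visible before they read.
__global__ void writeOne(int *out)
{
    __shared__ int sharedValue;
    if (threadIdx.x == 0) sharedValue = 0;
    __syncwarp();                     // orders lane 0's store before the reads (CUDA 9+)
    out[threadIdx.x] = sharedValue;
}

// Hypothetical host helper: launches one warp of the chosen variant and
// returns true if every lane read back 0.
bool allLanesReadZero(bool singleWriter)
{
    const int kWarp = 32;
    int host[kWarp];
    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemset(dev, 0xFF, sizeof(host));   // prefill with -1 so stale data is detectable
    if (singleWriter) writeOne<<<1, kWarp>>>(dev);
    else              writeAll<<<1, kWarp>>>(dev);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < kWarp; ++i)
        if (host[i] != 0) return false;
    return true;
}
```

Calling `allLanesReadZero(false)` exercises the all-threads write and `allLanesReadZero(true)` the single-writer version. This only checks that both variants produce the same result; it does not measure whether either one is serialized.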
Is there any other way for all threads in a warp to write the same value to the same location without causing serialization? Thanks for your help!