It occurred to me that if only threads of the same warp can execute at the same time, then this might be used to prevent two threads from updating the same shared (or global) data at once.
So, a simple case:
[codebox]if ( threadIdx.x == 0 || threadIdx.x == 32 )
{
    int targ = threadIdx.x%32; // both threads get targ == 0
    sharedData[targ] = sharedData[targ] + somenumber;
}[/codebox]
Will this work, or is it expanded into
[codebox]
if ( threadIdx.x == 0 || threadIdx.x == 32 )
{
    int targ = threadIdx.x%32;
    float reg = sharedData[targ]; // step 1
    reg = reg + somenumber; // step 2
    sharedData[targ] = reg; // step 3
}
[/codebox]
with the possibility that the sequence of operations could become:
warp A step 1
warp A step 2
warp B step 1 // ← warp B reads from shared before warp A does step 3, so warp A's update is lost
warp B step 2
warp A step 3
warp B step 3
PS: I am aware that this relies on card/compiler behaviour not changing.
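For comparison, here is a minimal sketch of the same update written so that it does not depend on how warps are scheduled: the three steps above are replaced by a single `atomicAdd`. This is only an illustration under assumptions that are not in the code above: `sharedData` is taken to be a 32-element `float` array, the kernel name `addFromTwoWarps` and the host helper `runDemo` are made up for the example, and the card is assumed to support `atomicAdd` on `float` in shared memory (compute capability 2.x or higher).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: threads 0 and 32 (lane 0 of two different warps)
// both add somenumber to sharedData[0].
__global__ void addFromTwoWarps(float *out, float somenumber)
{
    __shared__ float sharedData[32]; // assumed size and type

    // Zero the shared array, then wait until every thread can see it.
    if (threadIdx.x < 32)
        sharedData[threadIdx.x] = 0.0f;
    __syncthreads();

    if (threadIdx.x == 0 || threadIdx.x == 32)
    {
        int targ = threadIdx.x % 32; // 0 for both threads
        // Read, add and write back as one indivisible operation,
        // so the other warp cannot read a stale value in between.
        atomicAdd(&sharedData[targ], somenumber);
    }
    __syncthreads();

    // Copy the result out so the host can inspect it.
    if (threadIdx.x == 0)
        *out = sharedData[0];
}

// Hypothetical host helper: launches one block of 64 threads (two warps)
// and returns the final value of sharedData[0].
float runDemo(float somenumber)
{
    float *d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    addFromTwoWarps<<<1, 64>>>(d_out, somenumber);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With this version, both additions should be reflected in `sharedData[0]` whichever warp runs first, so `runDemo(x)` would be expected to return twice `x`. The cost is that atomic operations on the same address are serialised.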