In-place reduction and block ordering: why does it work?

I’m confused by the reduction example.

It appears to switch to in-place reduction after the first kernel launch. I don’t understand why this works since the order of block execution is not guaranteed.

For example, consider the trivial case of two blocks of two threads. If block 1 executes first, it will read locations 2 and 3 and then write the sum to location 1. Then block 0 will execute and read locations 0 and 1, writing its sum to location 0. However, the value block 0 reads from location 1 will be the sum written by block 1 rather than the original input, and hence the calculation will be incorrect.
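To make the scenario concrete, here is a minimal sketch of it (not the SDK example's code). The kernel name `inplace_block_sum` and the helper `location0_after` are made up for illustration. The block index is passed as an argument, with one launch per "block", so the host can force either execution order; launches on the same stream run one after another.

```cuda
#include <cuda_runtime.h>

// One block of the in-place scheme described above: two threads read
// data[2*b] and data[2*b + 1], then thread 0 writes their sum to data[b].
// The block index b is an argument (instead of blockIdx.x) so that the
// host can choose the execution order explicitly.
__global__ void inplace_block_sum(float* data, int b)
{
    __shared__ float s[2];
    s[threadIdx.x] = data[2 * b + threadIdx.x];
    __syncthreads();
    if (threadIdx.x == 0) {
        data[b] = s[0] + s[1];
    }
}

// Hypothetical helper: runs the two "blocks" in the given order on the
// input {1, 2, 3, 4} and returns what ends up in location 0.
float location0_after(int first, int second)
{
    float h[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float* d = nullptr;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    // Two launches on the default stream execute in issue order.
    inplace_block_sum<<<1, 2>>>(d, first);
    inplace_block_sum<<<1, 2>>>(d, second);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[0];
}
```

With block 0 first, `location0_after(0, 1)` yields 3 (1 + 2). With block 1 first, `location0_after(1, 0)` yields 8, because location 1 already holds block 1's sum of 7 when block 0 reads it.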

What am I missing?

Shameless bump… :)

It looks like you are right. It may be that this never happens in practice because blocks with the lowest blockIdx.x are scheduled first, but that behaviour is undefined, and therefore it does indeed look like a bug in the example. At least in my quick scan of the code I could not see where this is prevented.

Thanks Denis. I was hoping I had overlooked some trick - it would have saved some memory for me. I’m using a type of “double buffering” to get around the (theoretical) scheduling issue.
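For reference, the double-buffering workaround could look roughly like the sketch below. It is an assumed implementation, not the poster's actual code: `block_sum`, `reduce_pingpong` and `kThreads` are hypothetical names, the block size of 256 is arbitrary, and `n >= 1` is assumed. Every pass reads from one buffer and writes to the other, so no block can ever read another block's output within the same launch.

```cuda
#include <cuda_runtime.h>
#include <utility>

constexpr int kThreads = 256;  // assumed block size (must be a power of two)

// Each block sums up to 2 * kThreads elements of `in` and writes one
// partial sum to out[blockIdx.x]. `in` and `out` never alias.
__global__ void block_sum(const float* in, float* out, int n)
{
    __shared__ float s[kThreads];
    int i = blockIdx.x * (2 * kThreads) + threadIdx.x;
    float v = 0.0f;
    if (i < n) v += in[i];
    if (i + kThreads < n) v += in[i + kThreads];
    s[threadIdx.x] = v;
    __syncthreads();
    // Tree reduction in shared memory.
    for (int stride = kThreads / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = s[0];
}

// Hypothetical host helper: sums n >= 1 floats, ping-ponging between two
// device buffers. The second buffer is the extra memory this scheme costs.
float reduce_pingpong(const float* host, int n)
{
    int max_blocks = (n + 2 * kThreads - 1) / (2 * kThreads);
    float* src = nullptr;
    float* dst = nullptr;
    cudaMalloc(&src, n * sizeof(float));
    cudaMalloc(&dst, max_blocks * sizeof(float));
    cudaMemcpy(src, host, n * sizeof(float), cudaMemcpyHostToDevice);
    while (n > 1) {
        int blocks = (n + 2 * kThreads - 1) / (2 * kThreads);
        block_sum<<<blocks, kThreads>>>(src, dst, n);
        std::swap(src, dst);  // the output of this pass is the next input
        n = blocks;
    }
    float result = 0.0f;
    cudaMemcpy(&result, src, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(src);
    cudaFree(dst);
    return result;
}
```

The second buffer only needs room for one value per block of the first pass, so the overhead is about 1/(2 * kThreads) of the input size in this sketch.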

You can also let thread zero write to [blockIdx.x * blockDim.x * 2] (if you add 2 values per thread at the beginning of the kernel), so that each block only overwrites a location it has already read itself. The values that you need to add afterwards are then not adjacent, but if you only need to add a few of them it might not be an issue: the kernel-call overhead will dominate the non-coalesced read afterwards.
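A rough sketch of that suggestion follows, under the same assumptions as any illustration here: the names `block_sum_in_place`, `gather_strided`, `reduce_in_place_strided`, `kThreads` and `kSpan` are invented, the block size is arbitrary, and `n >= 1` is assumed. Each block reads only its own span of 2 * blockDim.x elements and thread zero writes the partial sum to the first element of that span, so the result does not depend on block order. A second, single-thread kernel then adds the strided partial sums, which is only reasonable when there are few of them.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 256;          // assumed block size (power of two)
constexpr int kSpan = 2 * kThreads;    // elements owned by each block

// In-place pass: block b reads data[b*kSpan .. b*kSpan + kSpan - 1] and
// thread 0 writes the partial sum to data[b*kSpan], inside its own span.
__global__ void block_sum_in_place(float* data, int n)
{
    __shared__ float s[kThreads];
    int base = blockIdx.x * kSpan;
    int i = base + threadIdx.x;
    float v = 0.0f;
    if (i < n) v += data[i];
    if (i + kThreads < n) v += data[i + kThreads];
    s[threadIdx.x] = v;
    __syncthreads();  // all reads of this span are done past this point
    for (int stride = kThreads / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) data[base] = s[0];
}

// Adds the strided (non-coalesced) partial sums with a single thread and
// stores the total in data[0]. Intended for a small number of blocks.
__global__ void gather_strided(float* data, int blocks)
{
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += data[b * kSpan];
    data[0] = total;
}

// Hypothetical host helper: sums n >= 1 floats using one device buffer.
float reduce_in_place_strided(const float* host, int n)
{
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, host, n * sizeof(float), cudaMemcpyHostToDevice);
    int blocks = (n + kSpan - 1) / kSpan;
    block_sum_in_place<<<blocks, kThreads>>>(d, n);
    gather_strided<<<1, 1>>>(d, blocks);
    float result = 0.0f;
    cudaMemcpy(&result, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return result;
}
```

The trade-off is the one described above: no second buffer, but the follow-up read touches one float every `kSpan` elements.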

Thanks Dennis. I have used the “thread zero” trick a few times. In this particular case, though, I was looking for a way to eliminate a second buffer when collapsing a large array of complex numbers down to an array of (real) magnitudes, as well as in cases where I am summing up large arrays. In the case of the magnitudes, I wanted to collapse them into a linear array to allow for coalescing of memory access in later steps. I could just leave a one-float gap between each number and waste the bandwidth, or I could move the result into a secondary array. For now I’ve chosen the latter.
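The secondary-array version of the magnitude step might look like the following sketch. The names `magnitudes` and `magnitudes_host` are hypothetical, the complex values are assumed to be stored as `float2` (real, imaginary), and `n >= 1` is assumed. Writing magnitude i densely into float slot i of the same buffer would land inside complex element i/2, which another thread or block may not have read yet, so it hits the same ordering problem as the reduction.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Reads n complex values from `in` and writes their magnitudes, densely
// packed, to the separate buffer `out`. No element of `in` is overwritten.
__global__ void magnitudes(const float2* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float2 z = in[i];
        out[i] = sqrtf(z.x * z.x + z.y * z.y);
    }
}

// Hypothetical host helper: computes the magnitudes of n >= 1 complex
// values from h_in into h_out using two device buffers.
void magnitudes_host(const float2* h_in, float* h_out, int n)
{
    float2* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float2));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float2), cudaMemcpyHostToDevice);
    const int threads = 256;  // arbitrary block size for this sketch
    int blocks = (n + threads - 1) / threads;
    magnitudes<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

The output array is half the size of the input in bytes, and later kernels can read it with fully coalesced accesses.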