Thread synchronization in reduce

I'm very new to CUDA and I'm just starting to learn the basics.
I want to ask about thread synchronization: how does it happen in the reduce algorithm between multiple blocks?

Thanks for pointing out that you are new. Thread synchronization happens only inside a block (that is what __syncthreads() does), not across multiple blocks. I believe this topic should be moved to a more relevant folder.
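To make the "inside a block" part concrete, here is a minimal sketch of a per-block sum. The kernel name blockSum and the constant THREADS are made up for illustration, and it assumes the kernel is launched with exactly THREADS threads per block (a power of two). Each block only produces its own partial sum; nothing here combines the blocks.

```cuda
#include <cuda_runtime.h>

// Assumption: threads per block, a power of two, matching the launch config.
constexpr int THREADS = 256;

// Hypothetical kernel: each block sums its own slice of `in` and writes one
// partial result to out[blockIdx.x]. Blocks never wait for each other.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float s[THREADS];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute 0 so every thread reaches the barriers.
    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // all loads of THIS block are done

    // Tree reduction in shared memory, halving the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();  // barrier for the threads of this block only
    }

    if (tid == 0) out[blockIdx.x] = s[0];
}
```

A launch like blockSum<<<numBlocks, THREADS>>>(d_in, d_out, n) leaves numBlocks partial sums in d_out, which still have to be combined somehow.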

The standard way to sync the blocks is to split the work into 2 kernels: a kernel launched in the same stream only starts after every block of the previous one has finished. There are ways around it, depending on the situation, by using atomic functions (a read-modify-write on a memory location is performed as one uninterruptible operation) and/or thread fence functions such as __threadfence() (the calling thread waits until its earlier writes are visible in global memory to the other threads). These alternatives result in non-coalesced write accesses, so they may or may not be faster than dividing the work into 2 or more kernels with coalesced accesses.
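As a rough sketch of the two options (not a tuned implementation), the code below sums an array once with repeated kernel launches and once with atomicAdd. All names (blockReduce, partialSums, atomicSum, sumWithKernels, sumWithAtomics, BLOCK) are invented for this example. It assumes d_in is a device pointer, a block size of 256, and a GPU that supports atomicAdd on float. Error checking is left out to keep it short, and the thread fence variant is not shown.

```cuda
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumption: threads per block, power of two

// Sum this block's slice in shared memory; the result is in the return value.
__device__ float blockReduce(const float* in, int n) {
    __shared__ float s[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    return s[0];
}

// Option 1: write one partial sum per block, combine in a later launch.
__global__ void partialSums(const float* in, float* out, int n) {
    float v = blockReduce(in, n);
    if (threadIdx.x == 0) out[blockIdx.x] = v;
}

// Option 2: each block adds its partial sum to one global accumulator.
__global__ void atomicSum(const float* in, float* total, int n) {
    float v = blockReduce(in, n);
    if (threadIdx.x == 0) atomicAdd(total, v);
}

// The boundary between launches acts as the barrier between blocks.
float sumWithKernels(const float* d_in, int n) {
    if (n <= 0) return 0.0f;
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *dst, *other;  // two scratch buffers, used alternately
    cudaMalloc(&dst, blocks * sizeof(float));
    cudaMalloc(&other, blocks * sizeof(float));
    const float* src = d_in;
    while (true) {
        blocks = (n + BLOCK - 1) / BLOCK;
        partialSums<<<blocks, BLOCK>>>(src, dst, n);
        if (blocks == 1) break;  // dst[0] now holds the full sum
        n = blocks;              // next pass reduces the partial sums
        src = dst;
        float* t = dst; dst = other; other = t;
    }
    float result = 0.0f;
    cudaMemcpy(&result, dst, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dst);
    cudaFree(other);
    return result;
}

// Single launch; the order of the float additions is not fixed.
float sumWithAtomics(const float* d_in, int n) {
    if (n <= 0) return 0.0f;
    float* d_total;
    cudaMalloc(&d_total, sizeof(float));
    cudaMemset(d_total, 0, sizeof(float));  // all-zero bytes == 0.0f
    atomicSum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_total, n);
    float result = 0.0f;
    cudaMemcpy(&result, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_total);
    return result;
}
```

Which one is faster depends on the hardware and the problem size, so it is worth timing both. Note that the atomic version can give slightly different float results from run to run because the additions happen in whatever order the blocks finish.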