Yes, I know that __threadfence_block() doesn't synchronize threads the way __syncthreads() does… but… what is it really for? The example program for __threadfence() makes sense, but does anyone know a reasonable application for __threadfence_block()?
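For concreteness, here is the kind of case where a block-scoped fence might be used. One thread publishes a value to a thread in a different warp of the same block through shared memory, using a flag instead of a block-wide barrier. This is a minimal sketch. The kernel, the helper `run_publish_demo()`, the block size of 64 (two warps) and the value 42 are all hypothetical. The `volatile` qualifiers are there to stop the compiler from caching the shared variables in registers.

```cuda
#include <cuda_runtime.h>

// Hypothetical illustration: thread 0 (warp 0) publishes a value to
// thread 32 (warp 1) of the same block via shared memory and a flag.
__global__ void publish_kernel(int *out)
{
    // volatile: force every access to go to shared memory
    __shared__ volatile int payload;
    __shared__ volatile int ready;

    if (threadIdx.x == 0) {
        ready = 0;
    }
    __syncthreads();  // flag is initialized before anyone polls it

    if (threadIdx.x == 0) {           // producer
        payload = 42;
        __threadfence_block();        // order the payload write before the flag write
        ready = 1;
    } else if (threadIdx.x == 32) {   // consumer, in a different warp
        while (ready == 0) { }        // spin until the flag is set
        __threadfence_block();        // order the flag read before the payload read
        *out = payload;
    }
}

// Hypothetical host helper: launches one block of 64 threads and returns
// the value the consumer observed, or -1 on a CUDA error.
int run_publish_demo()
{
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    publish_kernel<<<1, 64>>>(d_out);
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_out = -1;
    }
    cudaFree(d_out);
    return h_out;
}
```

The fence only orders the writes (and, on the consumer side, the reads). It does not make the consumer wait. The waiting is done by the spin loop, which is why producer and consumer are placed in different warps here.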
Another question: does a __syncthreads() barrier make an additional __threadfence_block() unnecessary, or do I have to use both to ensure that changes to shared memory are visible to the other threads of the block before continuing?
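The pattern in question looks roughly like the sketch below. Each thread writes one element to shared memory and then reads an element written by another thread, with only __syncthreads() between the two phases. For reference, the CUDA C Programming Guide describes __syncthreads() as waiting until all threads in the block reach it *and* until all global and shared memory accesses made by those threads before it are visible to all threads in the block. The kernel, `BLOCK_SIZE` and `run_reverse_demo()` are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // hypothetical: one element per thread, one block

// Each thread writes tile[t], then reads the element written by thread
// BLOCK_SIZE - 1 - t. Only __syncthreads() separates the two phases.
__global__ void reverse_kernel(const int *in, int *out)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;
    tile[t] = in[t];
    __syncthreads();                    // barrier between write and read phases
    out[t] = tile[BLOCK_SIZE - 1 - t];  // value written by a different thread
}

// Hypothetical host helper: reverses 0..BLOCK_SIZE-1 on the device and
// returns true if every output element matches the expected reversal.
bool run_reverse_demo()
{
    int h_in[BLOCK_SIZE], h_out[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, sizeof(h_in)) == cudaSuccess &&
              cudaMalloc(&d_out, sizeof(h_out)) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverse_kernel<<<1, BLOCK_SIZE>>>(d_in, d_out);
        ok = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < BLOCK_SIZE; ++i) ok = (h_out[i] == BLOCK_SIZE - 1 - i);

    cudaFree(d_in);   // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return ok;
}
```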