__threadfence_block vs. __syncthreads

Hello,

Yes, I know that __threadfence_block() does not synchronize threads the way __syncthreads() does... but what is it really for? The example program for __threadfence() makes sense, but does anyone know a reasonable application for __threadfence_block()?

Another question: does a __syncthreads() barrier supersede any additional use of __threadfence_block(), or do I have to use both to ensure that changes to shared memory reach the other threads of the block before continuing?

__syncthreads() is enough. It guarantees that all threads in the block have reached the barrier and that all global and shared memory writes made before the barrier are visible to all threads in the block.
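To illustrate, here is a minimal sketch of a block exchanging data through shared memory with only __syncthreads() and no extra fence. The kernel name reverse_in_block, the host helper run_reverse and the block size N are made up for this example.

```cuda
#include <cuda_runtime.h>

constexpr int N = 64;  // hypothetical block size, chosen for the example

// Each thread stores one element to shared memory, then reads the element
// written by a different thread of the same block.
__global__ void reverse_in_block(int *data)
{
    __shared__ int tile[N];
    int t = threadIdx.x;

    tile[t] = data[t];
    __syncthreads();            // barrier; writes to tile[] are now visible block-wide
    data[t] = tile[N - 1 - t];  // safe to read another thread's element
}

// Hypothetical host helper: returns true if the array came back reversed.
bool run_reverse()
{
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverse_in_block<<<1, N>>>(d);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (h[i] != N - 1 - i) return false;
    return true;
}
```

Calling run_reverse() from main() should be enough to try it. Adding __threadfence_block() next to the barrier would not change the result here.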

__threadfence() and __threadfence_block() are rarely used (compared to __syncthreads()), but there is at least one example in the SDK.
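As for a possible application of __threadfence_block(): one case where a fence without a barrier seems useful is handing a value from one warp to another inside a block through a flag. This is only a sketch under my own assumptions and not the SDK example; the names handoff, run_handoff, payload and ready are invented for the illustration. The fence orders the data write before the flag write, and the volatile flag keeps the compiler from caching the polled value.

```cuda
#include <cuda_runtime.h>

// Thread 0 (warp 0) produces a value, thread 32 (warp 1) consumes it.
// The two threads are deliberately in different warps, so the spinning
// consumer cannot stall the producer.
__global__ void handoff(int *out)
{
    __shared__ int payload;
    __shared__ volatile int ready;

    if (threadIdx.x == 0) ready = 0;
    __syncthreads();                 // flag is initialized for everyone

    if (threadIdx.x == 0) {
        payload = 42;
        __threadfence_block();       // payload write is ordered before the flag write
        ready = 1;
    } else if (threadIdx.x == 32) {
        while (ready == 0) { }       // poll the flag; no barrier involved
        __threadfence_block();       // flag read is ordered before the payload read
        *out = payload;
    }
}

// Hypothetical host helper: returns the value the consumer saw, or -1 on error.
int run_handoff()
{
    int *d = nullptr;
    int h = -1;
    if (cudaMalloc(&d, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice);

    handoff<<<1, 64>>>(d);           // two warps in one block

    cudaError_t err = cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? h : -1;
}
```

Note that the fence only orders memory accesses. It does not make any thread wait, which is why the consumer has to poll the flag itself. If every thread of the block has to wait anyway, __syncthreads() is the simpler choice.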