In the GTC document, __syncthreads() is called after copying from global memory (GMEM) to shared memory (SMEM) with __pipeline_memcpy_async.
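For reference, the pattern described there can be sketched roughly as follows. This is only a minimal sketch and not the GTC code itself: the kernel name `reverse_tile`, the tile size `kBlock`, and the host helper `run_reverse_check` are hypothetical names made up for illustration. Each thread copies one element and then reads an element copied by a different thread, which is the case where a block-wide barrier matters, since `__pipeline_wait_prior` only waits for the copies issued by the calling thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_pipeline.h>

constexpr int kBlock = 128;  // hypothetical tile size, one element per thread

// Each thread asynchronously copies one int from GMEM to SMEM,
// then reads an element that a different thread copied.
__global__ void reverse_tile(const int* in, int* out) {
    __shared__ int tile[kBlock];
    const int t = threadIdx.x;
    const int base = blockIdx.x * kBlock;

    __pipeline_memcpy_async(&tile[t], &in[base + t], sizeof(int));
    __pipeline_commit();
    __pipeline_wait_prior(0);  // waits only for this thread's own copies
    __syncthreads();           // barrier so every thread's copy is visible

    out[base + t] = tile[kBlock - 1 - t];
}

// Hypothetical host helper: runs the kernel and returns the mismatch count.
int run_reverse_check() {
    constexpr int kN = 4 * kBlock;
    int* in = nullptr;
    int* out = nullptr;
    cudaMallocManaged(&in, kN * sizeof(int));
    cudaMallocManaged(&out, kN * sizeof(int));
    for (int i = 0; i < kN; ++i) in[i] = i;

    reverse_tile<<<kN / kBlock, kBlock>>>(in, out);
    cudaDeviceSynchronize();

    int errors = 0;
    for (int i = 0; i < kN; ++i) {
        const int b = i / kBlock;
        const int t = i % kBlock;
        if (out[i] != b * kBlock + (kBlock - 1 - t)) ++errors;
    }
    cudaFree(in);
    cudaFree(out);
    return errors;
}

int main() {
    const int errors = run_reverse_check();
    printf("errors: %d\n", errors);
    return errors != 0;
}
```

If every thread read back only the element it copied itself, the `__pipeline_wait_prior(0)` call alone would cover that thread's data; the barrier is there because of the cross-thread read.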
In the sample code below, however, __syncthreads() is not called after __pipeline_memcpy_async.
Does that mean the threads in a block are not synchronized in the code below?