Yes, you can use it in CUDA; see the Compiler Explorer link.
It uses the barrier PTX instruction (see the PTX ISA 8.1 documentation, parallel-thread-execution).
It's similar to __syncthreads().
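To make the comparison concrete, here is a minimal sketch rather than a definitive implementation. It runs the same shared-memory kernel twice, once with `__syncthreads()` and once with a barrier issued directly through inline PTX. The helper `block_barrier_ptx`, the `reverse` kernel, and `run_reverse` are hypothetical names made up for illustration. The sketch uses `bar.sync 0`, which the PTX documentation describes as equivalent to `barrier.sync.aligned 0`, and it assumes barrier 0 is the one nvcc typically emits for `__syncthreads()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 64;  // one block of 64 threads (illustrative size)

// Hypothetical helper: block-wide barrier issued directly as PTX.
__device__ __forceinline__ void block_barrier_ptx() {
    // The "memory" clobber keeps the compiler from moving memory
    // accesses across the asm statement.
    asm volatile("bar.sync 0;" ::: "memory");
}

// Reverses N ints in place through shared memory. use_ptx is uniform
// across the block, so every thread reaches the same barrier.
__global__ void reverse(int *data, bool use_ptx) {
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = data[t];
    if (use_ptx) {
        block_barrier_ptx();
    } else {
        __syncthreads();
    }
    data[t] = tile[N - 1 - t];  // reads a value written by another thread
}

// Returns the number of wrong elements, or -1 on a CUDA error.
int run_reverse(bool use_ptx) {
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return -1;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverse<<<1, N>>>(d, use_ptx);
    // This copy waits for the kernel and reports any launch/runtime error.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < N; ++i) bad += (h[i] != N - 1 - i);
    return bad;
}

int main() {
    printf("__syncthreads(): %d mismatches\n", run_reverse(false));
    printf("bar.sync 0:      %d mismatches\n", run_reverse(true));
    return 0;
}
```

Both variants should behave the same here, since every thread in the block reaches the barrier. As with `__syncthreads()`, putting the barrier in a branch that only some threads of the block take is unsafe.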