I want to know what the cost of a __syncthreads() operation is.
Let's say a warp (all threads in the warp) executes __syncthreads(). Does the warp scheduler mark this warp as waiting and not schedule it again until all threads in the block have reached that point?
Or does __syncthreads() cause the warp to do some idle-looping (busy-waiting) until all threads have synced up?
Basically, I want to know how intelligent or dumb the warp scheduler is.
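
To make the question concrete, here is a rough sketch of how I would try to measure the cost myself. It times a loop of barriers with clock64() inside a single block. The kernel name timeSync, the iteration count kIters, and the launch configuration of 1 block with 256 threads are all my own assumptions, not anything taken from the documentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed iteration count; large enough to amortize the two clock64() reads.
constexpr int kIters = 1024;

// Hypothetical microbenchmark kernel: every thread executes kIters barriers,
// and thread 0 reports how many clock cycles the whole loop took.
__global__ void timeSync(long long *cycles)
{
    // Line all warps up once so the timed region does not start skewed.
    __syncthreads();

    long long start = clock64();
    #pragma unroll 1
    for (int i = 0; i < kIters; ++i) {
        __syncthreads();
    }
    long long stop = clock64();

    if (threadIdx.x == 0) {
        *cycles = stop - start;
    }
}

int main()
{
    long long *dCycles = nullptr;
    long long hCycles = 0;

    if (cudaMalloc(&dCycles, sizeof(long long)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block of 256 threads (8 warps); an assumed configuration.
    timeSync<<<1, 256>>>(dCycles);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(&hCycles, dCycles, sizeof(long long),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dCycles);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Includes loop overhead, so treat this as an upper bound per barrier.
    printf("approx. cycles per __syncthreads(): %.1f\n",
           (double)hCycles / kIters);
    return 0;
}
```

The number this prints includes the loop overhead and will differ between GPUs and block sizes, so it only gives a rough upper bound. It also does not tell me whether the waiting warps are descheduled or spinning, which is really what I am asking.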