__syncthreads and thread-scheduling intelligence

I want to know what the cost of a __syncthreads() operation is.

Let's say a warp (all threads in the warp) executes __syncthreads(). Does the warp scheduler mark this warp as waiting and stop scheduling it until all threads in the block reach that point?


Or does __syncthreads() cause the warp to spin in an idle loop until all threads in the block arrive?

Basically, I want to know how intelligent or dumb the warp scheduler is.

I recommend listening to and viewing the slides of Dr. Hwu's lecture on Threading Hardware (Fall, Lecture 7); see FAQ question 11. My interpretation is that once a warp has reached the sync point, it is no longer eligible for execution. Priority is given to the least recently executed warp that is still eligible, until all warps have reached the sync point; at that point all warps in the block are released for scheduling.