block synchronisation

The document gives the following reasons for not allowing synchronisation between blocks in CUDA.
Can anyone please elaborate on and explain both reasons?

* Expensive to build in hardware for GPUs with a high processor count.

* Would force the programmer to run fewer blocks (no more than #multiprocessors × #resident blocks per multiprocessor) to avoid deadlock, which may reduce overall efficiency.

Search the forums; this has been gone over many, many times.