The document gives the following reasons for not allowing synchronisation between blocks (a grid-wide barrier) in CUDA.
Can anyone please elaborate on and explain both reasons?
* Expensive to build in hardware for GPUs with a high processor count.
* Would force the programmer to run fewer blocks (no more than #multiprocessors × #resident blocks per multiprocessor) to avoid deadlock, which may reduce overall efficiency.
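The deadlock in the second reason can be illustrated with a small toy model. This is a sketch under stated assumptions, not a description of how the hardware actually schedules blocks. It assumes that running blocks are never preempted and that each block spins at the grid-wide barrier while holding its slot. The names `maxCoResidentBlocks` and `gridBarrierCompletes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Maximum number of blocks that can be resident at the same time:
// #multiprocessors * #resident blocks per multiprocessor.
int maxCoResidentBlocks(int numSMs, int residentPerSM) {
    assert(numSMs > 0 && residentPerSM > 0);
    return numSMs * residentPerSM;
}

// Toy model (assumption): blocks are dispatched into free slots, each
// dispatched block runs until it reaches a grid-wide barrier and then spins
// there, keeping its slot, because running blocks are never preempted.
// The barrier opens only when all gridSize blocks have arrived.
bool gridBarrierCompletes(int numSMs, int residentPerSM, int gridSize) {
    assert(gridSize > 0);
    // No slot is freed before the barrier opens, so at most
    // maxCoResidentBlocks(...) blocks can ever be dispatched.
    int dispatched = std::min(gridSize, maxCoResidentBlocks(numSMs, residentPerSM));
    // Any block still waiting for a slot never reaches the barrier, and the
    // spinning blocks wait for it forever: deadlock.
    return dispatched == gridSize;
}
```

For example, with 4 multiprocessors and 2 resident blocks per multiprocessor (illustrative numbers), `gridBarrierCompletes(4, 2, 8)` returns `true`, while `gridBarrierCompletes(4, 2, 9)` returns `false`: the ninth block can never be scheduled because every slot is held by a block that is waiting for it. On a real device the number of resident blocks per multiprocessor depends on the kernel's register and shared-memory usage; the CUDA runtime function `cudaOccupancyMaxActiveBlocksPerMultiprocessor` can report it.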