CUDA Basics

Hi everyone, and thank you for your help.
My question is as follows: why divide threads into blocks? What is the idea behind it?

Dividing threads into blocks is how CUDA forces the programmer to express data locality. Threads in the same block can easily communicate through shared memory and synchronize execution with barriers. Threads in different blocks can only communicate via global memory.
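To make this concrete, here is a minimal sketch of a per-block sum. The names `blockSum`, `blockSums`, and `kBlockSize` are made up for this illustration, the block size of 256 is an arbitrary power-of-two choice, and error checking on the CUDA calls is left out for brevity. Inside one block the threads cooperate through `__shared__` memory and `__syncthreads()`; the only thing a block hands to the outside world is a single value written to global memory.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Threads per block. Assumed to be a power of two so the halving loop
// below covers every element of the tile.
constexpr int kBlockSize = 256;

// Each block sums its own kBlockSize-element slice of `in` and writes
// one partial sum to out[blockIdx.x].
__global__ void blockSum(const float* in, float* out, int n) {
    // Shared memory: one copy per block, visible only to that block's threads.
    __shared__ float tile[kBlockSize];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads one element (or 0 past the end of the input).
    tile[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();  // barrier: every thread of this block has stored its value

    // Tree reduction inside the block; each step halves the active threads.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        __syncthreads();  // all threads reach this barrier, active or not
    }

    // Results leave the block through global memory only.
    if (tid == 0) {
        out[blockIdx.x] = tile[0];
    }
}

// Hypothetical host helper: returns one partial sum per block.
// Error checking is omitted to keep the sketch short.
std::vector<float> blockSums(const std::vector<float>& hostIn) {
    int n = static_cast<int>(hostIn.size());
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<float> hostOut(blocks, 0.0f);
    if (n == 0) return hostOut;  // a launch with zero blocks is invalid

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, hostIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlockSize>>>(dIn, dOut, n);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(hostOut.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return hostOut;
}

int main() {
    std::vector<float> data(1000, 1.0f);
    std::vector<float> partial = blockSums(data);

    // The blocks never talked to each other, so the host combines their results.
    float total = 0.0f;
    for (float p : partial) total += p;
    printf("%zu blocks, total = %.1f\n", partial.size(), total);
    return 0;
}
```

This should build with something like `nvcc block_sum.cu -o block_sum` (the file name is just an example). Note that the final combination happens on the host: since the blocks cannot synchronize with each other inside this kernel, their partial sums have to be merged in a separate step.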

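This split between threads that can communicate directly and threads that cannot allows for more efficient hardware designs. All threads of a block run on the same multiprocessor, so shared memory and barriers stay local to that multiprocessor and remain cheap. Blocks, on the other hand, are independent of one another, so the hardware is free to schedule them on any multiprocessor in any order, and the same program scales across GPUs with different numbers of multiprocessors.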