multidimensional blocks and warps

The documentation says that the programmer defines blocks and that blocks are divided into warps. What I can't find in the documentation is the rule that decides which 32 threads of a block constitute a warp.

For example, if I have a 2D block of 32x32 threads, does each warp contain the threads of one block row?
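To make the question concrete, here is a minimal sketch of how I think this could be checked empirically. It assumes a warp size of 32, a toolkit recent enough to provide `__shfl_sync` (CUDA 9 or later), and that I linearize the thread index with x varying fastest; the kernel name `recordWarpLeader` is just something I made up for illustration. Each thread asks lane 0 of its own warp for that lane's linear ID, so threads that report the same leader must be in the same warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: every thread stores the linear ID of lane 0
// of the warp it is actually running in.
__global__ void recordWarpLeader(int *leaderOf)
{
    // Linear thread ID inside the block, x varying fastest (my assumption).
    int linear = threadIdx.x + threadIdx.y * blockDim.x;

    // Broadcast lane 0's linear ID to all lanes of this warp.
    int leader = __shfl_sync(0xffffffffu, linear, 0);

    leaderOf[linear] = leader;
}

int main()
{
    const int dimX = 32, dimY = 32;
    const int n = dimX * dimY;

    int *dLeader = nullptr;
    if (cudaMalloc(&dLeader, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One 2D block of 32x32 threads, as in the question.
    dim3 block(dimX, dimY);
    recordWarpLeader<<<1, block>>>(dLeader);

    int hLeader[n];
    cudaError_t err = cudaMemcpy(hLeader, dLeader, sizeof(hLeader),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dLeader);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // For each row, print the warp leader seen by its first and last thread.
    // If a warp is exactly one block row, both values should be the linear ID
    // of that row's thread with x == 0.
    for (int y = 0; y < dimY; ++y) {
        printf("row %2d: leader(x=0)=%4d  leader(x=31)=%4d\n",
               y, hLeader[y * dimX], hLeader[y * dimX + dimX - 1]);
    }
    return 0;
}
```

If both columns match within every row and differ between rows, that would suggest each warp is one row of the block, but I would still like to know whether this is guaranteed or just what happens on my hardware.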

http://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-hierarchy