The documentation indicates that the programmer defines blocks, and that blocks are divided into warps. What I can’t find in the documentation is the logic used to decide which 32 threads of a block constitute a warp.
For example, if I have a 2D block of 32x32 threads, does each warp contain the threads of one row of the block?