The way a block is split into warps is always the same; each warp contains threads with
consecutive, increasing thread IDs, and the first warp contains thread 0.
Section 2.2.1 describes how thread IDs relate to thread indices in the block.
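As an illustration of this partitioning, the sketch below has every thread of one block compute its linear thread ID and the warp that ID falls into. It assumes the usual ID layout (x varies fastest, then y, then z) and relies on the built-in warpSize variable; the names recordWarpIndex and warpIndices are hypothetical helpers introduced here, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread records the index of the warp it belongs to within its block.
__global__ void recordWarpIndex(int *warpOfThread)
{
    // Linear thread ID within the block: x varies fastest, then y, then z.
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;

    // Consecutive IDs are grouped warpSize at a time, starting with thread 0.
    warpOfThread[tid] = tid / warpSize;
}

// Hypothetical host helper: launches a single block of dimensions (dx, dy, dz)
// and returns the warp index recorded for each thread ID.
std::vector<int> warpIndices(int dx, int dy, int dz)
{
    int n = dx * dy * dz;
    std::vector<int> result(n);
    int *dWarp = nullptr;

    cudaMalloc(&dWarp, n * sizeof(int));
    recordWarpIndex<<<1, dim3(dx, dy, dz)>>>(dWarp);
    cudaMemcpy(result.data(), dWarp, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dWarp);
    return result;
}
```

For a 16 x 4 block on a device with a warp size of 32, thread IDs 0 to 31 would land in warp 0 and IDs 32 to 63 in warp 1, so the thread at index (0, 2) is the first thread of the second warp.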
The issue order of the warps within a block is undefined, but their execution can be
synchronized, as mentioned in Section 2.2.1, to coordinate global or shared memory
accesses.
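A minimal sketch of such synchronization, assuming a single block of 128 threads (four warps): each thread stores one element in shared memory, then reads an element stored by a different thread, possibly one in another warp. The __syncthreads() barrier is what makes the read safe regardless of warp issue order. BLOCK_SIZE, reverseInBlock, and reverseOnDevice are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 128  // assumed block size: four warps of 32 threads

// Reverses BLOCK_SIZE elements in place using shared memory as staging.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;

    tile[t] = data[t];

    // Wait until every warp of the block has written its part of the tile.
    // Without this barrier, a thread could read a slot that the owning warp
    // has not filled yet.
    __syncthreads();

    data[t] = tile[BLOCK_SIZE - 1 - t];
}

// Hypothetical host helper; expects exactly BLOCK_SIZE input elements.
std::vector<int> reverseOnDevice(const std::vector<int> &input)
{
    std::vector<int> output(BLOCK_SIZE);
    int *dData = nullptr;

    cudaMalloc(&dData, BLOCK_SIZE * sizeof(int));
    cudaMemcpy(dData, input.data(), BLOCK_SIZE * sizeof(int),
               cudaMemcpyHostToDevice);
    reverseInBlock<<<1, BLOCK_SIZE>>>(dData);
    cudaMemcpy(output.data(), dData, BLOCK_SIZE * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    return output;
}
```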
The issue order of the blocks within a grid of thread blocks is undefined and there is
no synchronization mechanism between blocks, so threads from two different
blocks of the same grid cannot safely communicate with each other through global
memory during the execution of the grid.
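One common way to work within this restriction is to let each block write only to its own slot in global memory and to combine the slots in a second kernel launch, since kernels issued to the same stream run one after another. The sketch below sums an array this way. It is one possible pattern under stated assumptions (input length a multiple of THREADS, default stream, no error checking), and blockPartialSums, combinePartials, and sumOnDevice are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define THREADS 64  // assumed threads per block (a power of two)

// Pass 1: each block reduces its own slice and writes to its own slot,
// so no block ever reads data produced by another block.
__global__ void blockPartialSums(const int *in, int *partial)
{
    __shared__ int s[THREADS];
    int t = threadIdx.x;

    s[t] = in[blockIdx.x * THREADS + t];
    __syncthreads();

    // Tree reduction within the block, synchronized at every step.
    for (int stride = THREADS / 2; stride > 0; stride /= 2) {
        if (t < stride)
            s[t] += s[t + stride];
        __syncthreads();
    }
    if (t == 0)
        partial[blockIdx.x] = s[0];
}

// Pass 2: launched after pass 1 has completed, so all partial sums are final.
__global__ void combinePartials(const int *partial, int numBlocks, int *total)
{
    int sum = 0;
    for (int i = 0; i < numBlocks; ++i)
        sum += partial[i];
    *total = sum;
}

// Hypothetical host helper; in.size() must be a non-zero multiple of THREADS.
int sumOnDevice(const std::vector<int> &in)
{
    int n = static_cast<int>(in.size());
    int numBlocks = n / THREADS;
    int *dIn = nullptr, *dPartial = nullptr, *dTotal = nullptr;
    int total = 0;

    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dPartial, numBlocks * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    blockPartialSums<<<numBlocks, THREADS>>>(dIn, dPartial);
    combinePartials<<<1, 1>>>(dPartial, numBlocks, dTotal);

    cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dPartial);
    cudaFree(dTotal);
    return total;
}
```

The boundary between the two launches serves as the only grid-wide synchronization point in this sketch; within a single launch, the blocks never depend on one another.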
If a non-atomic instruction executed by a warp writes to the same location in global
or shared memory for more than one of the threads of the warp, the number of
serialized writes that occur to that location and the order in which they occur are
undefined, but one of the writes is guaranteed to succeed. If an atomic instruction
(see Section 4.4.6) executed by a warp reads, modifies, and writes to the same
location in global memory for more than one of the threads of the warp, each
read-modify-write to that location occurs and they are all serialized, but the order
in which they occur is undefined.
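The difference between the two cases can be sketched with a single warp of 32 threads that all target the same two global memory locations. The plain store leaves the value written by some one thread, with no guarantee as to which, while atomicAdd() applies every increment. The names conflictingWrites, ConflictResult, and runConflictingWrites are hypothetical, the launch configuration is an assumption, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// All threads of the warp write to the same two global memory locations.
__global__ void conflictingWrites(int *plainSlot, int *atomicCounter)
{
    // Non-atomic write: only one thread's value is guaranteed to land,
    // and which one is undefined.
    *plainSlot = threadIdx.x;

    // Atomic read-modify-write: every thread's increment takes effect,
    // serialized in an undefined order.
    atomicAdd(atomicCounter, 1);
}

struct ConflictResult {
    int plainValue;   // value left behind by the non-atomic writes
    int atomicCount;  // final value of the atomically updated counter
};

// Hypothetical host helper: runs one block of 32 threads (a single warp).
ConflictResult runConflictingWrites()
{
    int init[2] = { -1, 0 };
    int out[2] = { 0, 0 };
    int *dSlots = nullptr;

    cudaMalloc(&dSlots, 2 * sizeof(int));
    cudaMemcpy(dSlots, init, sizeof(init), cudaMemcpyHostToDevice);
    conflictingWrites<<<1, 32>>>(dSlots, dSlots + 1);
    cudaMemcpy(out, dSlots, sizeof(out), cudaMemcpyDeviceToHost);
    cudaFree(dSlots);

    ConflictResult r = { out[0], out[1] };
    return r;
}
```

After the call, atomicCount would be 32, whereas plainValue can only be said to lie somewhere between 0 and 31.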