Can a developer expect warps to consist of consecutive threads, grouped so that each warp's base thread index is a multiple of 32?
warpID = threadID div 32. laneID = threadID mod 32
threadID = threadIdx.x for 1D thread blocks
threadID = threadIdx.y * blockDim.x + threadIdx.x for 2D thread blocks
threadID = threadIdx.z * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x for 3D thread blocks.
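The formulas above can be sketched as a few small helper functions. This is only an illustration, not an official API: `Idx3`, `linear_thread_id`, `warp_id`, `lane_id`, and `kWarpSize` are hypothetical names introduced here, and the warp size of 32 is hard-coded as an assumption (inside a real kernel the built-in `warpSize` variable would be the safer choice). The `HD` macro lets the same code compile as plain C++ or, under nvcc, as `__host__ __device__` functions.

```cpp
// Under nvcc, mark the helpers as callable from both host and device code;
// with an ordinary C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for CUDA's uint3/dim3, so the sketch builds without nvcc.
struct Idx3 {
    unsigned x, y, z;
};

// Assumption: 32 threads per warp. In a kernel, prefer the built-in warpSize.
constexpr unsigned kWarpSize = 32;

// Linear thread ID within a block. For 1D blocks pass y = z = 0 (dim.y = dim.z = 1),
// for 2D blocks pass z = 0; the expression then reduces to the 1D and 2D formulas.
HD inline unsigned linear_thread_id(Idx3 thread_idx, Idx3 block_dim) {
    return thread_idx.z * block_dim.x * block_dim.y
         + thread_idx.y * block_dim.x
         + thread_idx.x;
}

// Index of the warp that contains the given linear thread ID.
HD inline unsigned warp_id(unsigned thread_id) {
    return thread_id / kWarpSize;
}

// Position of the thread within its warp (0..31).
HD inline unsigned lane_id(unsigned thread_id) {
    return thread_id % kWarpSize;
}
```

In a kernel, the arguments would come from `threadIdx` and `blockDim`, for example `linear_thread_id({threadIdx.x, threadIdx.y, threadIdx.z}, {blockDim.x, blockDim.y, blockDim.z})`. For a 16x16 block, the thread at x = 0, y = 2 gets threadID 32, so it is lane 0 of warp 1: each warp spans two consecutive rows of the block.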