I want to know how threads are assigned to warps when a block dimension is odd or smaller than the warp size.
For example, take a 3x3 block (x by y). Does this translate to 1 warp, or to 3 warps with some threads disabled?
For an odd number of threads in the x-dimension of the block, does warp assignment wrap around from one row to the next?
For example, if the block size is (x * y) 17x2, will the second warp have 1 thread in it or 16 threads with one thread coming from the first “row” and the next 15 coming from the second “row”?
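
For reference, the rule I have seen described in the CUDA programming guide is that threads in a block are linearized with x varying fastest (for a 2D block, id = x + y * Dx) and that consecutive linear IDs are then grouped into warps. If that reading is right, the host-side C++ sketch below models it without needing a GPU. The helper names (`linear_id`, `warp_of`, `lane_of`, `warps_per_block`, `print_layout`) are my own and not part of any CUDA API, and the warp size of 32 is an assumption.

```cpp
#include <cstdio>

// Assumption: 32 threads per warp.
constexpr int kWarpSize = 32;

// Linear thread ID inside a 2D block, with x varying fastest.
constexpr int linear_id(int x, int y, int dim_x) {
    return x + y * dim_x;
}

// Warp index of thread (x, y), assuming warps are formed from
// consecutive linear IDs.
constexpr int warp_of(int x, int y, int dim_x) {
    return linear_id(x, y, dim_x) / kWarpSize;
}

// Lane (position inside its warp) of thread (x, y).
constexpr int lane_of(int x, int y, int dim_x) {
    return linear_id(x, y, dim_x) % kWarpSize;
}

// Number of warps needed for the block; the last one may be partly idle.
constexpr int warps_per_block(int dim_x, int dim_y) {
    return (dim_x * dim_y + kWarpSize - 1) / kWarpSize;
}

// Print the warp and lane of every thread in a dim_x by dim_y block.
void print_layout(int dim_x, int dim_y) {
    std::printf("block %dx%d -> %d warp(s)\n",
                dim_x, dim_y, warps_per_block(dim_x, dim_y));
    for (int y = 0; y < dim_y; ++y) {
        for (int x = 0; x < dim_x; ++x) {
            std::printf("  thread (%d,%d): warp %d, lane %d\n",
                        x, y, warp_of(x, y, dim_x), lane_of(x, y, dim_x));
        }
    }
}
```

Under this model, a 3x3 block is a single warp with 9 active lanes, and a 17x2 block has 34 threads in 2 warps: the first warp takes all 17 threads of row 0 plus the first 15 of row 1, and the second warp holds only threads (15,1) and (16,1). Calling `print_layout(17, 2)` from a `main` function lists the full mapping.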