To optimize the memory accesses of my kernels, I would like to understand how warps are scheduled on the SMs.
In the Programming Guide it says:
My question now is:
Suppose I invoke a two dimensional kernel execution.
What does “consecutive, increasing thread IDs” mean? Consecutive in which dimension?
I need to know this because I want to work out which thread IDs are scheduled to run concurrently, so that I can optimize the memory accesses.
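To make the question concrete, here is a small host-side sketch of the mapping I am assuming for a two-dimensional block: the x index varies fastest, so the linear thread ID is x + y * blockDim.x, and consecutive IDs are grouped into warps. The names `Dim2`, `linearThreadId`, `warpIndex` and `laneIndex` are hypothetical helpers written for this post, not CUDA API, and the warp size of 32 is an assumption (on the device it is available as `warpSize`).

```cpp
// Host-side illustration only; this is plain C++, not device code.
// Dim2 is a hypothetical stand-in for the x/y parts of CUDA's threadIdx/blockDim.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Assumed warp size; on the device, query the built-in warpSize instead.
constexpr unsigned kWarpSize = 32;

// Linear thread ID within a 2D block, with x as the fastest-varying dimension.
constexpr unsigned linearThreadId(Dim2 idx, Dim2 dim) {
    return idx.x + idx.y * dim.x;
}

// Index of the warp (within the block) that a thread would belong to,
// assuming warps are formed from consecutive linear thread IDs.
constexpr unsigned warpIndex(Dim2 idx, Dim2 dim) {
    return linearThreadId(idx, dim) / kWarpSize;
}

// Position of the thread inside its warp.
constexpr unsigned laneIndex(Dim2 idx, Dim2 dim) {
    return linearThreadId(idx, dim) % kWarpSize;
}
```

Under this assumption, a 16 x 16 block would put rows y = 0 and y = 1 together in warp 0 and rows y = 2 and y = 3 in warp 1, so threads that are neighbours in x would sit in adjacent lanes of the same warp.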