So far I have no idea whether the threads in a warp are laid out in row-major or column-major order.
Threads and blocks are numbered in column-major order, i.e. the x index varies fastest, then y, then z. The order in which they run is undefined.
But if so, how can I map threads to memory so that accesses coalesce?
OK, perhaps I should have given a slightly fuller answer in the second sentence: the order in which warps of threads run is undefined. Coalescing is a warp-scale phenomenon. Any threads within a block whose thread number gives the same quotient when divided by 32 (the warp size) are in the same warp, and the coalescing rules apply to threads in the same warp.
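To make the numbering concrete, here is a rough host-side sketch in plain C++ of the arithmetic involved. It is only an illustration: `Dim3`, `linear_thread_id`, `warp_id`, `lane_id` and `element_index` are names made up for this post (`Dim3` stands in for CUDA's `dim3`/`threadIdx`/`blockDim`), and the row-major array layout is an assumption about your data.

```cpp
// Hypothetical stand-in for CUDA's dim3, so the arithmetic can run on the host.
struct Dim3 { unsigned x, y, z; };

constexpr unsigned kWarpSize = 32;  // warp size quoted above

// Thread number within a block: x varies fastest, then y, then z.
constexpr unsigned linear_thread_id(Dim3 idx, Dim3 block) {
    return idx.x + idx.y * block.x + idx.z * block.x * block.y;
}

// Threads with the same quotient by the warp size share a warp.
constexpr unsigned warp_id(Dim3 idx, Dim3 block) {
    return linear_thread_id(idx, block) / kWarpSize;
}

// Position of the thread inside its warp.
constexpr unsigned lane_id(Dim3 idx, Dim3 block) {
    return linear_thread_id(idx, block) % kWarpSize;
}

// Assumed row-major array: letting the x index pick the column means that
// threads adjacent in x touch adjacent elements.
constexpr unsigned element_index(unsigned row, unsigned col, unsigned width) {
    return row * width + col;
}
```

With a 16x16 block, threads (0..15, 0) and (0..15, 1) get thread numbers 0 to 31 and so form warp 0. If each thread reads `element_index(y, x, width)`, each run of 16 threads with the same y reads 16 consecutive elements, which is the kind of access pattern the coalescing rules reward. Indexing the other way round (x picking the row) would put neighbouring threads `width` elements apart.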
I got it.