Pitched linear memory is just an ordinary linear allocation whose size is calculated from the 2D dimensions you provide, with padding added to the end of each row as required so that every row starts at a correctly aligned address for coalesced row-major access.
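To make "a linear allocation with padded rows" concrete, here is a minimal sketch using the CUDA runtime call `cudaMallocPitch`. It asks the runtime for a pitched allocation, records the layout it chose, and shows the byte-offset formula for element (row, col). The names `PitchedLayout`, `query_pitched_layout` and `element_offset` are hypothetical helpers for illustration; the exact pitch value depends on the device and driver, so the code only relies on `pitch >= width * elem_size`.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical summary of the layout cudaMallocPitch picked for a width x height array.
struct PitchedLayout {
    size_t row_bytes;    // bytes actually used per row: width * elem_size
    size_t pitch;        // bytes per row including padding (chosen by the runtime)
    size_t total_bytes;  // size of the single linear allocation: pitch * height
};

// Requests a pitched allocation, records its layout, then frees it again.
cudaError_t query_pitched_layout(size_t width, size_t height, size_t elem_size,
                                 PitchedLayout* out) {
    void* d_ptr = nullptr;
    size_t pitch = 0;
    // cudaMallocPitch takes the row width in BYTES and the height in rows.
    cudaError_t err = cudaMallocPitch(&d_ptr, &pitch, width * elem_size, height);
    if (err != cudaSuccess) return err;
    out->row_bytes = width * elem_size;
    out->pitch = pitch;
    out->total_bytes = pitch * height;
    return cudaFree(d_ptr);
}

// Byte offset of element (row, col) from the start of the allocation.
// Rows are pitch bytes apart; elements within a row are elem_size bytes apart.
__host__ __device__ inline size_t element_offset(size_t row, size_t col,
                                                 size_t pitch, size_t elem_size) {
    return row * pitch + col * elem_size;
}
```

For a row of 100 floats (400 bytes) the runtime may round the pitch up to a larger aligned value; the `pitch - row_bytes` padding bytes at the end of each row are simply never read or written.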
Yes. You still want coalesced reads for optimal performance. It's just that the memory hardware on the G200 GPUs makes the penalty for uncoalesced access much smaller: it automatically works out the minimum set of memory transactions needed to satisfy the half-warp's request. One of the key requirements for coalescing is alignment.
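The role of alignment can be illustrated with a simplified host-side model of the G200 (compute capability 1.2/1.3) rule for 4-byte words: the half-warp's 16 addresses are grouped into 128-byte-aligned segments, and each distinct segment costs one memory transaction. The real hardware may additionally shrink a transaction to 64 or 32 bytes; this sketch ignores that and only counts transactions. `transactions_per_half_warp` is a hypothetical name.

```cuda
#include <cstddef>
#include <set>

// Simplified model: 16 threads of a half-warp each read one 4-byte word at
// base_addr + tid * stride_bytes. Each distinct 128-byte segment touched
// costs one transaction (transaction-size reduction is not modelled).
int transactions_per_half_warp(size_t base_addr, size_t stride_bytes) {
    const size_t kSegmentBytes = 128;  // segment size for 4-byte words on CC 1.2/1.3
    const int kHalfWarp = 16;
    std::set<size_t> segments;
    for (int tid = 0; tid < kHalfWarp; ++tid) {
        size_t addr = base_addr + static_cast<size_t>(tid) * stride_bytes;
        segments.insert(addr / kSegmentBytes);  // index of the segment holding addr
    }
    return static_cast<int>(segments.size());
}
```

A stride of 4 bytes models `threadIdx.x` mapped to consecutive columns of one row: starting at an aligned address it costs one transaction, while starting at byte 96 the same 64 bytes straddle a segment boundary and cost two. A stride equal to the row pitch models `threadIdx.x` mapped to rows and costs one transaction per thread. If the pitch is a multiple of the segment size (an assumption; the runtime chooses the actual value), every row starts on a segment boundary, so the aligned case holds for every row and not just row 0.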
But after a pitched memory copy to the device, is the data still a 2D array, or just a 1D linear array? When I want to access the 2D array on the device, how can I use a 2D thread structure (threadIdx.x, threadIdx.y) to index and process the data?
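On the device the data is still one linear allocation; the "2D" structure exists only through the pitch. A kernel recovers it by stepping `row * pitch` bytes to the start of a row and then indexing the column within that row. Below is a minimal sketch, assuming `float` data, 16x16 thread blocks, and a simple scale-in-place operation; `scale2d` and `scale_on_device` are hypothetical names, while `cudaMallocPitch` and `cudaMemcpy2D` are the standard CUDA runtime calls for pitched allocation and copying.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One thread per element: x -> column, y -> row. threadIdx.x varies fastest
// within a warp, so neighbouring threads touch neighbouring words in a row.
__global__ void scale2d(float* data, size_t pitch, int width, int height,
                        float factor) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        // pitch is in BYTES, so step to the row start through a char pointer.
        float* row_ptr = reinterpret_cast<float*>(
            reinterpret_cast<char*>(data) + row * pitch);
        row_ptr[col] *= factor;
    }
}

// Copies a tightly packed width x height host array into pitched device
// memory, scales it on the GPU, and copies the result back into host.
cudaError_t scale_on_device(float* host, int width, int height, float factor) {
    float* d_data = nullptr;
    size_t pitch = 0;
    const size_t row_bytes = width * sizeof(float);
    cudaError_t err = cudaMallocPitch(reinterpret_cast<void**>(&d_data),
                                      &pitch, row_bytes, height);
    if (err != cudaSuccess) return err;

    // Host rows are packed (source pitch = row_bytes); device rows use the padded pitch.
    err = cudaMemcpy2D(d_data, pitch, host, row_bytes, row_bytes, height,
                       cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        scale2d<<<grid, block>>>(d_data, pitch, width, height, factor);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess) {
        // Copying back also waits for the kernel on the default stream.
        err = cudaMemcpy2D(host, row_bytes, d_data, pitch, row_bytes, height,
                           cudaMemcpyDeviceToHost);
    }
    cudaFree(d_data);
    return err;
}
```

The bounds check is needed because the grid is rounded up to whole blocks, so some threads fall outside the array. Mapping `threadIdx.x` to the column (rather than the row) is the design choice that keeps each half-warp reading consecutive words within one row.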