The organization depends on how you set up the texture. You can bind a texture directly to global memory for 1D locality, or to a cudaArray for 1D, 2D, or 3D locality.
It has been said that the new “pitch-linear” 2D textures bound to global memory still have only 1D locality, but I haven’t written a microbenchmark to test that for myself yet.
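To make the two setups concrete, here is a minimal sketch that reads the same data through a texture over plain global memory and through a texture over a cudaArray. It assumes the texture object API (cudaCreateTextureObject, CUDA 5.0 and later, compute capability 3.0+), which is newer than the texture reference binding calls; the names fetchBoth and countMismatches are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: reads element (x, y) through both textures and sums them.
__global__ void fetchBoth(cudaTextureObject_t linearTex,
                          cudaTextureObject_t arrayTex,
                          int width, float *out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= width) return;
    int i = y * width + x;
    // Texture over linear global memory: addressed with a 1D integer index.
    float a = tex1Dfetch<float>(linearTex, i);
    // Texture over a cudaArray: addressed with 2D coordinates (texel centre).
    float b = tex2D<float>(arrayTex, x + 0.5f, y + 0.5f);
    out[i] = a + b;
}

// Hypothetical helper: returns how many outputs differ from 2 * index.
int countMismatches()
{
    const int W = 64, N = W * W;
    std::vector<float> h(N);
    for (int i = 0; i < N; ++i) h[i] = float(i);

    // Point sampling, unnormalized coordinates, no type conversion.
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    // (a) Texture bound directly to global memory.
    float *dLinear = nullptr;
    cudaMalloc(&dLinear, N * sizeof(float));
    cudaMemcpy(dLinear, h.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaResourceDesc resLin = {};
    resLin.resType = cudaResourceTypeLinear;
    resLin.res.linear.devPtr = dLinear;
    resLin.res.linear.desc = cudaCreateChannelDesc<float>();
    resLin.res.linear.sizeInBytes = N * sizeof(float);
    cudaTextureObject_t linearTex = 0;
    cudaCreateTextureObject(&linearTex, &resLin, &td, nullptr);

    // (b) Texture bound to a 2D cudaArray.
    cudaChannelFormatDesc cd = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    cudaMallocArray(&arr, &cd, W, W);
    cudaMemcpy2DToArray(arr, 0, 0, h.data(), W * sizeof(float),
                        W * sizeof(float), W, cudaMemcpyHostToDevice);
    cudaResourceDesc resArr = {};
    resArr.resType = cudaResourceTypeArray;
    resArr.res.array.array = arr;
    cudaTextureObject_t arrayTex = 0;
    cudaCreateTextureObject(&arrayTex, &resArr, &td, nullptr);

    float *dOut = nullptr;
    cudaMalloc(&dOut, N * sizeof(float));
    dim3 block(16, 16), grid(W / 16, W / 16);
    fetchBoth<<<grid, block>>>(linearTex, arrayTex, W, dOut);
    cudaMemcpy(h.data(), dOut, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Both textures hold the same data, so each output should be 2 * index.
    int bad = 0;
    for (int i = 0; i < N; ++i)
        if (h[i] != 2.0f * float(i)) ++bad;

    cudaDestroyTextureObject(linearTex);
    cudaDestroyTextureObject(arrayTex);
    cudaFreeArray(arr);
    cudaFree(dLinear);
    cudaFree(dOut);
    return bad;
}

int main()
{
    printf("mismatches: %d\n", countMismatches());
    return 0;
}
```

The sketch only shows how the two resource types are set up and addressed; it does not measure cache behaviour.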
Just what it says in the programming guide. The best use of the texture cache is to have spatially local accesses among the threads in each warp.
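As a rough illustration of "spatially local accesses among the threads in each warp", the sketch below uses an 8x4 thread block, so the 32 threads of one warp cover a compact 8x4 tile of a 2D texture and each thread fetches its four neighbours. It assumes the texture object API (CUDA 5.0 and later) and a cudaArray-backed texture; neighbourAverage and stencilMismatches are hypothetical names, and error checking is omitted.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread averages the four neighbours of its own texel. With an 8x4
// block, one warp's fetches all fall inside a small 2D neighbourhood.
__global__ void neighbourAverage(cudaTextureObject_t tex, int w, int h, float *out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float l = tex2D<float>(tex, (x - 1) + 0.5f, y + 0.5f);
    float r = tex2D<float>(tex, (x + 1) + 0.5f, y + 0.5f);
    float u = tex2D<float>(tex, x + 0.5f, (y - 1) + 0.5f);
    float d = tex2D<float>(tex, x + 0.5f, (y + 1) + 0.5f);
    out[y * w + x] = 0.25f * (l + r + u + d);
}

// Hypothetical helper: returns how many outputs differ from a CPU reference.
int stencilMismatches()
{
    const int W = 64, H = 32, N = W * H;
    std::vector<float> in(N), out(N);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            in[y * W + x] = float(x + 2 * y);

    cudaChannelFormatDesc cd = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    cudaMallocArray(&arr, &cd, W, H);
    cudaMemcpy2DToArray(arr, 0, 0, in.data(), W * sizeof(float),
                        W * sizeof(float), H, cudaMemcpyHostToDevice);

    cudaResourceDesc rd = {};
    rd.resType = cudaResourceTypeArray;
    rd.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;  // out-of-range reads clamp to the edge
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &rd, &td, nullptr);

    float *dOut = nullptr;
    cudaMalloc(&dOut, N * sizeof(float));
    dim3 block(8, 4);                 // 32 threads: one warp per 8x4 tile
    dim3 grid(W / 8, H / 4);
    neighbourAverage<<<grid, block>>>(tex, W, H, dOut);
    cudaMemcpy(out.data(), dOut, N * sizeof(float), cudaMemcpyDeviceToHost);

    // CPU reference with the same clamp-to-edge behaviour.
    int bad = 0;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            int xl = std::max(x - 1, 0), xr = std::min(x + 1, W - 1);
            int yu = std::max(y - 1, 0), yd = std::min(y + 1, H - 1);
            float ref = 0.25f * (in[y * W + xl] + in[y * W + xr] +
                                 in[yu * W + x] + in[yd * W + x]);
            if (out[y * W + x] != ref) ++bad;
        }

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dOut);
    return bad;
}

int main()
{
    printf("mismatches: %d\n", stencilMismatches());
    return 0;
}
```

A 32x1 block would be the 1D analogue of the same idea; which shape actually performs better on a given card is something to measure rather than assume.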
For longer and more explicit descriptions, search the forums for my many other posts on the texture cache.