Surfaces and memory coalescing

Can anyone point me to information on memory coalescing (or optimizing cache efficiency) when writing to surfaces? I’m primarily interested in surfaces bound to 2D and 3D CUDA arrays. As far as I know, the storage format for CUDA arrays is undocumented, other than that it is optimised for 2D or 3D locality. Is that all the information available on the best way to write to them?

Edit: I’ve just noticed that 3D surfaces don’t seem to exist (yet?).

NVIDIA keeps the memory layout of textures (CUDA arrays) secret, so how writes to them are actually laid out is not public.

A forum comment from years ago, when this last came up, indicated that the layout is something like a Z-order (Morton-order) curve.

But as was mentioned, the details are completely unspecified, and may even change between generations of devices for all we know.
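
For anyone unfamiliar with Z-order, here is a minimal host-side C++ sketch of how a 2D Morton index is computed by interleaving the bits of x and y. This is only the textbook curve, not NVIDIA's actual CUDA array layout, which remains unspecified. The function names part1by1 and morton2d are my own, made up for illustration.

```cpp
#include <cstdint>

// Illustration only: the textbook 2D Z-order (Morton) index.
// This is NOT the documented CUDA array layout, which is unspecified.

// Spread the low 16 bits of v so that a zero bit sits between each
// pair of original bits (bit i moves to bit 2*i).
constexpr std::uint32_t part1by1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Interleave x into the even bits and y into the odd bits.
// Valid for x and y below 65536.
constexpr std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```

With this ordering, the four texels of any aligned 2x2 block get consecutive indices (for example (0,0), (1,0), (0,1), (1,1) map to 0, 1, 2, 3). That is the kind of 2D locality such a layout provides. If the real layout behaves similarly, having neighbouring threads write to neighbouring (x, y) positions in small 2D tiles would be a reasonable guess at a cache-friendly pattern, but that is an assumption and worth benchmarking on your own hardware.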