Interpreting coalesced global memory access for a 3D block: is it coalesced only if tid is used?

I understand that global memory access is coalesced, and performs best, when the thread ID (tid) is used to index the data. But when accessing linear memory that holds 3D image data, how should coalescing be interpreted?

For example, consider the following case:

    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int k = blockIdx.z * blockDim.z + threadIdx.z;
    size_t index = i + (j * resX) + (k * frameSize);
    imageMap[index] = maxValue;

If this access pattern turns out to be inefficient, what other approaches can I follow?

I have gone through the strided access section of the ‘CUDA Best Practices’ guide, but I need a deeper understanding of it.

Awaiting a quick reply.

If you picture the volume laid out in linear memory (here a 4 x 4 x 2 example)

// first slice of the cube
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15

// second slice of the cube
16 17 18 19
20 21 22 23
24 25 26 27
28 29 30 31

// linearly arranged
0 1 2 3 4 ... 12 13 14 15 16 17 ... 28 29 30 31

and keep in mind that x is the fastest-varying dimension (then y, then z), then your index scheme

size_t index = i + (j * resX) + (k * frameSize); // resX = row width; frameSize = number of elements in one slice (width * height)

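gives a coalesced access pattern: threads with consecutive threadIdx.x values access consecutive elements of imageMap, which is the basic requirement for coalescing and is satisfied in your example.

Note that there are other conditions (alignment, for instance), and some of them have been relaxed for Compute Capability 1.2 and above.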
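To make the point concrete, here is a rough host-side sketch in plain C++ (it should also compile under nvcc) that lists the element indices the first 32 threads of block (0,0,0) would touch under your index scheme. The helper names linearIndex, warpIndices and isContiguous are made up for this illustration, and "contiguous" is only a simplified stand-in for the real coalescing rules, which also depend on alignment and compute capability (on 1.x devices the check is per half-warp of 16 threads).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear index of voxel (i, j, k), x fastest, then y, then z.
// frameSize is the number of elements in one slice (width * height).
std::size_t linearIndex(unsigned i, unsigned j, unsigned k,
                        std::size_t resX, std::size_t frameSize)
{
    assert(resX > 0 && frameSize >= resX);
    return i + j * resX + k * frameSize;
}

// Hypothetical helper: element indices touched by the first `warpSize` threads
// of block (0,0,0). Threads are numbered the way CUDA numbers them inside a
// block: threadIdx.x fastest, then threadIdx.y, then threadIdx.z.
std::vector<std::size_t> warpIndices(unsigned blockDimX, unsigned blockDimY,
                                     std::size_t resX, std::size_t frameSize,
                                     unsigned warpSize = 32)
{
    std::vector<std::size_t> out;
    for (unsigned t = 0; t < warpSize; ++t) {
        unsigned tx = t % blockDimX;
        unsigned ty = (t / blockDimX) % blockDimY;
        unsigned tz = t / (blockDimX * blockDimY);
        out.push_back(linearIndex(tx, ty, tz, resX, frameSize));
    }
    return out;
}

// True if every index is exactly one more than the previous one.
bool isContiguous(const std::vector<std::size_t>& idx)
{
    for (std::size_t n = 1; n < idx.size(); ++n)
        if (idx[n] != idx[n - 1] + 1) return false;
    return true;
}
```

With a 512 x 512 slice, a 32 x 1 block gives indices 0 to 31 in a row, while an 8 x 4 block jumps from 7 to 512 after the eighth thread, because the warp wraps onto the next image row. So the index formula itself is fine, but the block should be wide enough in x for a warp (or half-warp) to stay on one row.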

I recommend reading Section G.3.2, Global Memory, of the NVIDIA CUDA C Programming Guide, v3.2.

Hope this helps.

PS: Any review is welcome.