I understand that memory accesses coalesce best when the thread ID (tid) is used directly to index the data. But when accessing linear memory that holds 3D image data, how should coalescing be interpreted?
For example, in the following case:
unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
unsigned int k = blockIdx.z * blockDim.z + threadIdx.z;
size_t index = i + (j * resX) + (k * frameSize);
imageMap[index] = maxValue;
If this access pattern turns out to be inefficient, what other approaches could be used?
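One way to reason about the pattern above is to list the indices that the 32 threads of a single warp would produce. The following is a minimal host-side sketch in plain C++ (no GPU required) that reproduces the index arithmetic from the snippet. It relies on the documented CUDA thread ordering (threadIdx.x varies fastest, then y, then z, and a warp holds 32 consecutive linear thread IDs). The names `Dim3`, `warpIndices`, and `segmentsTouched` are hypothetical helpers, not CUDA APIs. The 4-byte element size, the 128-byte segment size, and a base pointer aligned to 128 bytes are assumptions; the real transaction granularity depends on the GPU architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for CUDA's dim3, used only for this host-side sketch.
struct Dim3 { unsigned x, y, z; };

// Returns the linear image indices that the 32 threads of warp `warp`
// (within block `blockIdx`) would compute with
//   index = i + j * resX + k * frameSize
// Threads are numbered inside a block with x fastest, then y, then z.
std::vector<std::size_t> warpIndices(Dim3 blockDim, Dim3 blockIdx, unsigned warp,
                                     std::size_t resX, std::size_t frameSize)
{
    const unsigned warpSize = 32;
    std::vector<std::size_t> indices;
    for (unsigned lane = 0; lane < warpSize; ++lane) {
        unsigned linear = warp * warpSize + lane;          // linear thread ID in block
        unsigned tx = linear % blockDim.x;
        unsigned ty = (linear / blockDim.x) % blockDim.y;
        unsigned tz = linear / (blockDim.x * blockDim.y);
        assert(tz < blockDim.z);                           // warp must lie inside the block

        std::size_t i = std::size_t(blockIdx.x) * blockDim.x + tx;
        std::size_t j = std::size_t(blockIdx.y) * blockDim.y + ty;
        std::size_t k = std::size_t(blockIdx.z) * blockDim.z + tz;
        indices.push_back(i + j * resX + k * frameSize);
    }
    return indices;
}

// Counts the distinct 128-byte segments covered by the given element indices.
// Assumes the array base is 128-byte aligned (an assumption of this sketch).
std::size_t segmentsTouched(const std::vector<std::size_t>& indices,
                            std::size_t elemSize)
{
    std::set<std::size_t> segments;
    for (std::size_t idx : indices)
        segments.insert(idx * elemSize / 128);
    return segments.size();
}
```

Under these assumptions, with resX = 256 and frameSize = 256 * 256, calling `segmentsTouched(warpIndices(Dim3{32, 4, 1}, Dim3{0, 0, 0}, 0, 256, 65536), 4)` reports a single segment, because the warp covers 32 consecutive values of i in one row. A block shape of 8 x 8 x 1 spreads the same warp over four rows, and 1 x 32 x 1 makes every thread land in a different row. So the indexing in the question looks coalescing-friendly as long as threadIdx.x maps to the fastest-varying image dimension and blockDim.x is a multiple of 32, and it helps further if resX is a multiple of 32 so each row starts on an aligned boundary.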