Hello everyone,

In an attempt to optimize our kernel, we tried to make sure that our global memory loads are coherent/coalesced. Below is a small snippet of (pseudo-)code:

```
__global__ void kernel(){
    // index calculations.
    T[index] = C1*T[index-1] + C2*T[index+1]
             + C3*T[index-dimx] + C4*T[index+dimx]
             + C5*T[index-dimx*dimy] + C6*T[index+dimx*dimy]
             + C7*T[index];
}
```

Our conclusion was that these loads can never all be coalesced. If the T[index] loads are coalesced, meaning their base address is a multiple of 16, then the T[index-1] loads start one element earlier, at an address that is not a multiple of 16, so they cannot be coalesced as well. The same holds for T[index+1].

Is this correct reasoning or are we missing something here?

Jordy