From the NVIDIA CUDA Programming Guide 3.0, section 220.127.116.11.1:
“Any address of a variable residing in global memory or returned by one of the memory allocation routines from the driver or runtime API is always aligned to at least 256 bytes.”
This basically says that addresses in global memory are aligned to at least 256 bytes, while the description of memory coalescing says “The size of a memory transaction can be: 32B, 64B or 128B”. So if a memory access is coalesced into a 64B transaction, how can it access a 256B-aligned address?
In global memory:
| 64B | 64B | 64B | 64B |  <- one block starting at a 256B-aligned address
Now suppose I have a coalesced memory access to the 2nd 64B above. Since the address in global memory is aligned at the start of the 1st 64B, it seems there is no way to access the 2nd 64B.
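To make the layout I have in mind concrete, here is a small Python sketch of the address arithmetic only (it is not CUDA code and does not model the hardware). BASE is a made-up value standing in for a pointer returned by an allocation routine, and the names segment_starts and is_aligned are my own, purely for illustration. It lists where each 64B piece of one 256B block starts and checks which boundaries each start address sits on.

```python
# Hypothetical base address: any multiple of 256 stands in for a pointer
# returned by an allocation routine (an assumption, not a real device address).
BASE = 0x1000 * 256

SEGMENT = 64      # transaction size used in the question, in bytes
ALIGNMENT = 256   # alignment quoted from the programming guide, in bytes


def segment_starts(base, block=ALIGNMENT, segment=SEGMENT):
    """Return the start address of each segment-sized piece of one block."""
    return [base + offset for offset in range(0, block, segment)]


def is_aligned(address, boundary):
    """True if address is a whole multiple of boundary."""
    return address % boundary == 0


starts = segment_starts(BASE)
for index, address in enumerate(starts, start=1):
    print(
        f"segment {index}: offset {address - BASE:3d}, "
        f"64B-aligned={is_aligned(address, SEGMENT)}, "
        f"256B-aligned={is_aligned(address, ALIGNMENT)}"
    )
```

By this arithmetic every 64B piece starts on a multiple of 64, and only the first one starts on a multiple of 256. Whether that is all a 64B transaction needs is the part I am unsure about.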
Could anyone help clear up my doubts?