256B aligned address in global memory?

Hi, all,
From a section of the NVIDIA CUDA Programming Guide 3.0:

“Any address of a variable residing in global memory or returned by one of the memory allocation routines from the driver or runtime API is always aligned to at least 256 bytes.”

This basically says that addresses in global memory are 256B aligned, while the memory coalescing rules say "The size of a memory transaction can be: 32B, 64B or 128B". So if a coalesced access is served by a 64B transaction, how can it reach an address that is not itself 256B aligned?

For example, in global memory:

64B | 64B | 64B | 64B ------------->256B aligned address
Now suppose I make a coalesced memory access to the 2nd 64B chunk above. Since addresses in global memory are aligned at the start of the 1st 64B chunk, it seems there is no way to access the 2nd 64B chunk.

Could anyone clear up my confusion?


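The first statement means that the whole allocation (i.e. its starting address) is aligned to a 256-byte boundary. It says nothing about the addresses of individual elements inside the allocation, so it doesn't contradict the coalescing rules at all. Because 256 is a multiple of 32, 64 and 128, a 256-byte aligned base is also aligned for every transaction size, and the 2nd 64B chunk in your example starts at base + 64, which is itself a 64-byte aligned address that a 64B transaction can serve.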