The programming guide says that when accessing a 2D array, the width should be a multiple of 16. Can anyone explain why? I assumed that segment addresses are multiples of 32. The programming guide also says that "to get coalescing, data should be located at an address that is a multiple of sizeof(type)". That does not imply a multiple of 16 for the width.

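Because memory is accessed per half-warp, i.e. 16 threads at a time! A thread with index (tx, ty) reads element width * ty + tx, so if the width is a multiple of 16 (and the block width is too), every row starts on a 16-element boundary and each half-warp's accesses line up with an aligned segment.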
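To make the row arithmetic concrete, here is a minimal host-side sketch (plain C++, so it also compiles inside a .cu file). The names kHalfWarp, row_start, rows_aligned and padded_width are made up for illustration, and it assumes the base address of the array is itself aligned to a 16-element boundary.

```cpp
#include <cassert>
#include <cstddef>

// Threads serviced together on the hardware discussed here (a half-warp).
static const std::size_t kHalfWarp = 16;

// Element index where row `row` starts in a row-major 2D array
// with `width` elements per row.
std::size_t row_start(std::size_t row, std::size_t width) {
    return row * width;
}

// True when every one of the first `rows` rows starts on a 16-element
// boundary (assumption: the base address is aligned to that boundary).
bool rows_aligned(std::size_t width, std::size_t rows) {
    assert(width > 0);
    for (std::size_t r = 0; r < rows; ++r) {
        if (row_start(r, width) % kHalfWarp != 0) return false;
    }
    return true;
}

// Round a width up to the next multiple of 16 elements.
std::size_t padded_width(std::size_t width) {
    return (width + kHalfWarp - 1) / kHalfWarp * kHalfWarp;
}
```

With width 60, row 1 starts at element 60, which is not a multiple of 16, so the half-warp reading that row straddles a boundary. Padding the width to 64 fixes it. In CUDA, cudaMallocPitch does this kind of row padding for you.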

From the programming guide:

First, the device is capable of reading 4-byte, 8-byte, or 16-byte words from global memory into registers in a single instruction. To have assignments such as:

    __device__ type device[32];
    type data = device[tid];

compile to a single load instruction, type must be such that sizeof(type) is equal to 4, 8, or 16 and variables of type type must be aligned to sizeof(type) bytes (that is, have their address be a multiple of sizeof(type)).
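The size and alignment rule quoted above can be checked on the host. The sketch below is only an illustration: Vec2, Vec3 and Vec4 are hypothetical stand-ins for CUDA's float2, float3 and float4, and single_load_size, aligned_to_size and array_elements_aligned are made-up helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for CUDA vector types (not the real definitions).
struct alignas(8)  Vec2 { float x, y; };        // 8 bytes, 8-byte aligned
struct alignas(16) Vec4 { float x, y, z, w; };  // 16 bytes, 16-byte aligned
struct Vec3 { float x, y, z; };                 // 12 bytes: not 4, 8 or 16

// True when sizeof(T) is one of the sizes the quoted passage allows.
template <typename T>
constexpr bool single_load_size() {
    return sizeof(T) == 4 || sizeof(T) == 8 || sizeof(T) == 16;
}

// True when the address of *p is a multiple of sizeof(T).
template <typename T>
bool aligned_to_size(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(T) == 0;
}

static_assert(single_load_size<float>(), "4-byte type qualifies");
static_assert(single_load_size<Vec2>(), "8-byte type qualifies");
static_assert(single_load_size<Vec4>(), "16-byte type qualifies");
static_assert(!single_load_size<Vec3>(), "12-byte type does not qualify");

// Every element of an array of Vec4 sits at a multiple of sizeof(Vec4).
bool array_elements_aligned() {
    static Vec4 data[32];
    for (std::size_t i = 0; i < 32; ++i) {
        if (!aligned_to_size(&data[i])) return false;
    }
    return true;
}
```

Note that this rule is about each individual element. It is a separate condition from the width being a multiple of 16, which is about where each row (and so each half-warp's block of accesses) begins.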