I have some questions regarding the right access pattern to get maximum memory bandwidth.
In the programming guide (page 44) it says that the device can read 32-bit, 64-bit, and 128-bit words from global memory into registers in a single load instruction, so that an assignment such as
__device__ type device[32]; type data = device[tid];
compiles to a single load instruction only when sizeof(type) is equal to 4, 8, or 16 bytes and variables of type type are aligned to 4, 8, or 16 bytes (that is, have the 2, 3, or 4 least significant bits of their address equal to zero).
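A minimal sketch of what the alignment rule means in practice, assuming the data lives in a buffer returned by cudaMalloc (the programming guide states that such addresses are aligned to at least 256 bytes). The names is_aligned, copy_float, and launch_copy_float are hypothetical. Masking the low bits of the address is the same "least significant bits equal to zero" test quoted above; since float is 4 bytes with 4-byte natural alignment, every element of such a buffer passes the 4-byte check automatically.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: true if the low log2(n) bits of the address are zero,
// i.e. p is aligned to n bytes. n must be a power of two.
__host__ __device__ inline bool is_aligned(const void* p, std::size_t n)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (n - 1)) == 0;
}

// float is 4 bytes and naturally 4-byte aligned, so both conditions hold for
// every element of a float array whose base address is aligned.
static_assert(sizeof(float) == 4 && alignof(float) == 4, "unexpected float layout");

// One 32-bit load per thread; consecutive threads read consecutive words.
__global__ void copy_float(const float* __restrict__ in, float* __restrict__ out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = in[tid];
    }
}

// Host-side launcher (hypothetical): checks the alignment assumption first.
void launch_copy_float(const float* d_in, float* d_out, int n)
{
    if (n <= 0) return;
    assert(is_aligned(d_in, alignof(float)) && is_aligned(d_out, alignof(float)));
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    copy_float<<<blocks, threads>>>(d_in, d_out, n);
}
```

Note that the check is only meaningful for the base pointer plus an element offset; indexing a float array never breaks 4-byte alignment on its own.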
I understand the first constraint, but I do not know how to ensure the second. For instance, if type is float, the first requirement is met, but I do not know whether the second one is fulfilled. How do I meet the second requirement, and if that is not possible with this type (i.e., float), how do I convert the data to a type that does satisfy it?
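One common way to get 64-bit or 128-bit loads for float data is to use the built-in vector types float2/float4 (which carry 8- and 16-byte alignment) or a struct marked __align__(16), and to reinterpret an aligned float* as a float4*. This is a sketch under stated assumptions: the pointers are 16-byte aligned (true for cudaMalloc'd buffers) and the float count is a multiple of 4. The names Vec4, copy_as_float4, and can_use_float4 are hypothetical.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical user-defined 16-byte type; __align__(16) asks the compiler to
// place it on 16-byte boundaries so it can be fetched with one 128-bit load.
struct __align__(16) Vec4 {
    float x, y, z, w;
};

static_assert(sizeof(Vec4) == 16 && alignof(Vec4) == 16, "Vec4 layout");
// The built-in vector types already carry the required alignment.
static_assert(alignof(float2) == 8 && alignof(float4) == 16, "vector type alignment");

// Four floats per thread with a single 128-bit load.
// Assumes in/out are 16-byte aligned; n4 = (number of floats) / 4.
__global__ void copy_as_float4(const float* __restrict__ in, float* __restrict__ out, int n4)
{
    const float4* in4 = reinterpret_cast<const float4*>(in);
    float4* out4 = reinterpret_cast<float4*>(out);
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n4) {
        out4[tid] = in4[tid];
    }
}

// Hypothetical host check: true if p may safely be viewed as a float4*.
inline bool can_use_float4(const float* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

If the float count is not a multiple of 4, the leftover elements would need a separate scalar pass.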
Does this also apply when loading data from global memory into shared memory?
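A sketch of staging global data into shared memory, assuming a hypothetical block size of 256 and 16-byte-aligned input. A plain assignment from global to shared memory is compiled as a global load into a register followed by a shared-memory store, so the same size and alignment rules govern the global side. Kernel and helper names are illustrative only.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // hypothetical block size; launch with blockDim.x == kThreads

// Each thread moves one float4 (16 bytes) from global to shared memory:
// one 128-bit global load into a register, then a store to shared memory.
// Assumes `in` is 16-byte aligned and n4 counts float4 elements.
__global__ void stage_to_shared(const float4* __restrict__ in, float* __restrict__ out, int n4)
{
    __shared__ float4 tile[kThreads];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n4) {
        tile[threadIdx.x] = in[tid];
    }
    __syncthreads();  // every thread reaches the barrier, even out-of-range ones

    // Example use of the staged data: sum the four components.
    if (tid < n4) {
        float4 v = tile[threadIdx.x];
        out[tid] = v.x + v.y + v.z + v.w;
    }
}

// Hypothetical host helper: number of blocks needed to cover n4 float4 elements.
inline int blocks_for(int n4)
{
    assert(n4 >= 0);
    return (n4 + kThreads - 1) / kThreads;
}
```

After the barrier, a real kernel would typically read entries written by other threads; summing each thread's own element here only keeps the example short.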
Thanks in advance for your answers!