I'm new to CUDA and need your help. I'm trying to understand what coalesced memory access means and how to use it. My kernel works with arrays of 16-bit unsigned short values. The programming guide says that an access of 2 bytes per thread is coalesced if all the words lie in a 64-byte segment. But why a 64-byte segment? Wouldn't a 32-byte segment be big enough?
In my case, the words lie in a 32-byte segment; a 64-byte segment would contain the data for 32 threads. Would that access still be coalesced?
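To make the arithmetic concrete, here is a small host-side C++ sketch (plain C++, not device code) that counts how many aligned segments a group of consecutive threads touches. The helper name `segments_touched` is hypothetical, and the figure of 16 threads being serviced together is an assumption based on the 32-byte span mentioned above; the real coalescing rules depend on the device's compute capability.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper (illustration only): counts how many aligned segments of
// `segment_bytes` are touched when `threads` consecutive threads each access
// one element of `elem_bytes`, with thread 0 starting at byte `base_offset`.
std::size_t segments_touched(std::size_t base_offset, std::size_t threads,
                             std::size_t elem_bytes, std::size_t segment_bytes) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < threads; ++t) {
        std::size_t first = base_offset + t * elem_bytes;  // first byte read by thread t
        std::size_t last = first + elem_bytes - 1;          // last byte read by thread t
        segments.insert(first / segment_bytes);
        segments.insert(last / segment_bytes);
    }
    return segments.size();
}
```

With these assumptions, `segments_touched(0, 16, 2, 32)` and `segments_touched(0, 16, 2, 64)` both give 1: sixteen unsigned shorts starting at an aligned address fit in one 32-byte segment and therefore also in one 64-byte segment. An unaligned start such as `segments_touched(16, 16, 2, 32)` gives 2, which is the kind of case where alignment starts to matter.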
Thanks in advance!