Can someone explain to me how to access global memory in a coalesced way on devices with compute capability 2.x? I've read the CUDA C Programming Guide and the CUDA C Best Practices Guide, and I'm not sure I understand them correctly.
In the CUDA C Best Practices Guide we can read:
and in the CUDA C Programming Guide:
So does this mean that all accesses within a 128-byte aligned segment of global memory are always coalesced, regardless of the access pattern?
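To make the question concrete, here is a minimal host-side model in C++ (not CUDA code, so it runs without a GPU) of the compute capability 2.x rule as the programming guide describes it for the default L1-cached path: a warp's request is served by one 128-byte transaction per distinct 128-byte aligned segment (cache line) its addresses touch. The function names are hypothetical, and the model assumes every thread loads one 4-byte word; wider words and L2 behavior are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model, not a CUDA API: compute capability 2.x, default
// L1 caching, every thread loads one 4-byte word.
constexpr std::uint64_t kSegmentBytes = 128;  // L1 cache line size on 2.x

// addrs holds the byte address loaded by each active thread of one warp.
// Returns the number of 128-byte transactions needed to serve the warp.
std::size_t transactions_per_warp(const std::vector<std::uint64_t>& addrs) {
    assert(addrs.size() <= 32);  // a warp has at most 32 threads
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) {
        segments.insert(a / kSegmentBytes);  // index of the aligned segment
    }
    return segments.size();
}

// Thread t loads the 4-byte word at base + 4 * t * stride.
std::vector<std::uint64_t> strided(std::uint64_t base, std::uint64_t stride) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t t = 0; t < 32; ++t) addrs.push_back(base + 4 * t * stride);
    return addrs;
}

// Thread t loads word 31 - t: a permutation inside one aligned segment
// when base is a multiple of 128.
std::vector<std::uint64_t> reversed(std::uint64_t base) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t t = 0; t < 32; ++t) addrs.push_back(base + 4 * (31 - t));
    return addrs;
}
```

Under this model, any permutation of the 32 words of one aligned segment, such as reversed(0), costs one transaction, while shifting the base by one word, strided(4, 1), or reading every other float, strided(0, 2), touches two segments.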
Thanks a lot
EDIT: I’m sorry, I didn’t read your question properly.
The answer is that the read from global memory into the cache will be coalesced, but after that you might still have memory bank conflicts (you can read about those in the guides).
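To see what a bank conflict means in practice, here is a hypothetical host-side C++ model of shared-memory banking, assuming the compute capability 2.x layout: 32 banks, successive 32-bit words in successive banks, conflicts evaluated over the whole warp, and threads that read the same 32-bit word not conflicting. The names are illustrative only, and 64-bit or 128-bit accesses are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of shared-memory banks on compute capability 2.x:
// 32 banks, successive 32-bit words in successive banks, conflicts
// evaluated over the whole warp, every thread reads one 32-bit word.
constexpr std::uint32_t kBanks = 32;

// word_idx holds the 32-bit word index (byte offset / 4) read by each thread.
// Returns n for an n-way bank conflict; 1 means conflict-free.
std::size_t conflict_degree(const std::vector<std::uint32_t>& word_idx) {
    assert(word_idx.size() <= 32);
    std::vector<std::set<std::uint32_t>> words_in_bank(kBanks);
    for (std::uint32_t w : word_idx) {
        words_in_bank[w % kBanks].insert(w);  // same word twice = broadcast
    }
    std::size_t degree = 1;
    for (const auto& words : words_in_bank) {
        if (words.size() > degree) degree = words.size();
    }
    return degree;
}

// Thread t reads word t * stride.
std::vector<std::uint32_t> strided_words(std::uint32_t stride) {
    std::vector<std::uint32_t> idx;
    for (std::uint32_t t = 0; t < 32; ++t) idx.push_back(t * stride);
    return idx;
}
```

In this model a stride of 1 or any odd stride is conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 32 puts every thread in the same bank (32-way).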
OK, but bank conflicts are a shared-memory issue, independent of how we read from global memory and of the compute capability of the card, right? I read from global memory, copy the data to shared memory, and then work with shared memory. It's at this step that I have to consider bank conflicts, right?
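The global-to-shared-then-compute pattern described here is where bank conflicts typically appear, for example when a warp reads a column of a 2D shared tile. Note that the bank layout itself does depend on compute capability (16 banks per half-warp on 1.x, 32 banks per warp on 2.x). The sketch below is a hypothetical C++ model, assuming 2.x banking and 4-byte elements, that compares a tile declared as `__shared__ float tile[32][32]` with the common padded variant `__shared__ float tile[32][33]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model: a warp reads one column of a shared-memory tile
// declared as float tile[32][pitch]; thread t reads tile[t][col], which is
// the 32-bit word t * pitch + col. Assumes 2.x banking: 32 banks, 4-byte words.
std::size_t column_read_conflict(std::uint32_t pitch, std::uint32_t col) {
    assert(col < pitch);
    std::vector<std::set<std::uint32_t>> words_in_bank(32);
    for (std::uint32_t t = 0; t < 32; ++t) {
        std::uint32_t word = t * pitch + col;
        words_in_bank[word % 32].insert(word);
    }
    std::size_t degree = 0;
    for (const auto& words : words_in_bank) {
        if (words.size() > degree) degree = words.size();
    }
    return degree;  // n for an n-way conflict, 1 when conflict-free
}
```

With a pitch of 32, every element of a column falls in the same bank (32-way conflict); one extra float of padding per row shifts each row by one bank, so the column read becomes conflict-free.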
If I look only at coalescing of global memory accesses, can I say that all accesses within a 128-byte aligned segment of global memory are always coalesced, or not?
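One nuance, per the programming guide for compute capability 2.x: loads cached in L1 (the default, -Xptxas -dlcm=ca) are served by 128-byte transactions, while loads cached only in L2 (-Xptxas -dlcm=cg) are served by 32-byte transactions. So in the default mode, any pattern of 4-byte loads inside one aligned 128-byte segment is a single transaction, but how much of it is useful still depends on the pattern. The hypothetical model below counts transactions and bytes moved for both granularities, again assuming one 4-byte load per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model: one warp, one 4-byte load per thread, compute capability 2.x.
// seg_bytes = 128 models L1-cached loads (-Xptxas -dlcm=ca, the default);
// seg_bytes = 32 models L2-only loads (-Xptxas -dlcm=cg).
struct Traffic {
    std::size_t transactions;
    std::uint64_t bytes_moved;
};

Traffic warp_traffic(const std::vector<std::uint64_t>& addrs, std::uint64_t seg_bytes) {
    assert(addrs.size() <= 32);
    assert(seg_bytes == 32 || seg_bytes == 128);
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) {
        segments.insert(a / seg_bytes);  // one transaction per distinct segment
    }
    return {segments.size(), segments.size() * seg_bytes};
}
```

With a single active thread, the default mode moves a full 128-byte line for 4 useful bytes, while the L2-only mode moves 32 bytes; when all 32 words of the segment are used, in any order, both modes move the same 128 bytes.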