Good tutorial on coalesced memory accesses?


I’m looking for a good tutorial explaining coalesced memory access.

Is there a good one out there?

I’ve read the programming guide and the getting started guide from the NVIDIA SDK,
but I only got a vague idea of it.

So I’d like to fully understand it…


With the new 2.x architectures this feature is (I think) less important, but the best tutorial is still the CUDA programming guide.
Section G.3.2.2 of the v3.2 guide is where I learned the coalescing concept and its influence on performance.

The same guide has a few examples on page 164. They show not only the number of transactions per half-warp but also the size of those transactions.

Finally, another good tutorial is the vectorAdd example. Try modifying the access stride so that the accesses to the vector elements become uncoalesced, and you will see how much uncoalesced accesses affect performance. For example, add the elements of the vectors with a stride of 3.
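A rough sketch of that experiment, in case it helps. This is not the SDK's vectorAdd source; the kernel name vectorAddStrided, the helper runStridedAdd and the sizes are made up for illustration. Each thread i works on element i * stride, so stride 1 is the usual coalesced pattern and stride 3 spreads the accesses of a half-warp over a wider address range. The number of threads is the same in both runs, so only the access pattern changes. How big the gap is will depend on your device.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical variant of a vector-add kernel: thread i handles element i * stride.
// stride == 1 is the coalesced case; larger strides spread out the accesses.
__global__ void vectorAddStrided(const float *a, const float *b, float *c,
                                 int numThreads, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numThreads) {
        int idx = i * stride;
        c[idx] = a[idx] + b[idx];
    }
}

// Runs the kernel for one stride, writes the kernel time (milliseconds) to *ms,
// and returns true if every element the kernel touched holds the expected sum.
bool runStridedAdd(int numThreads, int stride, float *ms)
{
    size_t n = (size_t)numThreads * stride;   // vectors must cover the largest index
    size_t bytes = n * sizeof(float);

    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    if (!hA || !hB || !hC) { free(hA); free(hB); free(hC); return false; }
    for (size_t k = 0; k < n; ++k) {
        hA[k] = (float)(k % 100);
        hB[k] = 2.0f * hA[k];
    }

    float *dA = NULL, *dB = NULL, *dC = NULL;
    bool ok = cudaMalloc((void **)&dA, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dB, bytes) == cudaSuccess &&
              cudaMalloc((void **)&dC, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
        cudaMemset(dC, 0, bytes);

        int threads = 256;
        int blocks = (numThreads + threads - 1) / threads;

        // Warm-up launch so the timed launch does not include one-time startup cost.
        vectorAddStrided<<<blocks, threads>>>(dA, dB, dC, numThreads, stride);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start, 0);
        vectorAddStrided<<<blocks, threads>>>(dA, dB, dC, numThreads, stride);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);

        // Copy back and check only the elements the kernel actually wrote.
        ok = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        for (int i = 0; ok && i < numThreads; ++i) {
            size_t idx = (size_t)i * stride;
            if (fabsf(hC[idx] - (hA[idx] + hB[idx])) > 1e-5f) ok = false;
        }
    }

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return ok;
}

int main()
{
    const int numThreads = 1 << 20;   // same amount of useful work for both strides
    const int strides[] = {1, 3};
    for (int s = 0; s < 2; ++s) {
        float ms = 0.0f;
        bool ok = runStridedAdd(numThreads, strides[s], &ms);
        printf("stride %d: %s, kernel time %.3f ms\n",
               strides[s], ok ? "results OK" : "FAILED", ms);
    }
    return 0;
}
```

Compile it with nvcc and compare the two times. You can add more strides to the array (2, 4, 16, 32...) to see how the time changes with the stride.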

For all your experiments, you will probably want a sheet of paper and a pen.

Good luck!