CUDA memory handling: articles on CUDA memory handling

Does anyone know of an article online (or in a journal) that discusses CUDA memory handling extensively? I am looking for an analytical approach. The secret to realizing all of the speed improvement seems to be in how memory is handled.

I saw one article (which I did not keep) that said the secret to seeing the speed increases is to be intimately familiar with your program’s memory and how CUDA handles it.


Try Chapter 5 of the CUDA Programming Guide.