I think all of us have the Programming Guide PDF always open in a background window. It’s well written and is packed with information.
I have two suggestions to make it even better:
Add PDF index navigation for the Appendices! The appendices are the section I most often have to jump to. (What’s the intrinsic name again? How big is the constant cache, I forget? Is a single-precision divide accurate to 0, 1, or 2 ULP?)
There’s quick navigation indexing to all of the PDF’s sections and subsections… but not the Appendices.
There are two global memory topics which are undocumented but very important when you get down to low-level algorithm design. Each deserves at least a paragraph in the programming guide. Even a light, high-level overview would give people a clue to dig deeper if it turns out to be an issue for them.
- Global memory has its own partition conflicts, similar to shared memory bank conflicts. G80 has 6 global memory partitions, G200 has 8 partitions. In both cases the partitions are 256 bytes wide. Global memory accesses perform fastest when they’re spread over all partitions.
- The GPU supports both prefetching of global memory and fire-and-forget writes to global memory. Prefetching is automatic: it happens whenever your code reads memory and then follows the read with non-dependent computation, which can run while the memory fetch is still pending.
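
To make the partition point concrete, here is a rough host-side sketch of how one might check an access pattern. It assumes consecutive 256-byte chunks of global memory simply rotate through the partitions in order. That mapping is my guess from the numbers above, not something the guide states, and the names `partitionOf` and `partitionHistogram` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Partition width quoted above for both G80 and G200.
constexpr std::size_t kPartitionWidthBytes = 256;

// ASSUMPTION: consecutive 256-byte chunks rotate through the partitions
// in order. The real hardware mapping is undocumented.
std::size_t partitionOf(std::uint64_t byteOffset, std::size_t numPartitions) {
    return static_cast<std::size_t>((byteOffset / kPartitionWidthBytes) % numPartitions);
}

// Counts how many of `numAccessors` concurrent accesses land in each
// partition when accessor i starts at byte offset i * strideBytes.
std::vector<std::size_t> partitionHistogram(std::size_t numAccessors,
                                            std::uint64_t strideBytes,
                                            std::size_t numPartitions) {
    std::vector<std::size_t> hist(numPartitions, 0);
    for (std::size_t i = 0; i < numAccessors; ++i) {
        ++hist[partitionOf(i * strideBytes, numPartitions)];
    }
    return hist;
}
```

Under that assumed mapping, 8 accessors spaced 2048 bytes apart on an 8-partition part all land in partition 0, while spacing them 2304 bytes apart (one extra 256-byte chunk of padding) puts one accessor in each partition. If the real mapping is anything like this, it would explain why padding a row pitch can matter for column-wise access.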