I think all of us have the Programming Guide PDF always open in a background window. It’s well written and is packed with information.
I have two suggestions to make it even better:
- Add PDF index navigation for the Appendices! The appendices are the section I most often have to jump to. (What’s the intrinsic name again? How big is the constant cache, I forget? Is single-precision divide accurate to 0, 1, or 2 ULP?) There’s quick navigation indexing for all of the PDF’s sections and subsections… but not for the Appendices.
- There are two global memory topics that are undocumented but very important when you get down to low-level algorithm design. Both deserve at least a paragraph in the Programming Guide. Even a light, vague overview would at least be a clue for people to dig deeper if it seems to be an issue for them.
  - Global memory has its own partition conflicts, similar to shared memory bank conflicts. G80 has 6 global memory partitions and GT200 has 8. In both cases the partitions are 256 bytes wide. Global memory accesses perform fastest when they’re spread over all partitions.
  - The GPU supports both prefetching of global memory and fire-and-forget writes to global memory. Prefetching is automatic: it happens whenever your code reads memory and then follows the read with non-dependent computation, which can run while the memory fetch is still pending.
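
To make the partition point concrete, here is a rough sketch of how one might estimate which partition an address lands in. The 256-byte width comes from the numbers above; the round-robin mapping of consecutive 256-byte chunks to partitions is my assumption, since the real mapping is undocumented. The names `partition_of` and `partitions_touched` are made up for illustration, and the `HOST_DEVICE` guard is only there so the helper builds with or without nvcc.

```cuda
#include <cstddef>

// Lets the helpers compile as plain C++ as well as under nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// From the post: partitions are 256 bytes wide on both chips.
constexpr std::size_t kPartitionWidthBytes = 256;

// ASSUMPTION: consecutive 256-byte chunks are assigned to partitions
// round-robin. The hardware's real mapping is not documented.
HOST_DEVICE inline unsigned partition_of(std::size_t byte_offset,
                                         unsigned num_partitions) {
    return static_cast<unsigned>(
        (byte_offset / kPartitionWidthBytes) % num_partitions);
}

// Counts how many distinct partitions are hit by the first byte of each of
// the first `rows` rows of a pitched 2D array (num_partitions must be <= 32).
HOST_DEVICE inline unsigned partitions_touched(std::size_t pitch_bytes,
                                               unsigned rows,
                                               unsigned num_partitions) {
    unsigned mask = 0;  // bit p set => partition p is touched
    for (unsigned r = 0; r < rows; ++r)
        mask |= 1u << partition_of(r * pitch_bytes, num_partitions);
    unsigned count = 0;
    for (; mask != 0; mask >>= 1) count += mask & 1u;
    return count;
}
```

Under that assumed mapping, a 2048-byte row pitch with 8 partitions sends every row start to partition 0, so a column-wise walk would hammer a single partition. Padding the pitch to 2304 bytes spreads the same walk over all 8.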
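
For the prefetching and fire-and-forget point, a minimal sketch of the pattern might look like the kernel below. The kernel name and the arithmetic are invented for illustration, and the compiler is free to reorder instructions, so treat the comments as describing the intent rather than guaranteed scheduling.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each input by a value derived from the
// thread index, arranged so the load can overlap with independent math.
__global__ void overlap_load_with_compute(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // 1. Issue the global read. Per the post, the fetch can stay pending
    //    while later instructions that do not need `v` execute.
    float v = in[i];

    // 2. Non-dependent compute: uses only `i`, never `v`.
    float x = static_cast<float>(i) * 0.001f;
    float poly = 1.0f + x * (0.5f + x * 0.25f);

    // 3. First use of `v`: this is where the thread would have to wait if
    //    the fetch has not completed yet.
    // 4. The store is the fire-and-forget write: nothing after it depends
    //    on the write having landed.
    out[i] = v * poly;
}

int main() {
    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    overlap_load_with_compute<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The idea is simply to put the read as early as possible and push its first use as late as possible, so there is independent work to hide the latency behind.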