As a CUDA newbie poring over the CUDA Programming Guide Version 2.0 (dated 6/7/2008), I’d like to offer some feedback. This is just my own take as a reader, not necessarily errata, but I hope it might be a little useful for the tech writers.
It’s not clear to me on a first or second reading why the broadcast mechanism described in Fig. 5-8 does not resolve the 8-way bank conflict on the right side of Fig. 5-7. Perhaps some minor clarification would make this more obvious.
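For anyone else stuck on the same point, here is a minimal host-side sketch of how I would model it, not anything taken from the guide. It assumes 16 shared memory banks, a half-warp of 16 threads, successive 32-bit words mapped to successive banks, and that the right side of Fig. 5-7 is the stride-of-eight-words pattern. The function name `best_case_passes` is made up for illustration. It is deliberately optimistic: it pretends every word read by several threads can be broadcast, so it counts only distinct words per bank.

```python
from collections import defaultdict

NUM_BANKS = 16   # assumed: shared memory banks on compute capability 1.x
HALF_WARP = 16   # assumed: threads serviced together per request


def best_case_passes(word_indices):
    """Lower bound on serialized passes for one half-warp read.

    word_indices[t] is the 32-bit word index read by thread t.
    Threads reading the same word are counted once (as if broadcast),
    so only distinct words that share a bank cost extra passes.
    """
    words_per_bank = defaultdict(set)
    for word in word_indices:
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


stride_8 = [8 * t for t in range(HALF_WARP)]   # words 0, 8, 16, ..., 120
same_word = [5] * HALF_WARP                    # every thread reads word 5
stride_1 = list(range(HALF_WARP))              # one word per bank

print(best_case_passes(stride_8))   # 8
print(best_case_passes(same_word))  # 1
print(best_case_passes(stride_1))   # 1
```

Under these assumptions the stride-8 pattern hits only banks 0 and 8, and each of those banks is asked for eight different words. Broadcast can only merge threads that read the same word, so there is nothing for it to merge here.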
But reads for constants are cached too, right, and locality should apply? So is there an implied difference in the constant/texture cache, or did this bullet refer to a difference from global reads only?
Referring to this paragraph:
My brain has to double-parse this because global memory is “device memory” as described in 3.1. The paragraph is correct as written, but the double meaning trips me up. I’d suggest changing “data transfers between the device and global memory” to “data transfers between the device and its global memory”, or something else that keeps me from reading it as “device- and global- memory” the first time.
In fact, throughout the whole document I have to remind myself that “device memory” is the big, high-latency, off-chip memory, which includes local/global/texture/constant memory, but “device memory” does not include the on-device registers, shared memory, or constant/texture caches. So “device memory” is viewed as being on the device from the point of view of the host, but not from the point of view of a “transfer between the device and global memory”. It makes me wish there was a better name for “device memory” but I can’t suggest one.
The Specifications for Compute Capability 1.2 make no mention of the relaxed criteria for coalescing described in 5.1.2.1. As a programmer, it looks like a big win not to have to worry as much about the coalescing criteria as in CC 1.0/1.1, so it seems worth listing there.
I wasn’t sure how that jibes with this from A.2:
I’m guessing that both are true, but that for doubles, FMADs don’t truncate the intermediate result of the multiply. Maybe that could be clarified.
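To show what I mean by the intermediate result mattering, here is a small host-side sketch in Python. It does not reproduce GPU behavior, and in particular not the single-precision truncation. It only contrasts a multiply-add whose product is rounded before the add with one that is rounded once at the end. The function names and the inputs are my own choices for illustration.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    # Ordinary double arithmetic: a * b is rounded to a double,
    # then the sum is rounded again.
    return a * b + c


def mul_add_one_rounding(a, b, c):
    # Exact rational arithmetic, converted to a double only once at the end.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
print(mul_add_two_roundings(a, b, c))  # 0.0
print(mul_add_one_rounding(a, b, c))   # -2**-60, about -8.67e-19
```

With the intermediate product rounded, the low-order bits are lost and the result is exactly zero. With a single final rounding they survive.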