new coalescing rules ...just to be sure

Hey EES fellows,

I understand that the rules for accessing global device memory have changed slightly with the 2.0 toolkit (page 50/51, sect. 5.1.2.1). I have two (sort-of-dumb) questions about that, just to make sure I get it right:

  1. Is there any additional difference for sm_13 hardware (i.e. the T10P) beyond what’s described in the 2.0 manual (pages given above)?
  2. Is it correct that, for the basic situation of reading a bunch of aligned floats linearly (one per thread), nothing has changed from the 1.0 coalescing rules?

Sorry for the rather basic questions, but since the profiler doesn’t seem to be fully working yet, I have trouble figuring out whether I’m heading in the right direction coalescing-wise.

Thanks, Alex

I think the programming guide describes everything. For linear, aligned accesses (i.e. threads in a warp access sequential floats, and the starting address is aligned), there is no difference from G80 (i.e. that’s “speed of light”).

Mark
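To make Mark’s answer concrete without a working profiler, here is a small Python model of the two rule sets as I read section 5.1.2.1 of the 2.0 Programming Guide. It is a sketch under stated assumptions, not NVIDIA code: the function names are hypothetical, addresses are assumed word-aligned, the G80 (sm_10/sm_11) model assumes all 16 threads of the half-warp are active, and caching and partition effects are ignored. It shows that linear, aligned floats cost one 64-byte transaction per half-warp on both generations, while permuted or slightly misaligned accesses are where sm_12/sm_13 differs.

```python
from typing import List, Optional, Tuple

HALF_WARP = 16  # coalescing is decided per half-warp on these GPUs

# Hypothetical model of the Programming Guide 2.0 rules; not an NVIDIA API.


def segment_size(word_size: int) -> int:
    # sm_12/sm_13: 32-byte segments for 1-byte words, 64 for 2-byte words,
    # 128 for 4-, 8- and 16-byte words.
    sizes = {1: 32, 2: 64, 4: 128, 8: 128, 16: 128}
    if word_size not in sizes:
        raise ValueError("word size must be 1, 2, 4, 8 or 16 bytes")
    return sizes[word_size]


def transactions_sm12(addrs: List[Optional[int]], word_size: int) -> List[Tuple[int, int]]:
    """(start, size) of each transaction for one half-warp on sm_12/sm_13.

    addrs[k] is the word-aligned byte address read by thread k, or None if inactive.
    """
    seg = segment_size(word_size)
    pending = [a for a in addrs if a is not None]  # lowest-numbered thread first
    out = []
    while pending:
        seg_start = pending[0] // seg * seg  # segment of the lowest active thread
        served = [a for a in pending if seg_start <= a < seg_start + seg]
        lo, hi = min(served), max(served) + word_size  # byte range actually used
        start, size = seg_start, seg
        # Halve 128 -> 64 -> 32 while only the lower or upper half is used.
        while size > 32:
            half = size // 2
            if hi <= start + half:
                size = half
            elif lo >= start + half:
                start, size = start + half, half
            else:
                break
        out.append((start, size))
        # Threads served by this transaction are marked done.
        pending = [a for a in pending if not (seg_start <= a < seg_start + seg)]
    return out


def transactions_sm10(addrs: List[int], word_size: int) -> int:
    """Number of transactions for one fully active half-warp on sm_10/sm_11.

    Coalesced only if thread k reads base + k * word_size with base aligned to
    16 * word_size, for 4-, 8- or 16-byte words; otherwise one per thread.
    """
    base = addrs[0]
    in_order = all(a == base + k * word_size for k, a in enumerate(addrs))
    aligned = base % (HALF_WARP * word_size) == 0
    if word_size in (4, 8, 16) and in_order and aligned:
        return 2 if word_size == 16 else 1  # 128-bit words need two 128-byte loads
    return len(addrs)


# Linear, aligned floats: thread k reads 4 bytes at 0x1000 + 4k.
linear = [0x1000 + 4 * k for k in range(HALF_WARP)]
print(transactions_sm12(linear, 4))  # [(4096, 64)]
print(transactions_sm10(linear, 4))  # 1

# Threads 0 and 1 swap addresses: still one 64-byte transaction on sm_12/sm_13.
swapped = [linear[1], linear[0]] + linear[2:]
print(transactions_sm12(swapped, 4))  # [(4096, 64)]
print(transactions_sm10(swapped, 4))  # 16

# Shifted by one float: one 128-byte transaction on sm_12/sm_13.
shifted = [a + 4 for a in linear]
print(transactions_sm12(shifted, 4))  # [(4096, 128)]
print(transactions_sm10(shifted, 4))  # 16
```

The design choice worth noting is that the sm_12/sm_13 model works segment by segment, starting from the lowest-numbered active thread, which is why permutations inside a segment are free there but not on G80.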