Hi,
I was just wondering how the maximum size of a single memory transaction (e.g. 256 bytes on CC 2.x) relates to the device's memory interface width, e.g. 384 bits on the GTX 580.
How is the maximum transaction size determined? Is it related to the interface width, or is it something the driver decides?
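To make the mismatch concrete, here is a quick Python check. It assumes the interface simply moves 384 bits (48 bytes) per transfer, which ignores burst length and the split into separate memory partitions, so treat it as a rough sketch only. The function name is made up for illustration.

```python
def leftover_bytes(transaction_bytes, interface_bits):
    """Bytes left over when a transaction is split into full-width transfers.

    Assumption: one transfer moves exactly interface_bits / 8 bytes.
    """
    interface_bytes = interface_bits // 8
    return transaction_bytes % interface_bytes


if __name__ == "__main__":
    # 384-bit interface = 48 bytes per transfer (under the assumption above)
    for size in (32, 64, 128, 256):
        print(size, "bytes ->", leftover_bytes(size, 384), "bytes left over")
```

None of the power-of-two transaction sizes is a multiple of 48 bytes, which is why I don't see how the two numbers fit together.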
Also, I can't reconcile the figures and statements about coalesced memory access in section 3.2.1 of the CUDA Best Practices Guide v4.0 with what I have read and measured:
Offset copy: the figure shows a heavy impact on the GTX 280. As far as I understand, this device should have no real problem with an offset copy, since it only results in one additional coalesced access (Programming Guide v4.0).
Strided copy: the figure shows an immediate impact at a stride of 2. In my tests I saw no impact at all with stride 2, a slight impact with stride 4, and a rapidly growing impact for strides > 4.
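For reference, this is the simple model I have in mind when I count accesses. It is only a sketch in Python, not how the hardware actually works: it assumes 32 threads, 4-byte elements and aligned 128-byte segments, and just counts how many distinct segments one group of threads touches. The function name and all parameters are my own choices.

```python
def segments_touched(offset, stride, threads=32, elem_bytes=4, segment_bytes=128):
    """Count distinct aligned segments hit by one group of threads.

    Thread `tid` reads element index `offset + tid * stride`.
    Assumption: every touched segment costs one memory transaction.
    """
    segments = set()
    for tid in range(threads):
        first_byte = (offset + tid * stride) * elem_bytes
        last_byte = first_byte + elem_bytes - 1
        segments.add(first_byte // segment_bytes)
        segments.add(last_byte // segment_bytes)
    return len(segments)


if __name__ == "__main__":
    for offset in (0, 1, 16, 32):
        print("offset", offset, "->", segments_touched(offset, 1), "segment(s)")
    for stride in (1, 2, 4, 8, 32):
        print("stride", stride, "->", segments_touched(0, stride), "segment(s)")
```

In this model an offset costs at most one extra segment, while a stride of 2 already doubles the number of segments touched. Caches or smaller transaction sizes on real hardware could change the picture, so this only shows what I would expect, not what actually happens.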
What should I believe? :D