Coalesced Transaction Size

Snowball_Two · October 12, 2011, 9:43am

Hi,

I was just wondering how the possible size of a single memory transaction (e.g. 256 bytes on CC 2.x) relates to the device's memory interface width, e.g. 384 bits on the GTX 580.

How is the maximum transaction size determined? Is it related to the interface width? Or is it something in the driver?

Also, I can't reconcile the figures and facts about coalesced memory access in section 3.2.1 of the CUDA Best Practices Guide v4.0 with what I see:

Offset copy: the figure shows a heavy performance impact on the GTX 280. In fact, according to the Programming Guide v4.0, this device should have no problem with an offset copy, since the misalignment costs only one additional coalesced transaction.

Strided copy: the figure shows an immediate impact at a stride of 2. I observed no impact at all at stride 2, a slight impact at stride 4, and a strongly growing impact for strides above 4.

What should I believe? :D

L_F · October 13, 2011, 9:46pm

Probably the cache on CC 2.x devices affects your results a lot.

tera · October 14, 2011, 1:01am

For the last several generations, the memory interface has always consisted of multiple 64-bit-wide channels, so the GTX 580's 384-bit interface is six such channels. Wider transactions (up to 128 bytes on Fermi GPUs) are carried out as bursts on a single channel and are therefore more efficient.
On Fermi-class GPUs the mapping of addresses to memory channels is hashed to prevent partition camping.

As L_F wrote, the cache in Fermi GPUs affects this a lot. Memory transactions on Fermi always have the width of a full cache line, which is 128 bytes, or 32 bytes if the L1 cache was disabled at compile time with -Xptxas -dlcm=cg.
