Coalescing on Devices with Compute Capability 1.2

Imago · July 10, 2008, 9:17am

Hi,

in the CUDA Programming Guide 2.0, section 5.1.2.1, under the topic “Coalescing on Devices with Compute Capability 1.2 and Higher”, they say that the segment size is 128 bytes, transferred in one memory transaction, if all threads of a half-warp access 32-bit or 64-bit words. I mean 32-bit * 16 threads (because of a half-warp) = 512 bits = 128 bytes. But 64-bit words would need a 64 * 16 / 4 = 256 byte transfer…
Are 64-bit words cast into 32-bit ones?

Thx in advance.

MisterAnderson42 · July 10, 2008, 12:45pm

There are 8 bits in a byte:
(float) 32 bits * 16 / 8 bits/byte = 64 bytes
(float2) 64 bits * 16 / 8 bits/byte = 128 bytes
(float4) 128 bits * 16 / 8 bits/byte = 256 bytes

So your question still holds, but it should have been: what happens to a float4 read? It seems that float4 reads are not coalesced on compute 1.2 hardware. However, they will only generate two 128-byte transactions, so performance will not suffer (and it doesn’t in my testing).
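
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the byte counting, not a model of the hardware: the helper names `half_warp_bytes` and `segments_touched` are made up for illustration, the 128-byte segment size is the figure quoted from the guide in the question, and the segment count assumes the half-warp reads one contiguous, segment-aligned block.

```python
HALF_WARP_THREADS = 16  # threads per half-warp, as used in the thread
BITS_PER_BYTE = 8


def half_warp_bytes(word_bits, threads=HALF_WARP_THREADS):
    """Bytes requested when every thread of a half-warp reads one word."""
    return word_bits * threads // BITS_PER_BYTE


def segments_touched(total_bytes, segment_bytes=128):
    """Segments spanned by a contiguous, segment-aligned request.

    Assumption: perfectly aligned, contiguous access. Uses ceiling division.
    """
    return -(-total_bytes // segment_bytes)


for name, bits in [("float", 32), ("float2", 64), ("float4", 128)]:
    total = half_warp_bytes(bits)
    print(f"{name}: {total} bytes, {segments_touched(total)} x 128-byte segment(s)")
```

Under those assumptions, float and float2 reads each fit in a single 128-byte segment, while a float4 read spans two, which matches the two transactions mentioned above.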
