Global memory alignment and coalescing CUDA 1.1 compatible

Mafos · October 20, 2008, 1:01pm

What is the alignment requirement for “coalesced” access to global memory?
From Figure 5.1 in the Programming Guide, I’d say it must be 16, 32, 64 or 128. Does this depend on sizeof(VAR)? Please enlighten me :)

Mafos · October 20, 2008, 6:13pm

OK, the Programming Guide says:

"Any address of a variable residing in global memory or returned by one of the memory allocation routines from the driver or runtime API is always aligned to at least 256 bytes."

It also states the second of the three conditions in the section “Coalescing on Devices with Compute Capability 1.0 and 1.1”:

"All 16 words must lie in the same segment of size equal to the memory transaction size [...]"
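The three conditions for compute capability 1.0/1.1 can be modeled on the host as a quick sanity check. This is a minimal sketch, not an NVIDIA API: `coalesces_cc1x` and `sequential` are hypothetical names, and the model assumes every thread of the half-warp takes part in the access. Only the second condition is quoted in this thread; the other two (threads access 32-, 64- or 128-bit words, and thread k accesses the k-th word) are paraphrased from the same Programming Guide section. It also answers the sizeof question: the segment, and therefore the required base alignment, is 16 × sizeof(word), i.e. 64, 128 or 256 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model (not a CUDA API) of the compute capability 1.0/1.1
// coalescing rules. Assumes all 16 threads of the half-warp participate.
constexpr int kHalfWarp = 16;
using HalfWarpAddrs = std::array<std::uint64_t, kHalfWarp>;

// Byte addresses when thread k reads element k of an array starting at base.
HalfWarpAddrs sequential(std::uint64_t base, std::size_t word_size) {
    HalfWarpAddrs a{};
    for (int k = 0; k < kHalfWarp; ++k) a[k] = base + k * word_size;
    return a;
}

// addr[k] is the byte address read by thread k of the half-warp.
bool coalesces_cc1x(const HalfWarpAddrs& addr, std::size_t word_size) {
    // Condition 1: only 32-, 64- or 128-bit words qualify.
    if (word_size != 4 && word_size != 8 && word_size != 16) return false;

    // Condition 2: all 16 words lie in one segment of 16 * word_size bytes
    // (64 B for 32-bit, 128 B for 64-bit, 256 B = 2 x 128 B for 128-bit words).
    const std::uint64_t segment = kHalfWarp * word_size;
    const std::uint64_t seg_start = addr[0] - addr[0] % segment;

    // Condition 3: thread k reads the k-th word of that segment, which also
    // forces addr[0] to sit exactly on the segment boundary.
    for (int k = 0; k < kHalfWarp; ++k) {
        if (addr[k] != seg_start + k * word_size) return false;
    }
    return true;
}
```

For example, `coalesces_cc1x(sequential(256, 4), 4)` is true, while starting one float later (`sequential(260, 4)`) spills across two 64-byte segments and the model reports no coalescing.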

If I understand this correctly: when I cudaMalloc, let’s say, 64 bytes of global memory to store 16 floats, the returned address is aligned to at least 256 bytes (so it is also aligned to 64 bytes, the transaction size for 16 32-bit words). If successive threads of the half-warp then access successive 32-bit words, the whole 64 bytes are read in one fast transaction instead of being serialized into one transaction per thread. Am I right here?
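The arithmetic behind this scenario can be sketched as a transaction count, assuming the compute capability 1.0/1.1 rules from the Programming Guide and sequential access (thread k reads element k). `transactions_sequential_cc1x` is a hypothetical helper, not a CUDA API. Note that for 32-bit words the relevant boundary is 64 bytes: a base that is only 32-byte aligned still straddles two segments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a CUDA API): number of memory transactions a
// compute capability 1.0/1.1 device issues, per the Programming Guide rules,
// when thread k of a half-warp reads word k of an array starting at `base`.
int transactions_sequential_cc1x(std::uint64_t base, std::size_t word_size) {
    assert(word_size == 4 || word_size == 8 || word_size == 16);
    const std::uint64_t half_warp = 16;
    const std::uint64_t segment = half_warp * word_size;  // 64, 128 or 256 B

    // Sequential words fit in one segment only if base is segment-aligned.
    if (base % segment != 0) return 16;  // not coalesced: one per thread
    return word_size == 16 ? 2 : 1;      // 128-bit words need two 128-B transactions
}
```

With a 256-byte-aligned cudaMalloc base, 16 floats give `transactions_sequential_cc1x(256, 4) == 1`; starting 32 bytes later (`288`) gives 16.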

paulius · October 20, 2008, 6:14pm

Yes.
