Clarification of coalesced memory access

xtiger1 · February 21, 2011, 11:41pm

Hi,

I have two questions:

1. Does coalesced access require a __syncthreads() call right before the access?
2. Consider a kernel that takes two arrays as input: float *a, float *b. If the kernel simply does
a[threadIdx.x] = a[threadIdx.x]>b[threadIdx.x]?1.0:0.0;
will the reads and the write be coalesced?

tera · February 22, 2011, 1:17pm

1. No. Memory accesses are always handled per warp, so it is irrelevant whether different warps are synchronized or not.
2. It depends on the alignment of a and b, and on the compute capability of the GPU.

Coalescing rules for devices of different compute capabilities are given in Appendix G.3.2 and G.4.2 of the Programming Guide.
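
To make the two answers concrete, here is a minimal sketch of the kernel from the question wrapped in a small host program. The names compare and count_mismatches, the global index (the original uses threadIdx.x only), the launch configuration and the test data are all assumptions added for illustration. The sketch relies on cudaMalloc() returning addresses aligned to at least 256 bytes, so the first thread of every warp starts on a segment boundary and consecutive threads touch consecutive 4-byte words. Whether that actually results in a single transaction per warp still depends on the compute capability, as described above.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Same operation as in the question, with a global index (an assumption)
// so that it also works with more than one block. Thread k of a warp
// accesses element k of that warp's slice of a and b.
__global__ void compare(float *a, const float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = a[i] > b[i] ? 1.0f : 0.0f;  // no __syncthreads() before the access
}

// Hypothetical helper: runs the kernel on n elements and returns the
// number of wrong results, or -1 if a CUDA call fails or a base pointer
// is not 128-byte aligned.
int count_mismatches(int n)
{
    std::vector<float> ha(n), hb(n, 3.0f);
    for (int i = 0; i < n; ++i)
        ha[i] = static_cast<float>(i % 7);

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *da = nullptr, *db = nullptr;
    if (cudaMalloc(&da, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&db, bytes) != cudaSuccess) { cudaFree(da); return -1; }

    // Check the alignment that the coalescing rules care about.
    bool ok = reinterpret_cast<uintptr_t>(da) % 128 == 0 &&
              reinterpret_cast<uintptr_t>(db) % 128 == 0;

    ok = ok && cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;                       // a multiple of the warp size
        const int blocks = (n + threads - 1) / threads;
        compare<<<blocks, threads>>>(da, db, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(ha.data(), da, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(da);
    cudaFree(db);
    if (!ok) return -1;

    // Compare against the same expression evaluated on the host.
    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        float expected = (i % 7) > 3 ? 1.0f : 0.0f;
        if (ha[i] != expected) ++wrong;
    }
    return wrong;
}

int main()
{
    int wrong = count_mismatches(1024);
    std::printf("mismatches: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

If a or b instead pointed into the middle of an allocation (for example da + 1), the same indexing would straddle a segment boundary within each warp, which is the case where the alignment part of the answer starts to matter.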
