clarification of coalesced memory access

Hi,

I have two questions:

  1. Does coalesced access require a __syncthreads() call right before the access?
  2. Consider a kernel that takes two arrays as input, float *a and float *b, and simply does
    a[threadIdx.x] = a[threadIdx.x] > b[threadIdx.x] ? 1.0f : 0.0f;
    Will the reads and the write be coalesced?
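
  1. No. Coalescing is decided per warp (per half-warp on compute capability 1.x devices): the hardware combines the addresses issued by the threads of one warp for a single load or store instruction. Whether different warps are synchronized with each other is irrelevant; __syncthreads() is only needed when threads exchange data, e.g. through shared memory.
  2. It depends on the alignment of the arrays a and b, and on the compute capability of the GPU. Consecutive threads access consecutive 32-bit words, so if a and b come straight from cudaMalloc (which returns memory aligned to at least 256 bytes), all three accesses are coalesced on every compute capability. If they point into the middle of an allocation, older devices may split the accesses into several transactions (1.2/1.3) or issue one transaction per thread (1.0/1.1).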
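To see why the alignment of a and b matters for the a[threadIdx.x] pattern, here is a minimal host-side C++ sketch (a simplified model, not CUDA API code) that counts how many distinct memory segments the addresses of one 32-thread warp fall into. It assumes the compute capability 2.x behaviour in which L1-cached global loads are served in 128-byte cache lines; the function name segments_touched and the example addresses are hypothetical. Note that the count depends only on the addresses within one warp, which is why __syncthreads() plays no role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical helper (simplified model): counts how many distinct segments
// of segment_bytes bytes one warp touches when thread t accesses the element
// at byte address base + t * elem_bytes, i.e. the pattern a[threadIdx.x].
// Assumption: segment_bytes = 128 models the L1 cache line on compute
// capability 2.x; 32 models 32-byte segments.
std::size_t segments_touched(std::uintptr_t base, std::size_t elem_bytes,
                             unsigned warp_size, std::size_t segment_bytes) {
    assert(elem_bytes > 0 && segment_bytes > 0);
    assert(elem_bytes <= segment_bytes);  // an element spans at most two segments
    std::set<std::uintptr_t> segments;
    for (unsigned t = 0; t < warp_size; ++t) {
        std::uintptr_t addr = base + t * elem_bytes;
        // Every byte of the element is needed, so record the segment holding
        // its first byte and the segment holding its last byte.
        segments.insert(addr / segment_bytes);
        segments.insert((addr + elem_bytes - 1) / segment_bytes);
    }
    return segments.size();
}
```

For example, segments_touched(0x1000, 4, 32, 128) returns 1 for a float array whose base is aligned the way a cudaMalloc pointer is, while segments_touched(0x1004, 4, 32, 128), the same array offset by one float (as when passing a + 1), returns 2. In the question's kernel this applies to each of the three accesses (read a, read b, write a).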

Coalescing rules for devices of different compute capabilities are given in Appendix G.3.2 and G.4.2 of the Programming Guide.
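The difference between the older compute capability 1.x rules can also be sketched on the host. The C++ function below is a simplified, hypothetical model (its name and signature are illustrative only) for a half-warp of 16 threads, each loading one 32-bit word. On 1.0/1.1 the loads coalesce into one transaction only if thread k reads word k of a 64-byte-aligned segment; otherwise each thread gets its own transaction. On 1.2/1.3 one transaction is issued per distinct 64-byte segment, in any order. It ignores the transaction-size reduction that 1.2/1.3 hardware performs, so check it against the appendix rather than treating it as authoritative.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified model of compute capability 1.x coalescing for one
// half-warp (16 threads) where every thread loads one 32-bit word.
// addr[k] is the byte address accessed by thread k of the half-warp.
// Returns the number of memory transactions issued.
int half_warp_transactions_1x(int cc_major, int cc_minor,
                              const std::uint32_t (&addr)[16]) {
    assert(cc_major == 1);
    const std::uint32_t word = 4;
    const std::uint32_t segment = 16 * word;  // 64 bytes for 32-bit words
    for (int k = 0; k < 16; ++k) assert(addr[k] % word == 0);

    if (cc_minor <= 1) {
        // 1.0 / 1.1: one transaction only if thread k reads word k of a
        // segment aligned to 16 * word size; otherwise one per thread.
        if (addr[0] % segment != 0) return 16;
        for (int k = 0; k < 16; ++k)
            if (addr[k] != addr[0] + k * word) return 16;
        return 1;
    }
    // 1.2 / 1.3: one transaction per distinct 64-byte segment touched,
    // regardless of which thread reads which word.
    std::set<std::uint32_t> segs;
    for (int k = 0; k < 16; ++k) segs.insert(addr[k] / segment);
    return static_cast<int>(segs.size());
}
```

With thread k reading address 0x1004 + 4k (a float array offset by one element), this model gives 16 transactions on 1.0/1.1 but only 2 on 1.2/1.3; with an aligned base, both give 1.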