Coalesced/uncoalesced memory access detection?

I’m sorry if this is the wrong place to put my questions.

  1. I wonder if there is an algorithm to statically detect whether a memory access is coalesced or uncoalesced in OpenCL. If we can’t determine that statically, what else do we need?
  2. What is the relationship between these three numbers: the number of coalesced memory accesses, the number of uncoalesced memory accesses, and the total number of memory transactions?
  3. What are the main differences between the answers for 1.3 and 2.x?


Distinguishing between coalesced and uncoalesced memory accesses hardly makes sense in the context of Compute Capability 1.3 and 2.x (I suspect that is what you mean). For CC 1.0 and 1.1, a half-warp either performed a coalesced memory access, or the access was uncoalesced and each thread was served separately. The 1.2 and 1.3 GPUs break up accesses that are not fully coalesced “as efficiently as possible” (per half-warp); for 2.x, at least reads (usually) go to the L1 cache. CUDA_C_Programming_Guide.pdf (v3.2) explains all this in sections G.3.2 (1.x) and G.4.2 (2.x). If you know the access pattern of a (half-)warp, you can easily deduce the type and number of memory requests for 1.2/1.3. For 2.x, temporal locality also plays a role, as well as possible bank conflicts.
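
To make the 1.2/1.3 case concrete, here is a rough host-side sketch in Python of how the transactions for one half-warp could be counted from its byte addresses. It follows my reading of the segment rules in section G.3.2 of the guide: take the segment of the lowest-numbered pending thread, serve every thread in that segment, and halve the transaction while only one half is used. The function name `half_warp_transactions` is made up for illustration. The sketch assumes naturally aligned addresses, all threads of the half-warp active, and word sizes of 1, 2, 4, 8 or 16 bytes; it does not model CC 1.0/1.1 or the 2.x caches.

```python
def half_warp_transactions(addresses, word_size=4):
    """Model the memory transactions of one half-warp on CC 1.2/1.3.

    addresses: byte addresses in thread order (assumed naturally aligned).
    Returns a list of (start_address, size_in_bytes) transactions.
    """
    # Segment size depends on the word size: 32 B for 1-byte words,
    # 64 B for 2-byte words, 128 B for 4-, 8- and 16-byte words.
    seg_size = {1: 32, 2: 64}.get(word_size, 128)
    pending = list(addresses)
    transactions = []
    while pending:
        # Segment containing the address of the lowest-numbered pending thread.
        base = (pending[0] // seg_size) * seg_size
        in_seg = [a for a in pending if base <= a < base + seg_size]
        lo = min(in_seg)
        hi = max(in_seg) + word_size - 1
        # Shrink the transaction while only its lower or upper half is used.
        start, size = base, seg_size
        while size > 32:
            half = size // 2
            if hi < start + half:        # only the lower half is touched
                size = half
            elif lo >= start + half:     # only the upper half is touched
                start += half
                size = half
            else:
                break
        transactions.append((start, size))
        # Threads served by this transaction are done.
        pending = [a for a in pending if not (base <= a < base + seg_size)]
    return transactions
```

Under these assumptions, 16 consecutive 4-byte words starting at address 0 give one 64-byte transaction, a stride of two words gives one 128-byte transaction, and a stride of 128 bytes gives 16 separate 32-byte transactions. So on 1.2/1.3 the useful quantity is the number and size of transactions per half-warp rather than a yes/no "coalesced" flag.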