Is cache access coalesced?

Can accesses to L1 cache be coalesced in the same way as accesses to global memory? For example, suppose a 128-byte segment of global memory is accessed in a coalesced fashion and the segment is cached in L1. If I later access the same 128-byte segment, will that access be coalesced in L1, or carried out serially? I couldn’t find anything about this in the programming guide, only information about the case where the segment is in global memory.

The Fermi Tuning Guide states: “The same on-chip memory is used for both L1 and shared memory, …”

Shared memory accesses do not need coalescing for optimal performance; you only have to be careful about bank conflicts. Therefore I think you don’t have to worry about memory coalescing for L1 cache either. I’m not sure, though, whether the accesses are actually coalesced or not.

I have the same question about the access pattern of the L1 cache. Further, what is the access pattern of the L2 cache?

Hi, I have a similar question here! It is reasonable that shared memory accesses do not need to be coalesced.

But what is the case for the L2 cache? Does the L2 cache perform coalescing if I have the L1 cache disabled? And does an L2 access then perform similarly to a main memory access?

Thank you!

Coalescing is not a function of whether a particular cache is enabled or not.

You can have proper coalescing even on CC 1.x devices that had no L1 and no L2 cache.

You can determine whether or not a read or write transaction will coalesce based strictly on the addresses generated for that transaction by each thread within a warp.
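To make the "based strictly on the addresses" point concrete, here is a minimal host-side sketch in C++ (not device code). It assumes a 32-thread warp and 128-byte aligned segments, as in the original question; the helper names `segments_touched` and `warp_addresses` are made up for illustration and are not part of any CUDA API. It only counts how many distinct segments one warp-wide access would touch, which is a simplified model of how many transactions are needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: number of distinct aligned segments touched by one
// warp-wide access. Assumes 1 <= access_bytes <= segment_bytes.
std::size_t segments_touched(const std::vector<std::uint64_t>& byte_addresses,
                             std::size_t access_bytes,
                             std::uint64_t segment_bytes = 128) {
    std::set<std::uint64_t> segments;
    for (std::uint64_t addr : byte_addresses) {
        // An access can straddle a segment boundary, so record the segment
        // of its first byte and of its last byte.
        segments.insert(addr / segment_bytes);
        segments.insert((addr + access_bytes - 1) / segment_bytes);
    }
    return segments.size();
}

// Hypothetical helper: byte addresses generated by a 32-thread warp in which
// thread t accesses element t * stride_elems of an array starting at base.
std::vector<std::uint64_t> warp_addresses(std::uint64_t base,
                                          std::size_t stride_elems,
                                          std::size_t elem_bytes) {
    std::vector<std::uint64_t> addrs;
    for (std::size_t t = 0; t < 32; ++t) {
        addrs.push_back(base + t * stride_elems * elem_bytes);
    }
    return addrs;
}
```

In this model, `segments_touched(warp_addresses(0, 1, 4), 4)` gives 1 (fully coalesced), a stride of 32 elements gives 32 (one segment per thread), and a unit-stride access starting at byte 4 gives 2 because the warp straddles a segment boundary. None of this depends on which caches are enabled.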

You may want to study carefully a presentation such as this one:

Coalesced (or uncoalesced) access is a characteristic of global memory transactions.

With respect to shared memory, the question is whether or not a particular transaction issued by a warp instruction will have bank conflicts. The rules for determining bank conflicts have some similarities to the rules for determining coalesced access, but they are not the same rules.
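As a rough illustration of how the bank-conflict rule differs, here is a host-side C++ sketch under a simplified model: 32 banks, each 4 bytes wide, with successive 4-byte words mapped to successive banks, and threads that read the same word served by a single broadcast. Those parameters are assumptions of the sketch (they vary by architecture and configuration), and `bank_conflict_degree` and `strided_offsets` are hypothetical names, not CUDA functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: worst-case number of distinct words that a single
// bank must serve for one warp-wide shared memory access. A result of 1
// means conflict-free; N means an N-way conflict in this simplified model.
std::size_t bank_conflict_degree(const std::vector<std::uint32_t>& byte_offsets,
                                 std::uint32_t num_banks = 32,
                                 std::uint32_t bank_width_bytes = 4) {
    std::map<std::uint32_t, std::set<std::uint32_t>> words_per_bank;
    for (std::uint32_t off : byte_offsets) {
        std::uint32_t word = off / bank_width_bytes;
        // Threads hitting the same word collapse into one set entry
        // (modelled as a broadcast, so they do not conflict).
        words_per_bank[word % num_banks].insert(word);
    }
    std::size_t degree = 0;
    for (const auto& entry : words_per_bank) {
        degree = std::max(degree, entry.second.size());
    }
    return degree;
}

// Hypothetical helper: byte offsets for a 32-thread warp in which thread t
// accesses the 4-byte word at index t * stride_words.
std::vector<std::uint32_t> strided_offsets(std::uint32_t stride_words) {
    std::vector<std::uint32_t> offs;
    for (std::uint32_t t = 0; t < 32; ++t) {
        offs.push_back(t * stride_words * 4);
    }
    return offs;
}
```

Under these assumptions a stride of 1 word is conflict-free, a stride of 2 gives a 2-way conflict, and a stride of 32 gives a 32-way conflict, while a stride of 33 is conflict-free again even though the addresses are widely scattered. That last case would be a poor pattern for global memory, which is one way to see that the two sets of rules are related but not identical.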