Memory Transaction Width and L2 Cache Fill - Compute Capability 2.x and 3.0

nagesh · June 27, 2012, 5:11pm

Section F.4.2 of the CUDA C Programming Guide, version 4.2, states:

“A cache line is 128 bytes and maps to a 128-byte aligned segment in device memory. Memory accesses that are cached in both L1 and L2 are serviced with 128-byte memory transactions whereas memory accesses that are cached in L2 only are serviced with 32-byte memory transactions. Caching in L2 only can therefore reduce over-fetch, for example, in the case of scattered memory accesses.”

Is the above statement correct?

For lines that are cached only in L2, shouldn’t all 128 bytes still be fetched to fill the L2 line? 32-byte transactions could maybe help return the critical word to the SM early, but they would not reduce over-fetch.

Thanks,
nagesh

seibert · June 27, 2012, 8:25pm

The statement is correct. At the PTX level, there are load instruction modifiers that indicate whether the load should bypass the L1 cache. If such an instruction modifier is used, then the smaller cache line will be used and less data will be fetched from device memory. This is not directly available at the CUDA C level, although there are nvcc options that let you globally disable the L1 cache (or both the L1 and L2 caches) when compiling CUDA C code.
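
For readers who want to try this, here is a minimal sketch of a load that bypasses L1, written with inline PTX. It uses the PTX `.cg` cache operator (cache in L2, not in L1). The names `load_cg`, `gather_cg` and `run_gather` are made up for this example, and the `"l"` pointer constraint assumes a 64-bit build. Whether this actually reduces the bytes fetched on your GPU is something to confirm with a profiler, not something this code proves.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: load one float with the PTX .cg operator,
// which caches the access in L2 only and bypasses L1.
// The "l" constraint assumes 64-bit device pointers.
__device__ __forceinline__ float load_cg(const float* p)
{
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// Hypothetical gather kernel: each thread reads one element at a
// scattered index, the access pattern where over-fetch matters most.
__global__ void gather_cg(const float* in, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = load_cg(in + idx[i]);
}

// Host driver: builds strided indices, runs the kernel, and returns
// true only if every gathered value matches the host reference.
bool run_gather(int n, int stride)
{
    std::vector<float> h_in(n), h_out(n, -1.0f);
    std::vector<int> h_idx(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = static_cast<float>(i);
        h_idx[i] = static_cast<int>((static_cast<long long>(i) * stride) % n);
    }

    float *d_in = nullptr, *d_out = nullptr;
    int* d_idx = nullptr;
    bool ok = cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_idx, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        const int threads = 256;
        gather_cg<<<(n + threads - 1) / threads, threads>>>(d_in, d_idx, d_out, n);
        // This copy also synchronizes with the kernel and reports its errors.
        ok = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_idx);
    cudaFree(d_out);

    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == h_in[h_idx[i]]);
    return ok;
}
```

Call `run_gather(1 << 16, 33)` from your own `main` to exercise it. The compile-wide alternative mentioned above is to leave the source untouched and build with `nvcc -Xptxas -dlcm=cg`, which asks ptxas to cache global loads in L2 only.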

nagesh · June 27, 2012, 8:54pm

seibert, thanks for the reply.

"... If such an instruction modifier is used, then the smaller cache line will be used and less data will be fetched from the device memory ..."

Does this mean that the L2 cache line size is 32 bytes while the L1 cache line size is 128 bytes? That would cause a lot of problems.

seibert · June 28, 2012, 12:15am

Yes, that is correct. The cache line sizes are different between L1 and L2.
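
To put rough numbers on the over-fetch the guide talks about, here is a small host-side sketch (no kernel involved) that counts the bytes moved for one warp under the two transaction sizes. It is a deliberately simplified model: it assumes every distinct aligned segment touched by the warp is fetched exactly once, and it ignores cache hits and everything else the hardware does. The function names `bytes_fetched` and `warp_addresses` are invented for this example.

```cuda
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Simplified model: every distinct aligned segment of `seg` bytes that
// the addresses touch is fetched once, so traffic = segments * seg.
std::size_t bytes_fetched(const std::vector<std::uint64_t>& addrs, std::size_t seg)
{
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) segments.insert(a / seg);
    return segments.size() * seg;
}

// Byte addresses of the 32 four-byte loads issued by one warp, with
// consecutive threads `stride_bytes` apart, starting at `base`.
std::vector<std::uint64_t> warp_addresses(std::uint64_t base, std::uint64_t stride_bytes)
{
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t lane = 0; lane < 32; ++lane)
        addrs.push_back(base + lane * stride_bytes);
    return addrs;
}
```

Under this model, a fully coalesced warp (`warp_addresses(0, 4)`) moves 128 bytes either way, since the warp uses all 128 bytes it touches. A warp whose threads are 128 bytes apart (`warp_addresses(0, 128)`) moves 4096 bytes with 128-byte transactions but 1024 bytes with 32-byte transactions, even though only 128 bytes are actually used.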
