L1 cache statistics in computeprof always 0

msinclair · April 8, 2013, 5:00pm

Hi everyone,

I’m trying to run a few of the benchmarks from the Rodinia suite through computeprof on a GeForce GTX 680 (CUDA 5.0). The problem is that all of the L1 cache statistics always show up as 0. I’ve also tried the BlackScholes application from the CUDA 5 SDK, and its L1 cache stats are also 0 for everything except the local load/store misses.

My question is: is there a specific flag/switch I need to set to get the L1 cache statistics to appear? Is it just a feature of these benchmarks I’ve selected (bfs, backprop, BlackScholes) that they happen to have no L1 traffic?

Thanks,
Matt

tera · April 8, 2013, 5:21pm

Does the code use local memory?

Unlike Fermi, on Kepler devices L1 is exclusively used for local memory. All global memory accesses go straight to L2. (A fact that is not yet reflected everywhere in the documentation.)

So if the code doesn’t use local memory, there is no L1 traffic.

msinclair · April 8, 2013, 5:40pm

By local memory, do you mean shared memory? Or do you mean something else? Local memory seems to be an overloaded term nowadays, which is why I’m asking.

Matt

tera · April 8, 2013, 5:47pm

No, not shared memory.
I mean local memory as the term is used in the Programming Manual, i.e. the off-chip memory where automatic variables are stored if they are not in registers.

msinclair · April 8, 2013, 6:02pm

Thanks again for getting back to me. I recompiled each of them with --ptxas-options=-v, and none of them report any local memory usage. I also ran cuobjdump on them and didn’t see any instructions with “.local” after them, as the Programming Manual mentions. So it seems that none of them have local memory accesses, which explains the lack of L1 cache statistics.
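For anyone repeating this check across many kernels, here is a minimal sketch of how the verbose ptxas output (nvcc's --ptxas-options=-v) could be scanned for local memory use. It assumes the per-function summary line has the form "N bytes stack frame, N bytes spill stores, N bytes spill loads"; the exact wording may differ between toolkit versions, so treat the patterns as an assumption. The sample log and the kernel names kernel_a and kernel_b are made up for illustration and are not output from the benchmarks discussed here.

```python
import re

# Assumed shape of the per-function summary line in verbose ptxas output.
_USAGE_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
# Assumed shape of the line that names the function the summary belongs to.
_FUNC_RE = re.compile(r"Function properties for (\S+)")


def local_memory_report(ptxas_log):
    """Map each function name to (stack_frame, spill_stores, spill_loads) in bytes."""
    report = {}
    current = None
    for line in ptxas_log.splitlines():
        func = _FUNC_RE.search(line)
        if func:
            current = func.group(1)
            continue
        usage = _USAGE_RE.search(line)
        if usage and current is not None:
            report[current] = tuple(int(n) for n in usage.groups())
    return report


def functions_using_local(ptxas_log):
    """Return the names of functions whose summary shows any local memory bytes."""
    return [name for name, usage in local_memory_report(ptxas_log).items()
            if any(usage)]


# Hypothetical log text, written by hand to mimic the assumed format.
SAMPLE_LOG = """\
ptxas info    : Compiling entry function 'kernel_a' for 'sm_30'
ptxas info    : Function properties for kernel_a
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 10 registers, 336 bytes cmem[0]
ptxas info    : Compiling entry function 'kernel_b' for 'sm_30'
ptxas info    : Function properties for kernel_b
    40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 12 registers, 336 bytes cmem[0]
"""

if __name__ == "__main__":
    print(functions_using_local(SAMPLE_LOG))
```

A kernel that reports all zeros, like the hypothetical kernel_a, would be expected to generate no local memory traffic, which matches the explanation above for why the L1 counters stay at 0.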

On a related note, if the Kepler GPUs behave as you’ve mentioned, then why are there even columns in computeprof for l1 global load hit? It seems like there can never be any loads of that type based on what you said…

Thanks,
Matt

tera · April 8, 2013, 7:24pm

That is a question I’ve asked myself too. It appears that several places in the tools, and particularly the documentation, haven’t yet been updated to reflect the new architecture, which has occasionally made me wonder which of the contradictory statements to trust.
