L1 cache statistics in computeprof always 0

Hi everyone,

I’m trying to run a few of the benchmarks from the Rodinia suite through computeprof on a GeForce GTX 680 (CUDA 5.0). The problem is that all of the L1 cache statistics always show up as 0. I’ve also tried running the BlackScholes application from the CUDA 5 SDK, and its L1 cache stats are also 0 for everything except the local load/store misses.

My question is: is there a specific flag/switch I need to set to get the L1 cache statistics to appear? Or is it just a property of the benchmarks I’ve selected (bfs, backprop, BlackScholes) that they happen to have no L1 traffic?


Does the code use local memory?

Unlike on Fermi, on Kepler devices the L1 cache is used exclusively for local memory. All global memory accesses go straight to L2. (A fact that is not yet reflected everywhere in the documentation.)

So if the code doesn’t use local memory, there is no L1 traffic.

By local memory, do you mean shared memory? Or do you mean something else? Local memory seems to be an overloaded term nowadays, which is why I’m asking.


No, not shared memory.
I mean local memory as the term is used in the CUDA C Programming Guide, i.e. the off-chip, per-thread memory where automatic variables are stored if they are not kept in registers.
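
To make that concrete, here is a minimal sketch of a kernel that would typically end up using local memory. The kernel name, array size, and file name are made up for illustration, and whether the array really lands in local memory depends on the compiler version and target architecture, so treat it as something to check rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel (not from Rodinia or the SDK).
// Each thread fills a private array and then reads it back at an index
// that is only known at run time. Dynamic indexing usually prevents the
// compiler from keeping the array in registers, so it tends to be placed
// in local memory. This is an assumption to verify for your own build.
__global__ void localMemDemo(const int *idx, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float scratch[64];                      // automatic, per-thread array
    for (int i = 0; i < 64; ++i)
        scratch[i] = (float)(tid + i);

    out[tid] = scratch[idx[tid] & 63];      // run-time index into the array
}

int main()
{
    const int n = 1024;
    static int   h_idx[n];
    static float h_out[n];
    for (int i = 0; i < n; ++i) h_idx[i] = i * 7;

    int   *d_idx = 0;
    float *d_out = 0;
    cudaMalloc((void **)&d_idx, n * sizeof(int));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_idx, h_idx, n * sizeof(int), cudaMemcpyHostToDevice);

    localMemDemo<<<(n + 255) / 256, 256>>>(d_idx, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("out[1] = %f\n", h_out[1]);

    cudaFree(d_idx);
    cudaFree(d_out);
    return 0;
}

// To check whether local memory is actually used (file name is hypothetical):
//   nvcc -arch=sm_30 --ptxas-options=-v local_demo.cu -o local_demo
//   cuobjdump -ptx local_demo     (look for ld.local / st.local)
```

If the verbose ptxas output reports a non-zero stack frame or spill loads/stores for the kernel, or the PTX contains ld.local / st.local instructions, the kernel uses local memory, and I would expect the L1 local load/store counters to be non-zero when you profile it.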

Thanks again for getting back to me. I recompiled each of them with --ptxas-options=-v set, and none of them report any local memory usage. Additionally, I ran cuobjdump on them and didn’t see any instructions with “.local” in them, as the Programming Guide describes. Thus, it seems like none of them have local memory accesses, which explains the lack of L1 cache statistics.

On a related note, if the Kepler GPUs behave as you’ve mentioned, then why are there even columns in computeprof for l1 global load hit? It seems like there can never be any loads of that type based on what you said…


That is a question that I’ve asked myself too. It appears that several places in the tools, and particularly in the documentation, haven’t yet been updated to reflect the new architecture, which has occasionally made me wonder which of the contradictory statements to trust.