CUDA profiler: local_store

Hello,

Does anybody know exactly what local_store counts? I'm profiling my apps to get an idea of their behavior. I've followed NVIDIA's guidelines for using the profiler step by step, but I get different results when I measure local_store and local_load together rather than separately.

On the other hand, my code uses arrays which are mapped into local memory. Why do I get local_store counts bigger than local_load counts if the ratio is one load per store?

Local stores include spilling of registers to local memory.

So maybe some register variables are stored in local memory for later reuse, but the code which would reload those variables is never executed (e.g. because an “if” condition is not true).
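To make the "stored but never reloaded" idea concrete, here is a minimal sketch, not taken from the original poster's code. The kernel name fillThenMaybeRead, the SCRATCH size and the flag parameter are all made up for illustration. Each thread writes every element of a private array with a runtime-dependent index (which usually makes the compiler place the array in local memory), but reads back a single element, and only when flag is non-zero. Assuming the compiler keeps the array in local memory and does not restructure the loop, a kernel like this would execute far more local stores than local loads. Building with nvcc -Xptxas -v shows what ptxas actually decided about registers, local memory and spills.

```cuda
// Sketch only; build with something like: nvcc -Xptxas -v local_demo.cu
// (-Xptxas -v makes ptxas report register use, local memory and spill sizes)
#include <cstdio>
#include <cuda_runtime.h>

#define SCRATCH 64  // hypothetical per-thread array size (power of two)

// Hypothetical kernel: many stores to a per-thread array, at most one load.
__global__ void fillThenMaybeRead(const int *idx, float *out, int n, int flag)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Indexed with a value only known at run time, so the compiler
    // typically cannot keep this array in registers.
    float scratch[SCRATCH];
    int k = idx[tid];

    // SCRATCH stores per thread, always executed.
    for (int i = 0; i < SCRATCH; ++i)
        scratch[(i + k) & (SCRATCH - 1)] = 0.5f * i + tid;

    // One load per thread, executed only when flag is non-zero.
    float result = 0.0f;
    if (flag)
        result = scratch[k & (SCRATCH - 1)];  // element written when i == 0

    out[tid] = result;
}

int main()
{
    const int n = 256;
    int hIdx[n];
    float hOut[n];
    for (int i = 0; i < n; ++i) hIdx[i] = i % SCRATCH;

    int *dIdx = NULL;
    float *dOut = NULL;
    if (cudaMalloc(&dIdx, n * sizeof(int)) != cudaSuccess ||
        cudaMalloc(&dOut, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dIdx, hIdx, n * sizeof(int), cudaMemcpyHostToDevice);

    int ok = 1;
    for (int flag = 0; flag <= 1; ++flag) {
        fillThenMaybeRead<<<(n + 127) / 128, 128>>>(dIdx, dOut, n, flag);
        if (cudaMemcpy(hOut, dOut, n * sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            printf("kernel or copy failed\n");
            return 1;
        }
        // flag == 0: the load is skipped, result stays 0.
        // flag == 1: the thread reads back the value it stored for i == 0.
        for (int i = 0; i < n; ++i) {
            float expected = flag ? (float)i : 0.0f;
            if (hOut[i] != expected) ok = 0;
        }
    }
    printf(ok ? "results match\n" : "results differ\n");

    cudaFree(dIdx);
    cudaFree(dOut);
    return ok ? 0 : 1;
}
```

Profiling the two launches separately (flag = 0 and flag = 1) would be one way to check whether local_load changes while local_store stays roughly the same. The exact counter values depend on the compiler version and the target architecture, so treat this as an experiment to run, not a prediction.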

Yeah, you’re right about spilling, but my kernels don’t spill: they only use a small number of registers per threadblock (between 10 and 15), so the compiler doesn’t have to spill, does it?

local_load = 78648
local_store = 13527456 if I measure the local_store counter alone, and 26351772 if I measure gld_request, gst_request, local_load and local_store together
It doesn’t make any sense…

If the code is never executed, the profiler shouldn’t count anything. The counters only increase when the instructions are executed…
