I see L1 cache hits for local memory, even though I have disabled the L1 cache

Hi all,

I’m using nvprof 5.5.

I have disabled the L1 cache with the “-Xptxas -dlcm=cg” compiler option.

There is only one kernel call.

I collected the following events with nvprof:

Kernel calls           Event             Min          Max        Avg
 1                      local_load       32064       32064       32064
 1                     local_store          96          96          96
 1               l1_local_load_hit       90180       90180       90180
 1              l1_local_load_miss           0           0           0
 1              l1_local_store_hit          90          90          90
 1             l1_local_store_miss         180         180         180
 1              l1_global_load_hit           0           0           0
 1             l1_global_load_miss           0           0           0

I expected all the L1 counters to be 0, or at least the *_hit counters.

Can someone explain what I have missed here?

Thanks,
Waruna

The local loads and stores are probably caused by register spilling, which means the kernel uses too many registers per thread.

This option applies to global and generic loads, not to local loads.

--def-load-cache (-dlcm): Default cache modifier on global/generic load. Default value: ca.

Read more at: http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html
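To check both replies on your own kernel, a small test like the one below may help. It is only a sketch: the kernel, its names, and the file name local_demo.cu are made up for illustration and are not the original poster's code. The per-thread array is indexed with a value known only at run time, so the compiler will typically place it in local memory. The build line in the comments adds -Xptxas -v, which makes ptxas print the stack frame size and the spill store/load bytes for each kernel. The event names in the nvprof line are the ones from the question and may not exist on newer GPU architectures.

```cuda
// Hypothetical file: local_demo.cu
//
// Build (ptxas -v reports "bytes stack frame", "bytes spill stores",
// "bytes spill loads" per kernel):
//   nvcc -Xptxas -v -Xptxas -dlcm=cg local_demo.cu -o local_demo
//
// Profile (event names taken from the question above; availability
// depends on the GPU architecture):
//   nvprof --events local_load,local_store,l1_local_load_hit,l1_local_load_miss ./local_demo

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: scratch[] is indexed by a run-time value, so it
// usually cannot be kept in registers and ends up in local memory.
__global__ void localArrayKernel(const int *idx, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float scratch[64];
    for (int i = 0; i < 64; ++i)
        scratch[i] = static_cast<float>(tid + i);

    // Global load of idx[tid] (affected by -dlcm), followed by a
    // dynamically indexed read of scratch[] (a local load if the
    // array was placed in local memory).
    out[tid] = scratch[idx[tid] & 63];
}

int main()
{
    const int n = 1024;
    int *idx = nullptr;
    float *out = nullptr;

    if (cudaMalloc(&idx, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(idx, 0, n * sizeof(int));

    localArrayKernel<<<(n + 255) / 256, 256>>>(idx, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(idx);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

If ptxas reports a non-zero stack frame or spill bytes, the kernel is using local memory. In that case the local_load and l1_local_* counters can be non-zero even with -dlcm=cg, since that option only changes the default cache modifier for global and generic loads.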