Using NSIGHT VS Performance Analysis, I got strange results in Memory Statistics. The report for Local Memory shows that the kernel issued no load or store requests to local memory, yet there are many load/store transactions with local memory through the L1 cache.
The compiler output does not show any implicit use of local memory either:
1> ptxas : info : Compiling entry function ‘_Z9kergetsigPKjPKfS2_jiPjPd’ for ‘sm_30’
1> ptxas : info : Function properties for _Z9kergetsigPKjPKfS2_jiPjPd
1> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1> ptxas : info : Used 16 registers, 528 bytes smem, 372 bytes cmem, 120 bytes cmem
How can this be?
Is it a bug in the profiler, or am I misunderstanding something?