Why does the profiler slow down the runtime by more than 10x (20-30x)?

Hi,

I don’t understand why the run slows down by a factor of 10-30x when I turn on the profiling flag (COMPUTE_PROFILE). What takes so much time?

Thanks.

Actually, I found out that the major slowdown happens only when I’m using the countermodeaggregate option. Still, I don’t understand why it’s so slow.

The profiler uses counters to measure all the statistics. These counters often end up in local memory, which is slow.
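
One way to confirm where the time goes is to profile the same run twice, once with only lightweight options and once with countermodeaggregate added, and compare the wall-clock times. Below is a rough sketch of such a config file. It assumes the legacy command-line profiler, which reads the file named by the COMPUTE_PROFILE_CONFIG environment variable with one option per line and treats lines starting with # as comments. The file name profile.cfg is made up for this example.

```
# profile.cfg (hypothetical file name)
# Lightweight options: per-launch timing and launch configuration.
timestamp
gridsize
threadblocksize
# Remove the # on the next line for the second run to measure its cost.
#countermodeaggregate
```

With COMPUTE_PROFILE=1 and COMPUTE_PROFILE_CONFIG=profile.cfg set in the environment, the difference between the two runs should show how much of the slowdown comes from that one option.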