Why does the profiler slow down the runtime by more than 10x (20-30x)?


I don’t understand why, when I turn on the profiling flag (COMPUTE_PROFILE), the run slows down by a factor of 10-30x. What takes so much time?


Actually, I found out that the major slowdown happens only when I’m using the countermodeaggregate option. Still, I don’t understand why it’s so slow.

The profiler uses counters to measure all the statistics. These counters often end up in local memory, which is slow.