Does Denver CPU performance decrease as the loop count increases?

We created a simple benchmark (based on Google Benchmark) for the TX2.
We found that as the loop count increases, the per-iteration performance of the Denver CPU decreases, while the A57 does not show this.

Test source code:

#include <sstream>

#include "benchmark/benchmark.h"

// Counts the divisors of n in the range [2, n-1], starting from 1.
int getFactorNumber(int n) {
  int factor = 1;
  if (n <= 1) return factor;
  for (int i = 2; i < n; i++) {
    if (n % i == 0) factor++;
  }
  return factor;
}

static void MT_Factor(benchmark::State& state) {
  int factor = 0;
  int i = state.range(0);
  for (auto _ : state) factor = getFactorNumber(i);
  // Use the result so the compiler does not discard the call
  std::stringstream ss;
  ss << factor;
  state.SetLabel(ss.str());
}

BENCHMARK_RANGE(MT_Factor, 1, 1024*1024);

BENCHMARK_MAIN();
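
To cross-check the numbers without Google Benchmark, the same loop could be timed with std::chrono. This is only a minimal sketch: nsPerCall is a hypothetical helper name, not part of the benchmark above, and the volatile variables are one assumed way of keeping the compiler from hoisting or dropping the call.

```cpp
#include <chrono>

// Same divisor-counting loop as in the benchmark source.
int getFactorNumber(int n) {
  int factor = 1;
  if (n <= 1) return factor;
  for (int i = 2; i < n; i++) {
    if (n % i == 0) factor++;
  }
  return factor;
}

// Hypothetical helper: average wall-clock nanoseconds per call
// to getFactorNumber(n), measured over `repeats` calls.
double nsPerCall(int n, int repeats) {
  volatile int input = n;  // re-read on every call, so the result cannot be hoisted
  volatile int sink = 0;   // volatile store, so the call cannot be dropped
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repeats; r++) {
    sink = getFactorNumber(input);
  }
  auto stop = std::chrono::steady_clock::now();
  (void)sink;
  std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / repeats;
}
```

Calling nsPerCall(512, 100000) from main and dividing the result by 512 would give a per-iteration figure to compare against the Google Benchmark output on each core type.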

Test results:

loopNumber   denver (ns)   denver avg (ns/iteration)   A57 (ns)   A57 avg (ns/iteration)
1            1.62          1.62                        6.95       6.95
8            29.0          3.625                       33.3       4.1625
64           187           2.92                        290        4.53
512          6461          12.62                       2024       3.95
32768        350309        10.69                       124104     3.79
262144       2510869       9.58                        1002964    3.826
1048576      9612591       9.167                       4006140    3.82
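
The averages in the results are the measured time divided by the loop number. A small sketch of that arithmetic, using rows copied from the results above (Row, perIteration and printPerIteration are illustrative names, not part of the benchmark):

```cpp
#include <cstdio>

// One row of the results: loop number and measured time per core type.
struct Row {
  long loopNumber;
  double denverNs;
  double a57Ns;
};

// Average cost of one loop iteration, in nanoseconds.
double perIteration(double totalNs, long loopNumber) {
  return totalNs / loopNumber;
}

// Rows copied from the test results.
const Row kRows[] = {
    {1, 1.62, 6.95},
    {8, 29.0, 33.3},
    {64, 187, 290},
    {512, 6461, 2024},
    {32768, 350309, 124104},
    {262144, 2510869, 1002964},
    {1048576, 9612591, 4006140},
};

// Prints the per-iteration cost on each core and the Denver/A57 ratio.
void printPerIteration() {
  for (const Row& row : kRows) {
    double denver = perIteration(row.denverNs, row.loopNumber);
    double a57 = perIteration(row.a57Ns, row.loopNumber);
    std::printf("%8ld  denver %.3f ns  A57 %.3f ns  ratio %.2f\n",
                row.loopNumber, denver, a57, denver / a57);
  }
}
```

Up to a loop number of 64 the Denver core is faster per iteration; from 512 onward it costs roughly 2.4x to 3.2x as much per iteration as the A57.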

Test binary (for TX2):
benchmark (1.4 MB)

We’re investigating this issue and will post an update once we have any results.