We created a simple benchmark (based on Google Benchmark) for the TX2.
We found that as the loop count increases, the per-iteration performance of the Denver CPU drops, while that of the A57 does not.
Test source code:
#include <sstream>

#include "benchmark/benchmark.h"

// Returns 1 plus the number of divisors of n in [2, n); returns 1 for n <= 1.
int getFactorNumber(int n) {
  int factor = 1;
  if (n <= 1) return factor;
  for (int i = 2; i < n; i++) {
    if (n % i == 0) factor++;
  }
  return factor;
}

static void MT_Factor(benchmark::State& state) {
  int factor = 0;
  int i = static_cast<int>(state.range(0));
  // for (auto _ : state) is_prime = isPrime(static_cast<int>(state.range(0)));
  for (auto _ : state) factor = getFactorNumber(i);
  // Use the result so the compiler cannot discard the computation
  std::stringstream ss;
  ss << factor;
  state.SetLabel(ss.str());
}

// Arguments grow by the default range multiplier of 8: 1, 8, 64, ..., 1024*1024
BENCHMARK_RANGE(MT_Factor, 1, 1024*1024);
BENCHMARK_MAIN();
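
The post does not say how each run was bound to a Denver or an A57 core. As a rough sketch of one way to do it on Linux, the calling thread can be pinned with `sched_setaffinity` before the benchmarks start. The helper name `pinToCpu` is hypothetical, and which logical CPU numbers map to Denver or A57 cores is an assumption that should be checked on the board itself.

```cpp
#include <sched.h>  // sched_setaffinity, cpu_set_t, CPU_* macros (Linux/glibc)

// Hypothetical helper: pin the calling thread to one logical CPU.
// Returns true on success, false if the CPU index is out of range or
// the kernel rejects the request (e.g. the CPU is offline).
inline bool pinToCpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 applies the mask to the calling thread.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Calling `pinToCpu(n)` at the start of the program would keep the timing loop on CPU `n`. Launching the unmodified binary under `taskset -c n` should have a similar effect without changing the code.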
Test results:
Loop count | Denver total (ns) | Denver per iteration (ns) | A57 total (ns) | A57 per iteration (ns) |
---|---|---|---|---|
1 | 1.62 | 1.62 | 6.95 | 6.95 |
8 | 29.0 | 3.625 | 33.3 | 4.1625 |
64 | 187 | 2.92 | 290 | 4.53 |
512 | 6461 | 12.62 | 2024 | 3.95 |
32768 | 350309 | 10.69 | 124104 | 3.79 |
262144 | 2510869 | 9.58 | 1002964 | 3.826 |
1048576 | 9612591 | 9.167 | 4006140 | 3.82 |
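
To make the arithmetic behind the table explicit, here is a small sketch that recomputes the per-iteration cost (total time divided by loop count) and the Denver-to-A57 ratio from the measured totals above. The names `Row`, `tableRows`, `perIteration`, `denverSlowdown` and `printSummary` are made up for illustration. The numbers are copied from the table, not new measurements.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of the results table: loop count and total time (ns) per core type.
struct Row {
    long loops;
    double denver_ns;
    double a57_ns;
};

// Totals copied from the table above.
inline std::vector<Row> tableRows() {
    return {
        {1, 1.62, 6.95},
        {8, 29.0, 33.3},
        {64, 187.0, 290.0},
        {512, 6461.0, 2024.0},
        {32768, 350309.0, 124104.0},
        {262144, 2510869.0, 1002964.0},
        {1048576, 9612591.0, 4006140.0},
    };
}

// Average time per loop iteration in nanoseconds.
inline double perIteration(double total_ns, long loops) {
    assert(loops > 0);
    return total_ns / static_cast<double>(loops);
}

// Ratio of Denver time to A57 time; a value above 1 means Denver is slower.
inline double denverSlowdown(const Row& r) {
    return r.denver_ns / r.a57_ns;
}

// Print per-iteration cost and the Denver/A57 ratio for every row.
inline void printSummary() {
    for (const Row& r : tableRows()) {
        std::printf("%8ld  denver %.3f ns/iter  a57 %.3f ns/iter  ratio %.2f\n",
                    r.loops,
                    perIteration(r.denver_ns, r.loops),
                    perIteration(r.a57_ns, r.loops),
                    denverSlowdown(r));
    }
}
```

Calling `printSummary()` from `main` prints one line per row. Working the ratios by hand, Denver is faster than the A57 up to 64 loops, and from 512 loops onward it is roughly 2.4 to 3.2 times slower.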
Test binary (for TX2):
benchmark (1.4 MB)