Jetson TX1 poor performance relative to TX2

I recently did a comparison between TX1 and TX2, both with L4T 28.2. I wrote a trivial program called spin.cpp:

int main()
{
    for (int i = 0; i < 1000000000; i++)
        ;  // empty busy loop
    return 0;
}

and compiled it on the TX2 without optimization with

g++ spin.cpp

I made sure to run “sudo ./” before running “time ./a.out”. The same binary was used on the TX1 and the TX2. The results were:

TX1:

real	0m5.024s
user	0m5.008s
sys	0m0.004s

TX2:

real	0m0.600s
user	0m0.596s
sys	0m0.000s

tegrastats shows 100% utilization on a single core when the test is running on the TX1, as expected. Any ideas why the TX1 is so much slower and how I can get it close to the performance of the TX2 for this single-core, CPU-only test?

I’ve answered my own question. On TX2, this test is being handled by a Denver core, which does not exist on TX1. I discovered this by rerunning the test on TX2 in nvpmodel 3 and nvpmodel 4. The results:

nvpmodel 3:

real	0m4.671s
user	0m4.652s
sys	0m0.000s

nvpmodel 4:

real	0m0.615s
user	0m0.608s
sys	0m0.000s

If anyone can explain why the Denver core can do this task so much faster, I’d be interested in hearing it.

Are you using the same compiler on both targets? A different compiler may optimize such 'do-nothing' code better.
What does

g++ --version

report on both systems?


Both tests used the same binary, built on the TX2 without optimization. The g++ version was g++ (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

I can't really say why the Denver core performs better, but I'd note that MAXP_CORE_DENVER enables Denver core 1 while A57 core 0 is also (always) on, so you may not get this performance every time if the process is sometimes scheduled onto core 0. You could use CPU affinity if your final code really does perform better on the Denver cores, but the difference may be smaller on real workloads; it depends on your application.
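To pin the process as suggested, you can use taskset from the shell or sched_setaffinity from code. A minimal sketch (my own, not from the thread), assuming the Denver cores are CPUs 1 and 2 in the TX2's numbering — verify against /proc/cpuinfo on your board:

// affinity.cpp -- pin this process to one CPU, then report where it ran.
// Build: g++ affinity.cpp -o affinity
// Run:   ./affinity 1    (pin to CPU 1; assumed to be a Denver core on TX2)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv)
{
    // Core to pin to, taken from the command line (default 0).
    int cpu = (argc > 1) ? std::atoi(argv[1]) : 0;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        std::perror("sched_setaffinity");
        return 1;
    }

    // Burn some CPU on the pinned core, as in the original spin test.
    volatile long sink = 0;
    for (long i = 0; i < 100000000L; i++)
        sink += i;

    std::printf("ran on CPU %d\n", sched_getcpu());
    return 0;
}

Equivalently, "taskset -c 1 ./a.out" pins the original binary without recompiling.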

I'm afraid this test is flawed by design: it's totally artificial and hardly a realistic comparison.
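On that objection: with optimization enabled, g++ removes the empty loop entirely, so a -O2 build of spin.cpp measures nothing at all. A sketch of a variant whose loop survives optimization (my own, not from the thread) is to make each iteration observable with volatile:

// spin2.cpp -- a spin loop the optimizer cannot delete.
// Build: g++ -O2 spin2.cpp
int main()
{
    volatile long sink = 0;            // volatile: every store must happen
    for (long i = 0; i < 1000000000L; i++)
        sink += i;                     // loop body has an observable effect
    return static_cast<int>(sink & 1); // use the result so it is not dead code
}

This still only exercises a single integer pipeline, so it remains a microbenchmark, but at least its timings are comparable across compilers and optimization levels.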