Denver 2 vs Cortex-A57 performance comparison

I am trying to understand how the Denver 2 cores compare with the Cortex-A57 cores in terms of performance. Performance numbers in MIPS/MHz (or DMIPS/MHz) would help.
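For readers who end up measuring this themselves, here is a minimal sketch of how a DMIPS/MHz figure is usually derived from a Dhrystone run. It assumes the conventional reference of 1757 Dhrystones per second for 1 DMIPS (the VAX 11/780 baseline). The function name and the inputs are illustrative and do not come from any NVIDIA tool.

```python
# Conventional Dhrystone baseline: a VAX 11/780 scored 1757 Dhrystones/s,
# and that score is defined as 1 DMIPS.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips_per_mhz(dhrystones_per_sec, clock_mhz):
    """Convert a raw Dhrystone score into DMIPS/MHz.

    dhrystones_per_sec: score reported by your own Dhrystone run.
    clock_mhz: CPU clock (in MHz) the core was actually running at
               during that run.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    dmips = dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC
    return dmips / clock_mhz
```

The result is only meaningful if the clock was fixed during the run and the benchmark was pinned to the core type being measured.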

This doesn’t answer your question directly, but if you have an application that you can use as a benchmark, the nvpmodel tool lets you try different combinations of cores, which will help you better understand their performance.

Thanks for the comments, singularity7.

Here’s a more detailed analysis with some performance comparisons for your reference:

https://devblogs.nvidia.com/parallelforall/jetson-tx2-delivers-twice-intelligence-edge/

Thanks

Thanks for your comments, singularity7 and kayccc.

The blog gives a good idea of how Tegra X2 compares with Tegra X1 at different clock settings. It also mentions that Denver 2 is suited to high single-thread performance, whereas Cortex-A57 is geared toward multi-threading.

However, it does not give a picture of how Denver 2 compares with Cortex-A57. I have a fair idea of what performance numbers (in MIPS/MHz) I can expect from Cortex-A57, but not from Denver 2. I would like to understand Denver 2 so that I can put it to best use.

Also, can you let me know where I can find the Technical Reference Manual (TRM) for Tegra X2?

Hi SagarGaonkar,

“Also, can you let me know where I can find the Technical Reference Manual (TRM) for Tegra X2?”
The TX2 TRM will be made public once it’s ready. I don’t have a clear schedule yet, but it should not be too long.

Thanks

Hi kayccc,

How does switching between the Denver and Arm Cortex cores take place? Is it implemented in hardware inside the SoC, or is it done in the kernel?

Is there a way to see which cluster is currently active, or to switch between the two? I remember using an option like that on the TK1.

http://elinux.org/Jetson/Performance#Viewing_the_current_CPU_status

I would like to resurrect this old thread. Our application needs only two cores, and I would like to power off either the Denver2 or the Cortex-A57 cluster and run the CPU clock at its maximum of 2 GHz. Seeing a comparison of features/speed between Denver2 and Cortex-A57 would be very helpful… Can anyone from Nvidia help with that?

-albertr

Hi albertr,

You could try nvpmodel; please refer to http://www.jetsonhacks.com/2017/03/25/nvpmodel-nvidia-jetson-tx2-development-kit/

Thanks

Sorry, but that doesn’t answer the question.

-albertr

I think that does answer a good part of the question. There are various power models that enable/disable various CPU cores. You have to read the script and its configuration file to find out what they are.

That being said, core 0 is always on, because it’s the core that handles most I/O and interrupts (they are not distributed symmetrically across all the cores). Core 0 is an A57 core, so you can’t run “only” Denver. Thus, if you believe Denver will draw power while idle, using an nvpmodel mode that turns off those cores is probably best.

The good news is that unused cores don’t draw a lot of power. That being said, you can use Linux CPU control to turn off CPUs 1,2 (Denver) and 4,5 (two of the A57 cores) for the lowest possible draw, in the same way that nvpmodel does.

Exactly how much will they draw? That depends on your specific application load and instruction mix. There is no one answer that fits all applications; you have to measure on your own application.
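As a rough illustration of the “Linux CPU control” mentioned above, here is a minimal sketch that uses the kernel’s standard CPU hotplug files under /sys/devices/system/cpu. The helper names are made up for this example, the `root` parameter exists only so the code can be tried against a scratch directory, and writing to the real files requires root. The CPU numbering (1,2 Denver; 0,3,4,5 A57) is taken from this thread, not verified independently.

```python
from pathlib import Path

# Standard Linux sysfs location for per-CPU controls.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,3-5' into [0, 3, 4, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def online_cpus(root=SYSFS_CPU):
    """Return the CPUs the kernel currently reports as online."""
    return parse_cpu_list((Path(root) / "online").read_text())


def set_cpu_online(cpu, online, root=SYSFS_CPU):
    """Bring one CPU online (True) or take it offline (False).

    Refuses CPU 0, which this thread describes as always on.
    """
    if cpu == 0:
        raise ValueError("CPU 0 is expected to stay online")
    (Path(root) / f"cpu{cpu}" / "online").write_text("1" if online else "0")


# Example (as root on the target board), following the numbering above:
#   for cpu in (1, 2, 4, 5):
#       set_cpu_online(cpu, False)
#   print(online_cpus())
```

Changes made this way do not survive a reboot, so a persistent setup would still go through nvpmodel or a boot-time script.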

I think I need to clarify the question, since there seems to be some confusion… I know how to control cores. What I and others are asking is how a Denver2 core compares to a Cortex-A57 core performance-wise. To put it simply: if I need only two cores for my application, is there any performance advantage to using Denver2 cores instead of Cortex-A57 cores?

-albertr

And the answer is still that any difference will depend entirely on your specific application, instruction mix, cache usage pattern, and such.
If you are GPIO-intensive, you may find that running on core 0 has less overhead. If you have cache usage that doesn’t alias, you may find that one Denver and one A57 core work better (which requires activating both clusters). The only way you can tell how the cores perform for your application is to measure your application.
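One way to do that kind of measurement is to pin the workload to a single core and time it, then repeat on a core of the other type. The sketch below uses Python’s `os.sched_setaffinity` (Linux only). `time_on_cpu` and `sample_workload` are hypothetical names for this example, and the stand-in workload should be replaced with something representative of your own application.

```python
import os
import time


def time_on_cpu(cpu, workload, repeats=5):
    """Pin this process to one CPU and return the best wall-clock time
    (in seconds) over `repeats` calls of workload()."""
    previous = os.sched_getaffinity(0)   # remember the original CPU mask
    os.sched_setaffinity(0, {cpu})       # restrict this process to `cpu`
    try:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload()
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        os.sched_setaffinity(0, previous)  # always restore the mask


def sample_workload():
    """Hypothetical stand-in; replace with your application's hot loop."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


# Example, using the TX2 numbering given earlier in this thread
# (CPU 1 = Denver, CPU 3 = A57); both cores must be online:
#   print(time_on_cpu(1, sample_workload), time_on_cpu(3, sample_workload))
```

For a fair comparison, both cores should run at the same fixed clock, and several runs are worth taking since a first run may include warm-up effects.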

Hello,
I went to the link above. Under the section “Max-Q and Max-P”, “Table 2. Power consumption measurements for GoogLeNet and AlexNet architectures for max-Q and max-P performance levels on NVIDIA Jetson TX1 and Jetson TX2” does not show max-Q and max-P for the TX1.

It only shows max clock for the TX1.

Does this imply the GPU on the TX1 only runs at Max Clock?

Second, is there a similar comparison for the CPU cores? E.g. ARM A57 quad core on the TX1 max-q/p/clock vs the same on TX2?

The use case is that the GPU is off and the applications run on the quad-core CPU.

Or would the recommendation be to run nvpmodel to set the above modes and then run some kind of benchmark? (This seems to be the answer.)