TX1 vs TK1 CPU

Hey guys, sadly I was unable to find the specified target CPU frequency of the TX1.

According to my tests and my research on the A57, the X1 runs at 1.9 GHz and the TK1 runs at 2.3 GHz.
Does the TX1 really have less performance in terms of CPU or is there a special trick?

I don’t know about the frequency requirements/constraints, but one is 64-bit, the other is 32-bit. There was also a lot of change in caching on the 64-bit version which in itself made 64-bit a lot smarter with cache. I suspect the quad core A57 at 1.9 GHz is far faster than the quad core A15 at 2.3 GHz…but I have not personally tested this.

EDIT: Changed to fix an obvious wetware failure on my part (wetware being my brain).

I’m not quite sure what this question is asking.

The Cortex-A57 in the Tegra X1 is a different processor architecture than the Cortex-A15 in the Tegra K1.

There are a wide variety of factors behind the better performance of the Cortex-A57 over the Cortex-A15, and some of them are system related. Here are a few of the reasons, but certainly not all. Processor design and “tricks” take up volumes of books.

First, the Cortex-A57 is 64-bit versus the 32-bit Cortex-A15.

The Cortex-A57 can fetch, decode, and dispatch three instructions per clock cycle, and it executes instructions out of program order to improve throughput. The Cortex-A15 is also a three-wide, out-of-order design, so any advantage here comes from refinements to the pipeline rather than from a wider front end.

The memory bandwidth is higher on the TX1: 25.6 GB/s vs. 15 GB/s. The TX1 can use LPDDR4 memory versus LPDDR3 on the TK1.
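As a rough sanity check on those bandwidth figures, peak DRAM bandwidth is just bus width times transfer rate. The sketch below assumes a 64-bit memory bus on both boards, 3200 MT/s for the TX1's LPDDR4, and 1866 MT/s for the TK1; none of those three numbers are stated in this thread, so treat them as assumptions.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Assumed configurations (not taken from the thread):
tx1 = peak_bandwidth_gb_s(64, 3200e6)  # 64-bit bus, 3200 MT/s
tk1 = peak_bandwidth_gb_s(64, 1866e6)  # 64-bit bus, 1866 MT/s

print(f"TX1: {tx1:.1f} GB/s, TK1: {tk1:.1f} GB/s")
```

Under those assumptions the result lands on roughly 25.6 GB/s and 15 GB/s. These are theoretical peaks; sustained bandwidth in a real workload will be lower.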

The architecture difference means memory caches are more performant on the TX1.

The Tegra X1 uses a 20 nm process; the Tegra K1 uses a 28 nm process.

The list goes on, but the basic takeaway is that it’s not an apples-to-apples comparison, and using clock speed alone as a measure of processor speed is a throwback to the old days when people were all using the same Intel processors.
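To put a number on the clock-speed point: if throughput is modeled very roughly as instructions-per-cycle times clock frequency, the sketch below computes how much more work per cycle the slower-clocked core needs to break even. The model is a simplification (it ignores memory stalls, thermal throttling, and so on), and the function name is just for illustration.

```python
def required_ipc_gain(slow_clock_ghz: float, fast_clock_ghz: float) -> float:
    """Fractional per-cycle gain the slower-clocked core needs to match
    the faster-clocked one, assuming throughput = IPC * clock."""
    return fast_clock_ghz / slow_clock_ghz - 1.0


# Clock figures quoted earlier in the thread: 1.9 GHz (TX1) vs 2.3 GHz (TK1).
gain = required_ipc_gain(1.9, 2.3)
print(f"The 1.9 GHz core needs about {gain:.0%} more work per cycle to break even")
```

So with the clocks quoted in this thread, the A57 has to do about 21% more per cycle just to tie, and anything beyond that is a net win.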

You can look at some benchmarks of the Jetson TX1 vs Jetson TK1 here: http://www.phoronix.com/scan.php?page=article&item=nvidia-jtx1-perf&num=1

Don’t believe them: the TK1 was not configured for maximum performance. I’ve posted some CPU comparisons at http://openbenchmarking.org/result/1512084-HA-6977069441. The TK1 CPU is faster in many cases.
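Before comparing numbers, it is worth checking how each board is actually clocked. A minimal sketch, assuming the kernel exposes the standard Linux cpufreq files under /sys/devices/system/cpu (whether a given L4T release exposes all of them is an assumption here):

```python
from pathlib import Path


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: {field: value}} for every CPU that exposes cpufreq.

    Frequencies are reported by the kernel in kHz.
    """
    fields = ("scaling_governor", "scaling_cur_freq", "cpuinfo_max_freq")
    report = {}
    for cpufreq_dir in sorted(Path(base).glob("cpu[0-9]*/cpufreq")):
        entry = {}
        for name in fields:
            try:
                entry[name] = (cpufreq_dir / name).read_text().strip()
            except OSError:
                # File missing or unreadable (e.g. CPU offline): skip it.
                continue
        report[cpufreq_dir.parent.name] = entry
    return report


if __name__ == "__main__":
    for cpu, info in read_cpufreq().items():
        print(cpu, info)
```

If the governor is not set for maximum performance, or scaling_cur_freq sits well below cpuinfo_max_freq while the benchmark runs, the comparison is probably not measuring what you think it is.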

That is with L4T on both systems, which means the A57 is running in 32-bit mode in userspace. Possibly a 64-bit userspace would run faster.
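A quick way to confirm which mode a userspace process is really running in is to compare the pointer size of the running interpreter with what the kernel reports. This is only a sketch: the exact string returned by platform.machine() on a 32-bit userspace over a 64-bit kernel depends on the system configuration.

```python
import platform
import struct


def userspace_bits() -> int:
    """Pointer width, in bits, of the currently running Python process."""
    return struct.calcsize("P") * 8


print("machine reported by the kernel:", platform.machine())
print("userspace pointer width:", userspace_bits(), "bits")
```

A result of 32 bits on an A57 would mean the benchmark binaries are running as 32-bit ARM code, not as AArch64.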

I am aware that the A57 is a 64-bit processor, but L4T uses a 32-bit userspace…

In performance tests with nbench the K1 processor gets a better result. That’s what is bothering me.
Can you suggest any other tools that work on ARM? Most are made just for x86/x64.

Others and I ran some tests using the CUDA 7.0 samples; the results are mixed.

https://devtalk.nvidia.com/default/topic/901337/jetson-tx1/cuda-7-0-jetson-tx1-performance-and-benchmarks/

Some of the poorer results for the TX1 might be ARM CPU related. Hopefully a 64-bit L4T will improve TX1 performance significantly.

The boxFilter CUDA sample runs much faster on the TX1 than on the TK1.

Sadly, for now we can’t use a 64-bit userland, as PointGrey cameras do not come with an aarch64 driver yet.
Is someone from Nvidia able to tell whether the 1 TFlop quoted for the TX1 was measured with a 32-bit or a 64-bit userland?

1 TFlop is theoretical performance on 16-bit floats, on the GPU. CPU mode won’t affect theoretical GPU performance, and I don’t think it will have much practical effect on real-world GPU performance (other than making pointers bigger, which will reduce performance).
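For what it’s worth, the theoretical figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes 256 CUDA cores, a GPU clock of roughly 1 GHz, one fused multiply-add (2 FLOPs) per core per cycle, and two packed FP16 values per lane; none of these numbers appear in this thread, so treat them as assumptions.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_fma: int = 2,
               values_per_lane: int = 1) -> float:
    """Theoretical peak FLOP/s: every core retires one FMA per cycle."""
    return cores * clock_hz * flops_per_fma * values_per_lane


# Assumed TX1 GPU parameters (not from the thread): 256 cores at ~1 GHz.
fp32 = peak_flops(256, 1.0e9)
fp16 = peak_flops(256, 1.0e9, values_per_lane=2)  # two FP16 values per lane

print(f"FP32: {fp32 / 1e12:.2f} TFLOP/s, FP16: {fp16 / 1e12:.2f} TFLOP/s")
```

Nothing in that calculation depends on whether the CPU userland is 32-bit or 64-bit.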