my Xavier ARM = v8 rev 0, not v8 rev 2 (Solved)

I just got the Xavier developer kit directly from NVIDIA, and here’s the output of /proc/cpuinfo:

jetson-0423018055068:~> cat /proc/cpuinfo
processor : 0
model name : ARMv8 Processor rev 0 (v8l)
BogoMIPS : 62.50
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x004
CPU revision : 0
MTS version : 42272872

The spec sheet says it is an ARMv8 rev 2. Also, I noticed that compiling on it takes longer than on the Jetson TX2. Did we get an engineering sample?

Xavier feels slower in the default config. The high performance cores are disabled by default. Only the 4 low perf cores are running.

That sounds like how a “production” unit behaves. I seem to have an “engineering” unit where all 8 cores are enabled, but they all max out at the slower 1.1904 GHz:

jetson-0423018055068:~> sudo ./jetson_clocks.sh --show
[sudo] password for nvidia:
SOC family:tegra194 Machine:jetson-xavier
Online CPUs: 0-7
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu1: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu2: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu3: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu4: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu5: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu6: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu7: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
GPU MinFreq=114750000 MaxFreq=905250000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=1600000000 CurrentFreq=1600000000 FreqOverride=0
Fan: speed=0

What happens if you run

sudo nvpmodel -m 0

Here’s mine:

nvidia@xavier:~$ sudo nvpmodel -m 0
nvidia@xavier:~$ sudo ./jetson_clocks.sh --show
SOC family:tegra194  Machine:jetson-xavier
Online CPUs: 0-7
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu1: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu2: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu3: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu4: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu5: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu6: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
cpu7: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400
GPU MinFreq=114750000 MaxFreq=1377000000 CurrentFreq=114750000
EMC MinFreq=204000000 MaxFreq=2133000000 CurrentFreq=2133000000 FreqOverride=0
Fan: speed=0
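
For anyone who wants to compare two jetson_clocks.sh dumps without eyeballing them, here is a minimal Python sketch that pulls the key=value pairs out of a single cpuN line. The two sample lines are copied from the dumps in this thread; the helper names (parse_cpu_line, max_freq_ghz) are made up for illustration, and the conversion assumes the frequencies are in kHz, which matches the 1.1904 GHz figure quoted earlier.

```python
import re

# Sample lines copied from the two dumps in this thread.
# "Gonvernor" is spelled that way in the script's own output.
BEFORE = "cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400"
AFTER = "cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=2265600 CurrentFreq=1190400"


def parse_cpu_line(line):
    """Split 'cpuN: K=V K=V ...' into (cpu_name, {key: value}). Hypothetical helper."""
    name, _, rest = line.partition(":")
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    return name.strip(), fields


def max_freq_ghz(line):
    """Return MaxFreq in GHz, assuming the reported value is in kHz."""
    _, fields = parse_cpu_line(line)
    return int(fields["MaxFreq"]) / 1e6


if __name__ == "__main__":
    for label, line in (("before", BEFORE), ("after -m 0", AFTER)):
        print(f"{label}: MaxFreq = {max_freq_ghz(line):.4f} GHz")
```

If MinFreq equals MaxFreq on every core, as in the first dump, the governor has no room to scale up, whatever it is set to.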

I get the same output when I do that. I didn’t check what nvpmodel was set to before. Does your /proc/cpuinfo show ARMv8 rev 2 or rev 0?

If you hadn’t changed nvpmodel before, it should have been MODE_15W, as that’s what the units ship with. You can change it back by running “sudo nvpmodel -m 2”. There is a table listing the frequencies and IDs for the nvpmodel presets in the L4T Documentation under ‘Clock Frequency and Power Management’, in the ‘Power Management for Jetson-Xavier Devices’ and ‘Max-Q and Max-P Power Efficiency’ sections.

The cpuinfo of my production unit looks the same. I’m looking into whether there is an issue with how that is being reported.

I don’t think there is a difference in performance between cores anymore.

From https://en.wikichip.org/wiki/nvidia/tegra/xavier:

nvidia@xavier:~$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv8 Processor rev 0 (v8l)
BogoMIPS        : 62.50
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x4e
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x004
CPU revision    : 0
MTS version     : 43038740

I’d think that there might be a bug in how the revision is identified/reported, rather than some kind of dumbed-down CPU version. They taped out a year ago, and are reportedly going to mass production now, so I don’t think there are any chip changes in the pipeline!

OK, this all makes sense. Because the Features section of /proc/cpuinfo lists more than the Jetson TX2 does, it seems like it may be a misreporting of the ARM revision. So I think that should clear it up.
Now onto the bigger problem of why we’re only getting about a 10% improvement for my application over the Jetson TX2…

Profile on TX2, profile on Xavier, and compare!

My guess is there is some serial section, and Amdahl’s law is screwing you over. It’s pretty much always Amdahl’s law :-(
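
To put a number on the Amdahl’s law guess, here is a small Python sketch. The fractions and factors below are invented for illustration, not measured from anyone’s application; the only thing taken from the thread is the roughly 10% figure.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of the original runtime
    gets `factor` times faster and the rest stays the same (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


if __name__ == "__main__":
    # Hypothetical inputs: what if only part of the runtime benefits from the new chip?
    for fraction, factor in ((0.2, 2.0), (0.5, 2.0), (0.9, 2.0)):
        print(f"{fraction:.0%} of runtime made {factor}x faster "
              f"-> {amdahl_speedup(fraction, factor):.2f}x overall")
```

With those made-up inputs, speeding up 20% of the runtime by 2x gives about 1.11x overall, which is the same ballpark as the 10% improvement reported above. Profiling is the only way to find out what the real fraction is.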

For the rev listed under the cpuinfo model name, “Revision” in this case is the value of MIDR_EL1.Revision as specified in the ARMv8 architecture (section D7.2.66).

That register is defined as “An implementation-defined revision number for the device.” That is not the revision of the ARMv8 architecture. It is the chip revision. Individual supported features are listed in the various identification registers because ARMv8 implementations have a lot of freedom to choose which features they implement.
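
As an illustration of where those /proc/cpuinfo lines come from, here is a minimal Python sketch that splits a MIDR_EL1 value into its fields (Implementer in bits 31:24, Variant in 23:20, Architecture in 19:16, PartNum in 15:4, Revision in 3:0). The example value is assembled by hand from the cpuinfo output in this thread; the Architecture nibble of 0xF is an assumption, since cpuinfo does not print that field directly.

```python
from typing import NamedTuple


class Midr(NamedTuple):
    implementer: int
    variant: int
    architecture: int
    part: int
    revision: int


def decode_midr(value):
    """Split a 32-bit MIDR_EL1 value into its architected fields."""
    return Midr(
        implementer=(value >> 24) & 0xFF,
        variant=(value >> 20) & 0xF,
        architecture=(value >> 16) & 0xF,
        part=(value >> 4) & 0xFFF,
        revision=value & 0xF,
    )


# Hand-assembled from the cpuinfo above: implementer 0x4e, variant 0x0,
# part 0x004, revision 0. The architecture nibble (0xF) is assumed.
EXAMPLE_MIDR = 0x4E0F0040

if __name__ == "__main__":
    midr = decode_midr(EXAMPLE_MIDR)
    print(f"implementer=0x{midr.implementer:02x} variant=0x{midr.variant:x} "
          f"part=0x{midr.part:03x} revision={midr.revision}")
```

The "rev 0" in the model name is just that bottom nibble, so it says nothing about which ARMv8.x extensions the core implements.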

For example, ARMv8.2 added support for 16-bit floating point. It is indicated by the value of MVFR1_EL1.FPHP, and the kernel reports that feature as “fphp” in the “Features” list. That is visible in the /proc/cpuinfo output.
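
A quick way to check this from the Features line itself is sketched below in Python. The sample line is copied from the cpuinfo output in this thread; the flag-to-extension table is my own summary of the arm64 hwcap names, not something taken from NVIDIA documentation, and the helper names are hypothetical.

```python
# Features line copied from the Xavier /proc/cpuinfo output in this thread.
XAVIER_FEATURES_LINE = (
    "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp"
)

# Flags that point at extensions newer than baseline ARMv8.0
# (my own summary of the arm64 hwcap names).
EXTENSION_FLAGS = {
    "atomics": "ARMv8.1 LSE atomic instructions",
    "fphp": "ARMv8.2 half-precision scalar floating point",
    "asimdhp": "ARMv8.2 half-precision Advanced SIMD",
}


def parse_features(line):
    """Return the set of flags from a 'Features : a b c' line."""
    _, _, flags = line.partition(":")
    return set(flags.split())


def detected_extensions(line):
    """Map each newer-extension flag present in the line to its description."""
    features = parse_features(line)
    return {flag: desc for flag, desc in EXTENSION_FLAGS.items() if flag in features}


if __name__ == "__main__":
    for flag, desc in detected_extensions(XAVIER_FEATURES_LINE).items():
        print(f"{flag}: {desc}")
```

On a live system the same functions can be pointed at the first Features line read from /proc/cpuinfo instead of the pasted sample.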

Interesting…

The 1.1 GHz output is what I got on my Xavier (JetPack 4.2), and -m 0 and -m 2 made no difference. Then I did something that forced /etc/nvpmodel.conf to be parsed and, voilà, -m 0 and -m 2 now make a difference. (I tried the example that refers to “pm.conf”, which failed with a complaint that the file was not found; to fix it I ran “nvpmodel -p -f /etc/nvpmodel.conf --verbose”.)

Now:

nvpmodel -m 0 ::= 8 Cores 2.2GHz MAXN Power
nvpmodel -m 2 ::= 4 Cores 1.1GHz 15W Power

I was finding the builds to be grindingly slow; going by BogoMIPS, these 1.1 GHz cores are 50X slower than what I’m used to.

Now building MarchingCubes:

-m 0 ::= real 59.4s 3.2W
-m 2 ::= real 2m4.085s 1.2W

Single-threaded build - on paper the 2.2 GHz cores are about 2X faster, which makes this about right…
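
To check the “about right” claim with the numbers posted here, a small Python sketch follows. It uses the two “real” times above and the MaxFreq values from the jetson_clocks.sh dumps earlier in the thread; parse_real_time is a hypothetical helper, and one build per mode is obviously not a proper benchmark.

```python
import re


def parse_real_time(text):
    """Convert a 'real' time such as '59.4s' or '2m4.085s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Build times from this post, clock limits (kHz) from the earlier dumps.
MODE_0_TIME = parse_real_time("59.4s")
MODE_2_TIME = parse_real_time("2m4.085s")
MODE_0_KHZ = 2265600
MODE_2_KHZ = 1190400

if __name__ == "__main__":
    print(f"build time ratio: {MODE_2_TIME / MODE_0_TIME:.2f}x")
    print(f"clock ratio:      {MODE_0_KHZ / MODE_2_KHZ:.2f}x")
```

The build-time ratio comes out a little above 2x and the clock ratio a little below it. That is close enough for a single run, and the dumps also show a higher memory clock limit in the faster mode, which may account for part of the gap.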