Idle CPUs

I’ve just noticed that 2 out of 6 CPU cores are idle and not getting scheduled.

The TX2 is configured as MAXN so I would expect all cores to be utilized.

If I change the power mode I can successfully disable the affected cores.
Do I have a defective TX2?

I found this in kern.log; not sure whether it helps:

kern.log:Mar 26 19:01:14 texy-desktop kernel: [775215.001404] CPU1: shutdown
kern.log:Mar 26 19:01:14 texy-desktop kernel: [775215.004318] psci: CPU1 killed.
kern.log:Mar 26 19:01:14 texy-desktop kernel: [775215.077037] CPU2: shutdown
kern.log:Mar 26 19:01:14 texy-desktop kernel: [775215.080588] psci: CPU2 killed.
kern.log:Mar 26 19:01:42 texy-desktop kernel: [775242.928058] CPU1: Booted secondary processor [4e0f0030]
kern.log:Mar 26 19:01:42 texy-desktop kernel: [775242.953392] CPU2: Booted secondary processor [4e0f0030]
kern.log:Mar 26 19:01:55 texy-desktop kernel: [775255.681205] CPU1: shutdown
kern.log:Mar 26 19:01:55 texy-desktop kernel: [775255.686231] psci: CPU1 killed.
kern.log:Mar 26 19:01:55 texy-desktop kernel: [775255.763399] CPU2: shutdown
kern.log:Mar 26 19:01:55 texy-desktop kernel: [775255.766277] psci: CPU2 killed.
kern.log:Mar 26 19:02:06 texy-desktop kernel: [775266.333443] CPU1: Booted secondary processor [4e0f0030]
kern.log:Mar 26 19:02:06 texy-desktop kernel: [775266.341223] CPU2: Booted secondary processor [4e0f0030]
kern.log:Mar 26 19:02:07 texy-desktop kernel: [775267.890048] CPU1: shutdown
kern.log:Mar 26 19:02:07 texy-desktop kernel: [775267.899664] psci: CPU1 killed.
kern.log:Mar 26 19:02:07 texy-desktop kernel: [775268.013892] CPU2: shutdown
kern.log:Mar 26 19:02:07 texy-desktop kernel: [775268.019480] psci: CPU2 killed.
kern.log:Mar 26 19:02:11 texy-desktop kernel: [775271.478157] CPU1: Booted secondary processor [4e0f0030]
kern.log:Mar 26 19:02:11 texy-desktop kernel: [775271.490926] CPU2: Booted secondary processor [4e0f0030]

If you run “cat /proc/cmdline” you will find this near the end of the kernel command line:
isolcpus=1-2
…which means only tasks explicitly assigned to those cores will run on them. These are the Denver cores: core 0 is the first core, and cores 1 and 2 are Denver.

When a kernel command line parameter is repeated, in most cases the last occurrence is the one actually used. So if “isolcpus=1-2” appears only once, you can simply delete it from the “APPEND” line of “/boot/extlinux/extlinux.conf” and those 2 cores will no longer be isolated. Alternatively, you can add the following at the end of the APPEND line (which is space-delimited) to override any earlier value and disable the feature:
isolcpus=
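To check which cores a given command line would isolate, here is a minimal Python sketch. It assumes the "last occurrence wins" behaviour described above and only handles plain numeric lists such as "1-2" or "0,3-5"; the function names are made up for illustration.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "1-2,5" into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # an empty value (e.g. "isolcpus=") isolates nothing
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def effective_isolcpus(cmdline):
    """Return the CPUs named by the last isolcpus= token (assumes last wins)."""
    value = ""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            value = token[len("isolcpus="):]  # later tokens override earlier ones
    return parse_cpu_list(value)
```

On the Jetson itself you could pass in the contents of /proc/cmdline, e.g. effective_isolcpus(open("/proc/cmdline").read()).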

FYI, you would use “taskset” if you wanted to manually schedule something on one of those cores.
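As a rough illustration of manual placement, the sketch below builds a “taskset -c” command line and, alternatively, pins the calling process from inside Python using os.sched_setaffinity (Linux only). The helper names and the “./batch_job” program are hypothetical.

```python
import os


def taskset_argv(cpus, command):
    """Build an argv list that runs `command` under taskset on the given CPUs."""
    cpu_list = ",".join(str(c) for c in sorted(set(cpus)))
    return ["taskset", "-c", cpu_list] + list(command)


def pin_current_process(cpus):
    """Restrict the calling process to `cpus` and return the resulting mask.

    Linux only; raises OSError if none of the requested CPUs is usable.
    """
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

For example, taskset_argv([1, 2], ["./batch_job"]) gives an argv list you could hand to subprocess.run to start a job on the two Denver cores.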

Well, that certainly explains what is happening. I compared with another TX2 which is using all cores, and I’m curious as to how this came about. Both TX2s were provisioned the same way.

Working TX2

root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 video=tegrafb no_console_suspend=1 earlycon=uart8250,mmio32,0x3100000 nvdumper_reserved=0x2772e0000 gpt tegra_fbmem2=0x140000@0x9607d000 lut_mem2=0x2008@0x9607a000 usbcore.old_scheme_first=1 tegraid=18.1.2.0.0 maxcpus=6 boot.slot_suffix= boot.ratchetvalues=0.2031647.1 bl_prof_dataptr=0x10000@0x275840000 sdhci_tegra.en_boot_part_access=1 quiet

Bad TX2

console=ttyS0,115200 androidboot.presilicon=true firmware_class.path=/etc/firmware root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2 video=tegrafb no_console_suspend=1 earlycon=uart8250,mmio32,0x3100000 nvdumper_reserved=0x2772e0000 gpt usbcore.old_scheme_first=1 tegraid=18.1.2.0.0 maxcpus=6 boot.slot_suffix= boot.ratchetvalues=0.2031647.1 bl_prof_dataptr=0x10000@0x275840000 sdhci_tegra.en_boot_part_access=1 quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2
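Long command lines like these are hard to compare by eye. A small Python sketch such as the following can list the parameters present on one line but not on the other; the function name is made up, and the sample strings are shortened excerpts of the two lines above, not the full values.

```python
def extra_params(reference, other):
    """Return tokens found in `other` but not in `reference`, in order, without repeats."""
    known = set(reference.split())
    extras = []
    for token in other.split():
        if token not in known and token not in extras:
            extras.append(token)
    return extras


# Shortened excerpts of the two command lines, for illustration only.
working = "root=/dev/mmcblk0p1 rw rootwait maxcpus=6 quiet"
bad = ("console=ttyS0,115200 root=/dev/mmcblk0p1 rw rootwait isolcpus=1-2 "
       "maxcpus=6 quiet root=/dev/mmcblk0p1 rw rootwait isolcpus=1-2")

print(extra_params(working, bad))
```

Swapping the arguments shows what the working line has that the other one lacks.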

Thanks for your response.

Originally (in earlier L4T releases) isolcpus was not used. I don’t know the actual story behind adding it, but I will guess it went something like this: The Denver cores have advantages and disadvantages, such as lower power consumption for a given amount of work done, but higher startup latency, e.g., from loading microcode. People profiling performance over time may have noticed that the Denver cores had more latency. Those cores are best manually assigned to tasks suited to longer batch operations, and some people probably started assigning such tasks by hand. On the other hand, it doesn’t help much to assign a task there if other processes are also randomly running there, so I will guess that this is when “isolcpus=1-2” was added. There just wasn’t much documentation to make it obvious why it was there. Since it is only a kernel command line parameter, it takes about 10 seconds of editing to remove, but to someone who is not aware of it, it looks like a “big deal”.
