Independent core control on TX1? (also on TK1 and TX2)

Hi,

I have tried to control the cores in order to compare the power consumption when using a single core versus 4 cores. However, with the userspace governor, the frequency fixed, and some cores disabled, I do not measure any difference in consumption between enabling all cores and enabling only one.

Does this mean that disabling cores happens only at the OS level, and that on the hardware side they continue to run at the same frequency (even when idle mode is enabled) and draw the same amount of power?
Is it possible to disable individual cores at the hardware level?
Is this also possible on the TK1?
I have just ordered a TX2, and I expect that turning a cluster off may change the power consumption a lot!
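For reference, this is roughly how I am toggling cores and fixing the frequency (a minimal sketch using the standard Linux sysfs interface; the helper names are my own, and the writes need root):

```python
# Sketch of core hotplug + fixed frequency via the standard Linux sysfs
# interface. Helper names are illustrative; the writes require root.
CPU_BASE = "/sys/devices/system/cpu"

def online_path(core):
    # Note: cpu0 usually cannot be taken offline on these boards.
    return f"{CPU_BASE}/cpu{core}/online"

def governor_path(core):
    return f"{CPU_BASE}/cpu{core}/cpufreq/scaling_governor"

def setspeed_path(core):
    return f"{CPU_BASE}/cpu{core}/cpufreq/scaling_setspeed"

def write(path, value):
    with open(path, "w") as f:
        f.write(str(value))

def set_core_online(core, online):
    # Hotplug a core in or out from the kernel's point of view.
    write(online_path(core), 1 if online else 0)

def fix_frequency(core, khz):
    # Select the userspace governor, then pin the frequency (in kHz).
    write(governor_path(core), "userspace")
    write(setspeed_path(core), khz)
```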

Thanks!

I can’t give a complete answer, but power consumption on various cores will depend greatly on the kind of loads presented. Hardware-dependent loads will require CPU0. The scheduler will not evenly distribute a single software-dependent load across all cores…this would cause cache misses and that single thread would be slowed. Many system processes don’t even use much CPU power most of the time.

If you were to create a program which spawns one thread for each core other than CPU0, and if the thread did nothing but CPU-intense operations (e.g., unending math problems which do not change the amount of memory used), and if you were to offer a means to enable and disable a given thread, then I think you would get a better idea of what real power differences there are when a given core is used or discarded…and then do the same thing with different core performance settings.
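Something along these lines would do it (a sketch only; it uses one process per core via `multiprocessing` so each spinning loop really lands on its own core, and `os.sched_setaffinity` to pin them — `burn` and `spawn_burners` are names I made up):

```python
import multiprocessing as mp
import os

def burn(core, stop):
    # Pin this process to one core, then spin on fixed-size arithmetic
    # (no memory growth) until told to stop.
    os.sched_setaffinity(0, {core})
    x = 1.0001
    while not stop.is_set():
        x = (x * x) % 1e9

def spawn_burners(cores):
    """Start one busy-loop process per requested core.
    Returns (stop_event, processes); set the event to release all cores."""
    stop = mp.Event()
    procs = [mp.Process(target=burn, args=(c, stop)) for c in cores]
    for p in procs:
        p.start()
    return stop, procs
```

For example, `spawn_burners(range(1, 4))` would load CPU1–CPU3 while leaving CPU0 for the system; setting the event and joining the processes drops the load again, so you can watch the power meter as cores go busy or idle.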

My problem is in fact not (only) related to my parallel codes. Here is some more information:
When I run a single-threaded code on a single core (affinity set to CPU0, for example) with the userspace governor and a fixed frequency, all the cores are automatically set to the same frequency (visible in htop), even though the code is single-threaded.
I measure the time and the energy consumption for this run.
When I run the same code with the ondemand governor, with the max frequency fixed to the same value as in the previous test, I can see that some cores are disabled, yet the time and the energy consumption are the same (measured over a very long run, since the ondemand governor does not immediately set the frequency to the max).
Note that the CPU fan is powered by a separate power supply rather than by the board, everything (USB, HDMI, SATA, …) is disconnected from the board, and the GPU is set to its lowest frequency.
So, I wonder if idle processors are in fact really idle since I cannot measure any difference in power consumption.
I presumed that idle cores were put into a sleep state and partially disabled, but that does not seem to be the case.

Then, when I further compare with the parallel version of my code running on 4 cores, measuring the energy consumption over 60 seconds (less than the execution time of either version), I don't see a significant difference between the two.
This test tends to confirm that idle cores are not really idle, since in the ideal case a single core should draw roughly a quarter of the power that 4 cores draw at the same frequency.

Perhaps there is not much drop in power consumption because the remaining cores, in an effort to take over the compute work done by the dropped core, run at an increased level and thus use the power formerly consumed by the dropped core.

I do not know anything about the internals of how the power/performance settings are achieved, but I could see the possibility that you are right since some parts of the system probably have to share various clocks (e.g., most memory clocks won’t change depending on which CPU core uses it, though cache could). Linux itself does have facilities for core affinity, but I think various power and clock settings are hardware-specific and part of Tegra drivers…as to what actually goes on in the hardware when the driver makes a change I don’t know.

I agree with you (dkryder and linuxdev), but it means that it would be possible to consume even less energy if the cores were more independent.
Can someone at NVIDIA confirm this behaviour? Is it a hardware or a software limitation?
Is it specific to NVIDIA's implementations?

TK1 and TX2 should show lower power consumption at the CPU level if you compare a loaded CPU against one that has been taken offline. We have Dynamic Voltage and Frequency Scaling implemented along with many other power optimization features, so even if a core is ON but idling, its power consumption will be negligible.
Apart from that, the TX2 has heterogeneous cores in a dual-cluster design. You will definitely see a power reduction when you stop the cores and the cluster.
Lastly, if a core is shown as offline in the Linux kernel, then it is indeed offline; there is no catch there.
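The kernel's view of which cores are online is directly visible in sysfs; a small sketch for reading it (helper names are mine, the mask format is the kernel's standard CPU-list syntax):

```python
def parse_cpu_mask(mask):
    """Parse a kernel CPU-list string such as '0-3' or '0,2-3' into a set."""
    cpus = set()
    for part in mask.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def online_cpus(path="/sys/devices/system/cpu/online"):
    # The 'online' file holds the mask of currently online cores.
    with open(path) as f:
        return parse_cpu_mask(f.read())
```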