I happened to notice that the system monitor on my Jetson shows relatively heavy load on cpu0 and cpu1, but nothing at all on cpu2 and cpu3.
I tried to use taskset to move a process to one of those cpus and it tells me that I can’t. The current affinity mask is 0x3, and I can’t seem to set the upper two mask bits.
ubuntu@tegra-ubuntu:/etc$ taskset -p 0x4 1706
taskset: failed to set pid 1706's affinity: Invalid argument
Trying to set all of the bits just lops off the upper two:
ubuntu@tegra-ubuntu:/etc$ taskset -p 0xf 1706
pid 1706's current affinity mask: 3
pid 1706's new affinity mask: 3
I’ve never tried doing anything with taskset before, so maybe I am just being dumb, but it sure does look to me on the surface like cpu2 and cpu3 are not there.
Further evidence: cpufrequtils claims an ERROR in dmesg. After messing with the script to explicitly try each CPU individually, I find that I can set the governor on cpu0 and cpu1, but not cpu2 or cpu3.
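A quick way to confirm this kind of suspicion is to ask the kernel which CPUs it currently considers online. The sketch below assumes the standard Linux sysfs file /sys/devices/system/cpu/online, which holds a range list such as "0-1". The helper name expand_cpulist and the CPU_SYSFS override variable are made up for illustration.

```shell
#!/bin/sh
# Expand a kernel CPU range list such as "0-1" or "0,2-3" into
# space-separated CPU numbers. Runs in a subshell so IFS stays local.
expand_cpulist() (
    out=""
    IFS=,
    for part in $1; do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;
            *)   first=$part; last=$part ;;
        esac
        n=$first
        while [ "$n" -le "$last" ]; do
            out="${out:+$out }$n"
            n=$(( n + 1 ))
        done
    done
    echo "$out"
)

# Standard sysfs location on Linux; CPU_SYSFS is a hypothetical override.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

if [ -r "$CPU_SYSFS/online" ]; then
    echo "online CPUs: $(expand_cpulist "$(cat "$CPU_SYSFS/online")")"
fi
```

If the output lists only 0 and 1, that would match what taskset is reporting.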
peba is almost certainly correct that cores 2 and 3 are turned off. In addition, I thought I’d let you know that it is not necessary to set cpufreq settings individually for the different cores in the high performance cluster; setting one will affect all. You can see that with:
Any time you want to test if all CPUs kick in just compile a kernel with “make -j4”. This provides a long and steady use of all 4 CPUs if they are available.
Just to clarify the comment on the first post of this thread. Even if all 4 cores are forced online, two of them would sit idle if the application is using only 2 threads.
Build systems (like the make -j4 above) can practically always benefit from an arbitrary number of cores.
Applications may have their own, sometimes broken, heuristics to determine how many threads to use. For example, I think Kodi (XBMC) at least used to check how many cores were online when it started and then used that many threads to decode videos, among other things. I think on a normal PC the cores are always online, while on ARM devices they are often taken offline automatically when idle to save power.
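To see the difference such a heuristic would run into, you can compare the number of configured processors with the number online at this moment. This is only a sketch; it assumes a getconf that knows the _NPROCESSORS_CONF and _NPROCESSORS_ONLN names, as the glibc one does.

```shell
#!/bin/sh
# Processors the system is configured with vs. processors online right now.
configured=$(getconf _NPROCESSORS_CONF)
online=$(getconf _NPROCESSORS_ONLN)

echo "configured: $configured, online now: $online"

if [ "$online" -lt "$configured" ]; then
    # A program that sizes its thread pool from the online count at startup
    # would pick the smaller number here.
    echo "some cores are offline; a startup check would see only $online"
fi
```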
I was experimenting to see if I could get better performance for some audio processing by forcing the JACK server to execute on a single processor. I’m trying to push the latency of it down as low as I can get it.
It worked, too.
I can get it down to just under 3ms with no buffer underruns if I confine it to a core. If I let the system bounce it around, I occasionally get underruns even with twice the buffering.
Yes, it works. The original problem was that I was trying to push the process to a CPU that wasn’t enabled, so it would error out.
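One way to avoid that error is to check whether the target CPU is online before calling taskset. A minimal sketch, assuming the usual per-CPU sysfs file cpuN/online (1 for online, 0 for offline). The function names cpu_is_online and pin_if_online are made up for this example.

```shell
#!/bin/sh
# Succeed if CPU number $1 is online.
# $2 is the sysfs CPU directory; the default is the standard Linux path.
cpu_is_online() (
    dir=${2:-/sys/devices/system/cpu}/cpu$1
    [ -d "$dir" ] || exit 1          # no such CPU at all
    # cpu0 often has no "online" file because it cannot be unplugged.
    [ -f "$dir/online" ] || exit 0
    [ "$(cat "$dir/online")" = "1" ]
)

# Hypothetical wrapper: pin PID $1 to CPU $2 only if that CPU is online.
pin_if_online() {
    if cpu_is_online "$2"; then
        taskset -p -c "$2" "$1"
    else
        echo "cpu$2 is offline; leaving pid $1 alone" >&2
        return 1
    fi
}
```

For example, pin_if_online 1706 2 would refuse to touch the process while cpu2 is offline, instead of failing with "Invalid argument".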
Some example commands:
taskset -p 1403
This will show you the affinity mask of the job with process ID 1403. The mask represents which CPUs the process can run on.
taskset -p 0x5 1403
This will cause the affinity mask for process 1403 to be changed to 0x5 (binary 0101), which will allow the process to run on cpu0 or cpu2. (This is where I hit my error. I had specified 0x8 to push a task to cpu3, but cpu3 was asleep.)
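If reading hex masks in your head gets tedious, a small helper can decode them. This is just a sketch, and mask_to_cpus is a made-up name. Note that taskset prints masks in hex without the 0x prefix, so add the prefix when passing its output in.

```shell
#!/bin/sh
# Decode an affinity mask (e.g. 0x5) into the CPU numbers it allows.
# Bit N set in the mask means the process may run on cpuN.
mask_to_cpus() (
    mask=$(( $1 ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "$out"
)

mask_to_cpus 0x3   # prints "0 1"
mask_to_cpus 0x5   # prints "0 2"
mask_to_cpus 0x8   # prints "3"
```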
You can specify the cpus numerically with the “-c” argument instead of as a hex mask.
taskset -p -c 0,2 1403
Same as above. Allows the process to run on cpu0 or cpu2.
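The two forms are equivalent because each CPU number N corresponds to bit N of the mask. As a rough illustration (cpus_to_mask is a hypothetical helper, not part of taskset), the conversion from a CPU list to a hex mask looks like this:

```shell
#!/bin/sh
# Build a hex affinity mask from a list of CPU numbers.
# Each CPU number N sets bit N of the mask.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
)

cpus_to_mask 0 2     # prints 0x5
cpus_to_mask 0 1 2   # prints 0x7
```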
You can also use it to launch new processes instead of modifying existing processes.
taskset 7 xterm
This will launch an xterm with an affinity mask of 0x7, allowing it to run on cpu0, cpu1, or cpu2.
It usually isn’t necessary to try and do these sorts of things. The scheduler does a pretty darn good job of handling thread assignments. I just wanted to try this as an experiment to see if I could squeeze as much perf as I could out of the audio layers I am running.
I commented that “it worked, too”, but I think I was somewhat mistaken about the results. I’m using the Jetson over VNC with a bunch of audio apps running that have VU meters bouncing around and lighting up. This puts a pretty heavy load on the Xtightvnc server process as it pumps out desktop display updates.
Confining that process to one CPU keeps it out of the way of my audio apps. I’m still tweaking.