OpenMP application appears to use only 2 of 4 cores

I have a computation written in C++ for which I use

#pragma omp parallel for

in several loops. If I run this on a Raspberry Pi or an Odroid N2 (both of which have at least 4 cores), I can tell from “top” that 4 cores are running at 85-95% CPU load. On the Jetson Nano, however, only CPU2 and CPU3 carry a load, while CPU0 and CPU1 are almost completely idle.

I’ve tried setting the environment variable OMP_NUM_THREADS=4 as well as a function call omp_set_num_threads(4), and neither of these makes a difference.

Is there some other setting, or is this an OS design issue?

Did you check with tegrastats?

I’m not sure what you have in mind, but the CPU utilization output of tegrastats is similar to what I get using top, unlike other SoCs that I own, for which the utilization is at least 85% for all four fast cores. My conclusion is that somewhere a priority is being set such that my request to run with 4 threads is either ignored, or else 4 threads are forced onto 2 cores.

Could you run sudo jetson_clocks and then try again?
Could you attach the binary file here?

I ran

sudo jetson_clocks

There’s no output.

Then I started my application again, and this time all 4 cores have loads over 82%, as I’ve seen with other multi-core SoCs.

What happened? I couldn’t work out what jetson_clocks does, or whether I need to worry about (1) overheating and (2) whether the setting persists across reboots.

The problem could be that some CPUs were in an idle state and jetson_clocks brought them out of it. You can also disable the CPU idle states by hand (root privileges required):

for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do echo 1 | sudo tee "$f" > /dev/null; done

Without taking that step, I get this:

% cat /sys/devices/system/cpu/cpu*/cpuidle/state*/disable
1
1
1
1
1
1
1
1

Maybe this is a result of jetson_clocks.

Where is this documented?

An addendum:

With all disable files having the value 1, I just discovered that my latest run (via crontab) is again loading only 2 of the 4 cores. This job started after I ran jetson_clocks (and then ran the same job briefly by hand).

I’ve done another test: running the job via crontab shows 2 cores at 80%, 2 cores at 25%. Running the same job from the command line shows all 4 cores at 85%.

I don’t understand this.

How do you run the job via crontab?

Sorry, my latest post apparently didn’t make it in. I discovered that the shell script that runs under crontab had a holdover ‘taskset’ line from another SoC (Odroid N2) that was not appropriate for the Nano. After I removed that line, the crontab jobs load all 4 cores.