Source of scaling_available_frequencies Orin NX 8GB Jetpack 5.1.2

Hello,

I am monitoring CPU frequency during a stress test that loads the CPU and GPU on my Orin NX 8GB in 20W mode, after running jetson_clocks.

During the test, I record the output of tegrastats.

While it’s running, I see the CPU frequencies “dip”, but no overcurrent (OC1/2/3) events are recorded.

I also see corresponding events in dmesg saying:

[  587.529068] cpufreq: cpu0,cur:1639000,set:1497600,set ndiv:117
[  602.577336] cpufreq: cpu0,cur:339000,set:1497600,set ndiv:117
[  603.680896] cpufreq: cpu4,cur:1379000,set:1497600,set ndiv:117
[  615.623390] cpufreq: cpu0,cur:1617000,set:1497600,set ndiv:117
[  622.356034] cpufreq: cpu0,cur:1368000,set:1497600,set ndiv:117
[  624.475794] cpufreq: cpu0,cur:245000,set:1497600,set ndiv:117
[  659.973306] cpufreq: cpu0,cur:1616000,set:1497600,set ndiv:117

Which seems to come from this function:

static unsigned int tegra194_get_speed(u32 cpu)
{
	struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
	u32 clusterid = data->phys_ids[cpu].clusterid;
	struct cpufreq_frequency_table *pos;
	unsigned int rate;
	u64 ndiv;
	int ret;

	/* reconstruct actual cpu freq using counters */
	rate = tegra194_calculate_speed(cpu);

	/* get last written ndiv value */
	ret = data->soc->ops->get_cpu_ndiv(cpu, data->phys_ids[cpu].cpuid, clusterid, &ndiv);
	if (WARN_ON_ONCE(ret))
		return rate;

	/*
	 * If the reconstructed frequency has acceptable delta from
	 * the last written value, then return freq corresponding
	 * to the last written ndiv value from freq_table. This is
	 * done to return consistent value.
	 */
	cpufreq_for_each_valid_entry(pos, data->tables[clusterid]) {
		if (pos->driver_data != ndiv)
			continue;

		if (abs(pos->frequency - rate) > 115200) {
			pr_info("cpufreq: cpu%d,cur:%u,set:%u,set ndiv:%llu\n",
				cpu, rate, pos->frequency, ndiv);
		} else {
			rate = pos->frequency;
		}
		break;
	}
	return rate;
}
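
For reference, the check in the function above can be replayed offline against the logged lines. This is a minimal Python sketch, not part of the driver: the regular expression and the helper name parse_cpufreq_line are assumptions made for illustration, and the only values taken from the thread are the 115200 kHz threshold and the sample lines. Note that 1497600 / 117 = 12800 kHz per ndiv step for the logged entry, so the threshold corresponds to 9 such steps.

```python
import re

# Threshold copied from the kernel function quoted above (kHz).
MAX_DELTA_KHZ = 115200

# Matches lines such as:
# [  587.529068] cpufreq: cpu0,cur:1639000,set:1497600,set ndiv:117
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+cpufreq: cpu(?P<cpu>\d+),"
    r"cur:(?P<cur>\d+),set:(?P<set>\d+),set ndiv:(?P<ndiv>\d+)"
)


def parse_cpufreq_line(line):
    """Return the fields of one cpufreq dmesg line as a dict, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = {k: int(m.group(k)) for k in ("cpu", "cur", "set", "ndiv")}
    rec["ts"] = float(m.group("ts"))
    # Signed difference between the reconstructed and the requested frequency.
    rec["delta"] = rec["cur"] - rec["set"]
    return rec


# Three lines copied from the dmesg output earlier in this post.
sample = """\
[  587.529068] cpufreq: cpu0,cur:1639000,set:1497600,set ndiv:117
[  602.577336] cpufreq: cpu0,cur:339000,set:1497600,set ndiv:117
[  603.680896] cpufreq: cpu4,cur:1379000,set:1497600,set ndiv:117
"""

records = [r for r in map(parse_cpufreq_line, sample.splitlines()) if r]
for r in records:
    direction = "above" if r["delta"] > 0 else "below"
    print(f"t={r['ts']:.3f}s cpu{r['cpu']}: {abs(r['delta'])} kHz {direction} set")
```

Splitting the records by the sign of delta may help separate readings above the set frequency, which are hard to explain as throttling, from readings below it.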

What I don’t understand yet is if this is the CPU actually throttling/changing frequency, or if it’s a measurement artifact.

To prove whether or not it’s a measurement artifact, I want to limit scaling_available_frequencies to just 1497600 and re-run the test.
I run jetson_clocks before the test, so I know CPU scaling should be locked, but I don’t know why I get those “dipping” events.

I see that in JetPack 6 there are OPP cluster tables in the device tree, but I don’t see such tables in JetPack 5.1.2.

What would be the best way to limit scaling_available_frequencies so that I can prove whether the CPU frequency is actually scaling or whether it’s just tegrastats?
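
One generic option, independent of the device tree: in the standard Linux cpufreq sysfs interface, scaling_available_frequencies is read-only, while scaling_min_freq and scaling_max_freq are writable per policy. Below is a minimal Python sketch that pins a policy to one frequency by setting both limits to the same value. The helper name pin_policy is hypothetical, and jetson_clocks may already apply an equivalent setting, so this is only a way to set and read back the limits explicitly.

```python
from pathlib import Path


def pin_policy(policy_dir, khz):
    """Pin one cpufreq policy to a single frequency (kHz) via its min/max limits."""
    policy = Path(policy_dir)
    available = (policy / "scaling_available_frequencies").read_text().split()
    if str(khz) not in available:
        raise ValueError(f"{khz} kHz is not offered by {policy}: {available}")
    # scaling_available_frequencies cannot be written; the limits can.
    (policy / "scaling_max_freq").write_text(f"{khz}\n")
    (policy / "scaling_min_freq").write_text(f"{khz}\n")
    # Read back the limits that are now in effect.
    return {
        name: int((policy / name).read_text())
        for name in ("scaling_min_freq", "scaling_max_freq")
    }
```

On a real system this would be called as root for each directory matching /sys/devices/system/cpu/cpufreq/policy*, with a value that appears verbatim in that policy's scaling_available_frequencies. Any other value raises ValueError.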

Hi,
Please share the sudo tegrastats output for reference. It is probably high temperature triggering the throttling.

Hi @DaneLLL, thanks for your response.

I don’t have the raw tegrastats output. I parse it and add it to this logfile format along with other data I am collecting, but it can be seen here:

baseline_log_file_20250207_022353.txt (540.0 KB)

The first few columns are taken directly from sudo tegrastats output.

I don’t see any messages in dmesg indicating thermal throttling and the temperature from tegrastats looks normal.

Hi,
It would be great if you could share the output of sudo tegrastats directly. We are more familiar with that interface.

Or please share a method to replicate it on a developer kit. Then we will set it up and check.

Hi,

That makes sense, I can work on getting you the direct output tomorrow.

What I am running:

sudo tegrastats

while running the following stressors:

CPU:

stress-ng --cpu 9 --hdd 4 --temp-path /tmp --vm 2 --vm-bytes 100M &

GPU:

/usr/src/tensorrt/bin/trtexec --loadEngine=~/yolo_04_09_1.engine --fp16 --useSpinWait --streams=1 --iterations=1000000 &

With this engine (zipped for posting):
yolo_04_09_1.engine.zip (29.4 MB)

And this camera command to stress VI/ISP/Encoding Pipelines:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! "video/x-raw(memory:NVMM),width=3840,height=2160,framerate=30/1" ! nvv4l2h264enc bitrate=8000000 ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw,format=(string)I420" ! fakesink \
nvarguscamerasrc sensor-id=1 ! "video/x-raw(memory:NVMM),width=3840,height=2160,framerate=30/1" ! nvv4l2h264enc bitrate=8000000 ! h264parse ! nvv4l2decoder ! nvvidconv ! "video/x-raw,format=(string)I420" ! fakesink &

Hi,
We tested on a developer kit and didn’t observe it in 20W mode. We did observe it in MAXN mode, because total power reached 20W and triggered throttling. You can capture sudo tegrastats and check whether VDD_IN is close to 20W.

Hi, Thank you for your reply,

When you ran on your developer kit did you have jetson_clocks enabled too? I seem to be able to reproduce this in 20W mode.

I will also check if VDD_IN is close to 20W.

Hi @DaneLLL,

I was able to get a capture of just tegrastats by itself for you, where my system is hitting what seems to be throttling.

Raw tegrastats logs:

tegrastats_output.log (232.6 KB)

dmesg output during run:

[708154.293894] cpufreq: cpu0,cur:1619000,set:1497600,set ndiv:117
[708157.601617] cpufreq: cpu0,cur:7000,set:1497600,set ndiv:117
[708159.759479] cpufreq: cpu4,cur:1618000,set:1497600,set ndiv:117
[708168.849308] cpufreq: cpu0,cur:1614000,set:1497600,set ndiv:117
[708175.038143] cpufreq: cpu0,cur:1362000,set:1497600,set ndiv:117
[708189.707994] cpufreq: cpu4,cur:1361000,set:1497600,set ndiv:117
[708195.216670] cpufreq: cpu4,cur:1381000,set:1497600,set ndiv:117
[708207.238345] cpufreq: cpu4,cur:1330000,set:1497600,set ndiv:117
[708218.925976] cpufreq: cpu0,cur:277000,set:1497600,set ndiv:117
[708260.823269] cpufreq: cpu0,cur:53000,set:1497600,set ndiv:117
[708272.891206] cpufreq: cpu0,cur:376000,set:1497600,set ndiv:117
[708276.372866] cpufreq: cpu0,cur:1617000,set:1497600,set ndiv:117
[708278.845591] cpufreq: cpu4,cur:1377000,set:1497600,set ndiv:117
[708289.194654] cpufreq: cpu0,cur:1346000,set:1497600,set ndiv:117
[708292.934253] cpufreq: cpu0,cur:1616000,set:1497600,set ndiv:117
[708301.197296] cpufreq: cpu4,cur:1349000,set:1497600,set ndiv:117
[708312.121121] cpufreq: cpu0,cur:74000,set:1497600,set ndiv:117
[708319.723671] cpufreq: cpu0,cur:1381000,set:1497600,set ndiv:117
[708323.241381] cpufreq: cpu4,cur:1370000,set:1497600,set ndiv:117
[708334.418265] cpufreq: cpu0,cur:104000,set:1497600,set ndiv:117
[708346.829095] cpufreq: cpu4,cur:1370000,set:1497600,set ndiv:117
[708355.709759] cpufreq: cpu0,cur:1381000,set:1497600,set ndiv:117
[708370.653527] cpufreq: cpu0,cur:1643000,set:1497600,set ndiv:117
[708380.705307] cpufreq: cpu0,cur:1318000,set:1497600,set ndiv:117
[708407.869463] cpufreq: cpu0,cur:1622000,set:1497600,set ndiv:117
[708427.349122] cpufreq: cpu0,cur:108000,set:1497600,set ndiv:117
[708435.115894] cpufreq: cpu0,cur:60000,set:1497600,set ndiv:117
[708435.381600] cpufreq: cpu0,cur:1380000,set:1497600,set ndiv:117
[708437.181875] cpufreq: cpu0,cur:57000,set:1497600,set ndiv:117
[708448.913720] cpufreq: cpu0,cur:31000,set:1497600,set ndiv:117
[708472.284005] cpufreq: cpu0,cur:1379000,set:1497600,set ndiv:117
[708485.473256] cpufreq: cpu0,cur:1357000,set:1497600,set ndiv:117
[708501.705868] cpufreq: cpu0,cur:1375000,set:1497600,set ndiv:117
[708538.229998] cpufreq: cpu0,cur:1649000,set:1497600,set ndiv:117
[708551.419643] cpufreq: cpu0,cur:1372000,set:1497600,set ndiv:117
[708576.785355] cpufreq: cpu0,cur:1376000,set:1497600,set ndiv:117
[708712.737817] cpufreq: cpu0,cur:1365000,set:1497600,set ndiv:117
[708738.102871] cpufreq: cpu0,cur:1372000,set:1497600,set ndiv:117

output of nvpmodel -q:

$ sudo nvpmodel -q
NV Power Mode: 20W
3

My nvpmodel.conf:
nvpmodel.conf.txt (7.6 KB)

Hi,
So VDD_IN is close to the 20W limit and throttling is triggered. This behavior is expected.

Hi Dane,

Why do you think I am able to hit the limit in 20W mode but you are not? Is it minor differences in hardware, or the voltage we are running the system at?

Also how can I see the throttling events being reported? I don’t see any increase in OC1/2/3 counts.

From what I can tell the max VDD_IN I hit during my tests is around 18k mW, not the 20k max.

Hi,
It will not exceed 20k mW, and when it gets close to the limit, throttling is triggered. This is shown in VDD_IN 18222mW/16434mW.
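
To check this across a whole log, the VDD_IN field can be pulled out of each tegrastats line and compared against the limit. This is a minimal Python sketch under two assumptions: the field has the form shown above (instantaneous power, then average power, both in mW), and the limit is the 20000 mW discussed in this thread. The helper name vdd_in_headroom is hypothetical.

```python
import re

# Matches the field as quoted above, e.g. "VDD_IN 18222mW/16434mW".
VDD_IN_RE = re.compile(r"VDD_IN (\d+)mW/(\d+)mW")


def vdd_in_headroom(line, limit_mw=20000):
    """Return VDD_IN readings and their margin below limit_mw, or None."""
    m = VDD_IN_RE.search(line)
    if m is None:
        return None
    inst_mw, avg_mw = int(m.group(1)), int(m.group(2))
    return {
        "inst_mw": inst_mw,
        "avg_mw": avg_mw,
        "headroom_inst_mw": limit_mw - inst_mw,
        "headroom_avg_mw": limit_mw - avg_mw,
    }


# The only sample used here is the fragment quoted in this thread.
result = vdd_in_headroom("VDD_IN 18222mW/16434mW")
print(result)
```

Applying this to every line of a capture and taking the minimum headroom shows how close the run actually came to the limit.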

Thanks, that makes sense.

If it’s not OC1 (VDD_IN average) or OC2 (VDD_IN instantaneous), how do I see the trigger for the throttling event?

Hi,
Please check if the OC count increases once the load gets lower. The OC event counter increases by 1 when a throttling event is asserted and then de-asserted once. It is not counted while the system is still throttling.
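
The counting rule described here can be illustrated with a toy model. This Python sketch is only an interpretation of that description, not the hardware or driver logic: it takes a sequence of booleans (True while throttling is asserted) and counts only completed assert-then-de-assert cycles. The function name count_completed_events is hypothetical.

```python
def count_completed_events(asserted):
    """Count assert -> de-assert cycles in a sequence of throttle states."""
    count = 0
    prev = False
    for cur in asserted:
        # An event is counted only at the moment throttling is released.
        if prev and not cur:
            count += 1
        prev = cur
    return count


still_throttling = [False, True, True, True]
released_once = [False, True, True, False]
print(count_completed_events(still_throttling))
print(count_completed_events(released_once))
```

Under this model, a counter that stays flat while the load is high is expected, but it should increase once the load drops and throttling is released.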

Hi Dane, it seems the OC counters do not increment even well after the test is complete and all the stressors have been turned off. Are there any other paths for throttling, or could it be measurement error like I had originally thought?

Hi,
Please check this section:

Jetson Orin Nano Series, Jetson Orin NX Series and Jetson AGX Orin Series — Jetson Linux Developer Guide documentation

Compare the nodes between the developer kit (Orin NX module + Orin Nano carrier board) and the custom carrier board, and check which nodes behave differently. Ideally, the INA3221 should behave identically on the Orin Nano carrier board and the custom carrier board. There is probably a deviation due to the hardware design of the custom board.

Hi, I have compared the nodes and put them in this table:

They look to be correct, where the crit and max values for 1 seem to be 20W and 25W respectively.

(I’m not sure why the max and crit values for curr2 and above are so high. Is that normal? OC1/2/3 only seem to deal with VDD_IN, and I don’t see much documentation for the rest.)
