Orin NX 16G frequency locking failed

Hardware: Developer Kit / custom board

Software version: JetPack 5.1.3 / JetPack 5.1.5

Unable to lock frequency during Orin NX stress testing.

Here is my test command:

sudo nvpmodel -m 0
sudo jetson_clocks
stress-ng --cpu $(nproc) --cpu-method matrixprod --cpu-load 95 &
stress-ng --vm 6 --vm-bytes 500M --vm-method all &
./matrixMulCUBLAS --sizemult=8 &

The following is the tegrastats output from the Orin NX after running for a few minutes:

01-01-1970 08:18:49 RAM 3047/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@266,100%@1984,100%@255,100%@882,100%@332,100%@1832,100%@816,100%@1984] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@80.281C CPU@83.718C SOC2@80.281C SOC0@76.406C CV1@76.937C GPU@87.875C tj@87.875C SOC1@75.281C CV2@74.718C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13599mW/13752mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:50 RAM 3047/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@291,100%@1984,100%@257,100%@1619,100%@1578,100%@1984,100%@253,100%@1984] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@80.375C CPU@83.75C SOC2@80.343C SOC0@76.468C CV1@77.062C GPU@87.625C tj@87.625C SOC1@75.218C CV2@74.75C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13593mW/13752mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:51 RAM 3046/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@273,100%@1860,100%@956,100%@1984,100%@1207,100%@1984,100%@1984,100%@449] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@80.281C CPU@83.937C SOC2@80.25C SOC0@76.343C CV1@77.125C GPU@87.656C tj@87.656C SOC1@75.218C CV2@74.875C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13750mW/13752mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:53 RAM 3044/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@1326,100%@1393,100%@2158,100%@250,100%@1026,100%@1984,100%@248,100%@1984] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@80.312C CPU@83.656C SOC2@80.375C SOC0@76.343C CV1@76.968C GPU@87.687C tj@87.687C SOC1@75.25C CV2@74.656C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13599mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:54 RAM 3046/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@1078,100%@1984,100%@1984,100%@254,100%@1984,100%@1384,100%@1152,100%@254] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@80.281C CPU@83.906C SOC2@80.5C SOC0@76.343C CV1@76.906C GPU@87.718C tj@87.718C SOC1@75.375C CV2@74.781C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13599mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:55 RAM 3047/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@565,100%@1984,100%@1273,100%@724,100%@1984,100%@246,100%@1984,100%@299] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@79.937C CPU@83.718C SOC2@80.437C SOC0@76.312C CV1@76.875C GPU@87.656C tj@87.656C SOC1@75.343C CV2@74.687C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13744mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:56 RAM 3049/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@279,100%@481,100%@612,100%@1984,100%@1003,100%@328,100%@1984,100%@448] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@80.062C CPU@83.718C SOC2@80.375C SOC0@76.468C CV1@76.968C GPU@87.593C tj@87.593C SOC1@75.281C CV2@74.812C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13744mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:57 RAM 3046/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@733,100%@348,100%@1984,100%@252,100%@1984,100%@256,100%@1984,100%@745] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@80.531C CPU@83.812C SOC2@80.468C SOC0@76.343C CV1@77.031C GPU@87.656C tj@87.656C SOC1@75.281C CV2@74.843C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13750mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:58 RAM 3047/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@1984,100%@333,100%@1984,100%@259,100%@1984,100%@259,100%@1984,100%@257] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@80.125C CPU@83.75C SOC2@80.437C SOC0@76.343C CV1@77.093C GPU@87.75C tj@87.75C SOC1@75.281C CV2@74.781C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13599mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:18:59 RAM 3045/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@1984,100%@258,100%@1702,100%@358,100%@1861,100%@264,100%@1801,100%@907] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@79.906C CPU@83.75C SOC2@80.375C SOC0@76.375C CV1@76.875C GPU@87.843C tj@87.843C SOC1@75.375C CV2@74.625C VDD_IN 24489mW/24437mW VDD_CPU_GPU_CV 13750mW/13751mW VDD_SOC 4827mW/4805mW
01-01-1970 08:19:00 RAM 3047/15503MB (lfb 2799x4MB) SWAP 2/7752MB (cached 0MB) CPU [100%@615,100%@422,100%@270,100%@1984,100%@260,100%@1984,100%@1852,100%@1692] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@80.187C CPU@83.687C SOC2@80.375C SOC0@76.437C CV1@76.843C GPU@87.656C tj@87.656C SOC1@75.437C CV2@74.781C VDD_IN 24489mW/24438mW VDD_CPU_GPU_CV 13750mW/13751mW VDD_SOC 4827mW/4805mW
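For offline analysis, the per-core clock values in tegrastats lines like the above can be pulled out with standard text tools. A sketch, assuming the output was saved to a file named tegrastats.log:

```shell
# extract the per-core MHz values from a saved tegrastats log
# (each line contains "CPU [100%@266,100%@1984,...]"; output is one CSV row per sample)
grep -o 'CPU \[[^]]*\]' tegrastats.log \
  | sed -e 's/.*\[//' -e 's/\]//' -e 's/[0-9]*%@//g'
```

Plotting these columns makes the alternation between 1984 MHz and the throttled ~250 MHz steps easy to see.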

I hit this frequency-locking failure on both the purchased development kit and my own custom-designed hardware.

I flashed both JetPack 5.1.3 and JetPack 5.1.5; the frequency-locking failure occurs on both.

Please help me find the reason the frequency locking fails. Thanks.

Hi jack.yan,

Could you also verify with the latest JetPack 6.1.2 (r36.4.3) to check whether there is a similar issue?

Please reboot the device after configuring the power mode.

Hi KevinFFF,

Our algorithm engineers require the system version to be JetPack 5.1.5, so testing JetPack 6.1.2 does not really help us. I need to solve the frequency-locking failure on JetPack 5.1.5.

I also tried rebooting after configuring the power mode, and the frequency still cannot be locked.

Hi KevinFFF,

During a stress test I pointed a large external fan at the Orin NX 16G and found that there was no frequency reduction at all. Is it possible to modify the frequency-reduction strategy? How is it currently implemented on Orin NX? I would like to modify it.

Is the frequency reduction caused by overheating of the module, given that there is no similar issue after you added the fan?

Please check whether the Jetson Orin Nano Series, Jetson Orin NX Series and Jetson AGX Orin Series — NVIDIA Jetson Linux Developer Guide documentation could help for your case.

Hi KevinFFF,

From the test results, yes. The development kit's built-in fan still allows frequency reduction during stress testing; with an additional fan added there is no frequency reduction.

This is very helpful. If I want to keep the CPU from downclocking during stress testing, can I do it by setting cpuinfo_min_freq?

I set the maximum fan speed:

sudo jetson_clocks

After running the stress test command, the CPU frequency dropped from 1984 MHz even at 49.4 °C:

01-01-1970 00:15:10 RAM 2934/15503MB (lfb 2882x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@281,100%@1806,100%@346,100%@1339,100%@271,100%@1984,100%@625,100%@1984] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.656C CPU@59.906C SOC2@58C SOC0@55.781C CV1@55.156C GPU@64.25C tj@64.25C SOC1@57.062C CV2@53.937C VDD_IN 23931mW/23754mW VDD_CPU_GPU_CV 12772mW/12529mW VDD_SOC 4671mW/4719mW
01-01-1970 00:15:11 RAM 2936/15503MB (lfb 2882x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@294,100%@459,100%@269,100%@252,100%@1015,100%@1984,100%@264,100%@1777] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.593C CPU@60.031C SOC2@57.906C SOC0@55.781C CV1@55.187C GPU@64.062C tj@64.125C SOC1@57.187C CV2@54C VDD_IN 23931mW/23755mW VDD_CPU_GPU_CV 12772mW/12531mW VDD_SOC 4671mW/4719mW

jetson_clocks output:

root@tegra-ubuntu:/home/nvidia/stress# jetson_clocks --show
SOC family:tegra234  Machine:NVIDIA Orin NX Developer Kit
Online CPUs: 0-7
cpu0:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu1:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu2:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu3:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu4:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu5:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu6:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
cpu7:  Online=1 Governor=schedutil MinFreq=1984000 MaxFreq=1984000 CurrentFreq=1984000 IdleStates: WFI=0 c7=0 
GPU MinFreq=918000000 MaxFreq=918000000 CurrentFreq=918000000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000 FreqOverride=1
DLA0_CORE:   Online=1 MinFreq=0 MaxFreq=614400000 CurrentFreq=614400000
DLA0_FALCON: Online=1 MinFreq=0 MaxFreq=294400000 CurrentFreq=294400000
DLA1_CORE:   Online=1 MinFreq=0 MaxFreq=614400000 CurrentFreq=614400000
DLA1_FALCON: Online=1 MinFreq=0 MaxFreq=294400000 CurrentFreq=294400000
PVA0_VPS0: Online=1 MinFreq=0 MaxFreq=1190400000 CurrentFreq=1190400000
PVA0_AXI:  Online=1 MinFreq=0 MaxFreq=857600000 CurrentFreq=857600000
FAN Dynamic Speed control=active hwmon2_pwm1=107
NV Power Mode: MAXN

I checked the range of frequencies that can be set:

root@tegra-ubuntu:/home/nvidia# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
115200 192000 268800 345600 422400 499200 576000 652800 729600 806400 883200 960000 1036800 1113600 1190400 1267200 1344000 1420800 1497600 1574400 1651200 1728000 1804800 1881600 1958400 1984000

I want to know: how does the Orin NX select which frequency to use for its CPU?

Do you mean jetson_clocks shows the frequency as 1984 MHz but tegrastats shows lower frequencies?

Please also refer to the following thread to configure the governor:
Need to change default nvpmodel mode & clock freq's - #10 by KevinFFF
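For background: with the default schedutil governor, the kernel derives a target frequency from CPU utilization and then rounds up to the next entry in scaling_available_frequencies. A simplified sketch of the upstream heuristic (the 1.25 headroom factor and the 0..1024 utilization scale come from the mainline kernel; the example values here are made up):

```shell
# schedutil target ~= 1.25 * f_max * util / 1024, rounded up to the next
# available OPP; example with f_max = 1984000 kHz and util = 512 (50% load)
f_max=1984000
util=512
target=$(( (f_max + f_max / 4) * util / 1024 ))
echo "$target"   # 1240000 -> the governor would pick 1267200 from the table
```

The performance governor bypasses this heuristic entirely and pins the clock at scaling_max_freq, which is why it is suggested for frequency locking.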

Hi KevinFFF,

Yes.

Referring to the settings in that thread, I successfully set the CPU parameters, but there is no corresponding path for the GPU parameters:

root@tegra-ubuntu:/home/nvidia/stress# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
root@tegra-ubuntu:/home/nvidia/stress# 
root@tegra-ubuntu:/home/nvidia/stress# 
root@tegra-ubuntu:/home/nvidia/stress# echo performance | tee /sys/class/devfreq/*gpu/governor
tee: '/sys/class/devfreq/*gpu/governor': No such file or directory
performance
root@tegra-ubuntu:/home/nvidia/stress# 
root@tegra-ubuntu:/home/nvidia/stress# 
root@tegra-ubuntu:/home/nvidia/stress# ls /sys/class/devfreq/ -l
total 0
lrwxrwxrwx 1 root root 0 Jun 17  2024 15340000.vic -> ../../devices/platform/13e40000.host1x/15340000.vic/devfreq/15340000.vic
lrwxrwxrwx 1 root root 0 Jun 17  2024 15480000.nvdec -> ../../devices/platform/13e40000.host1x/15480000.nvdec/devfreq/15480000.nvdec
lrwxrwxrwx 1 root root 0 Jun 17  2024 154c0000.nvenc -> ../../devices/platform/13e40000.host1x/154c0000.nvenc/devfreq/154c0000.nvenc
lrwxrwxrwx 1 root root 0 Jan  1 00:01 17000000.ga10b -> ../../devices/platform/17000000.ga10b/devfreq/17000000.ga10b

I ran a test with only the GPU stress program:

./matrixMulCUBLAS --sizemult=8 &

After it started, there was a significant decrease in CPU frequency.

Here are two questions:

1. How do I set the GPU governor parameters correctly?

2. Where is the temperature threshold configuration file that controls CPU frequency reduction in software?

See:

https://docs.nvidia.com/jetson/archives/r35.6.1/DeveloperGuide/SD/PlatformPowerAndPerformance/JetsonOrinNanoSeriesJetsonOrinNxSeriesAndJetsonAgxOrinSeries.html#thermal-zone

After reading the document, I still do not know how to adjust the temperature threshold: is there a temporary runtime method, or does it have to be modified in the device tree?

Looking forward to your help, thank you.

Hi KevinFFF,

The thermal-zones configuration on the Orin NX is as follows (read from the DTB file):

thermal-zones {
		status = "disabled";

		CPU-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x00>;

			trips {

				cpu-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c3>;
				};

				cpu-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x0c>;
				};

				cpu-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x12>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x0c>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x0c>;
					cooling-device = <0x11 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x12>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		GPU-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x01>;

			trips {

				gpu-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c4>;
				};

				gpu-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x14>;
				};

				gpu-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x16>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x14>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x14>;
					cooling-device = <0x15 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x16>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		CV0-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x02>;

			trips {

				cv0-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c5>;
				};

				cv0-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x17>;
				};

				cv0-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x19>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x17>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x17>;
					cooling-device = <0x18 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x19>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		CV1-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x03>;

			trips {

				cv1-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c6>;
				};

				cv1-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x1a>;
				};

				cv1-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x1c>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x1a>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x1a>;
					cooling-device = <0x1b 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x1c>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		CV2-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x04>;

			trips {

				cv2-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c7>;
				};

				cv2-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x1d>;
				};

				cv2-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x1f>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x1d>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x1d>;
					cooling-device = <0x1e 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x1f>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		SOC0-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x05>;

			trips {

				soc0-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c8>;
				};

				soc0-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x20>;
				};

				soc0-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x22>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x20>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x20>;
					cooling-device = <0x21 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x22>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		SOC1-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x06>;

			trips {

				soc1-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2c9>;
				};

				soc1-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x23>;
				};

				soc1-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x25>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x23>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x23>;
					cooling-device = <0x24 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x25>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		SOC2-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x07>;

			trips {

				soc2-sw-shutdown {
					temperature = <0x19834>;
					type = "critical";
					hysteresis = <0x00>;
					phandle = <0x2ca>;
				};

				soc2-sw-throttle {
					temperature = <0x182b8>;
					type = "passive";
					hysteresis = <0x00>;
					phandle = <0x26>;
				};

				soc2-hot-surface {
					temperature = <0x11170>;
					type = "active";
					hysteresis = <0x1f40>;
					phandle = <0x28>;
				};
			};

			cooling-maps {

				map0 {
					trip = <0x26>;
					cooling-device = <0x0d 0xffffffff 0xffffffff 0x0e 0xffffffff 0xffffffff 0x0f 0xffffffff 0xffffffff 0x10 0xffffffff 0xffffffff>;
				};

				user-alert-map0 {
					trip = <0x26>;
					cooling-device = <0x27 0x01 0x01>;
				};

				hot-surface-alert-map0 {
					trip = <0x28>;
					cooling-device = <0x13 0x01 0x01>;
				};
			};

			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		tj-therm {
			status = "okay";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			thermal-sensors = <0x0b 0x08>;
			phandle = <0x2cb>;
		};

		Tboard_tegra {
			status = "disabled";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;

			thermal-zone-params {
				governor-name = "pid_thermal_gov";
			};
		};

		Tdiode_tegra {
			status = "disabled";
			polling-delay = <0x00>;
			polling-delay-passive = <0x3e8>;
			phandle = <0x2cc>;

			thermal-zone-params {
				governor-name = "pid_thermal_gov";
			};
		};
	};
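For reference, the hex temperature values in this dump are millidegrees Celsius; they can be decoded with printf:

```shell
# trip temperatures in the DTB are in millidegrees Celsius
printf '%d\n' 0x19834 0x182b8 0x11170 0x1f40
# 104500 (104.5 C shutdown), 99000 (99 C throttle),
# 70000 (70 C hot surface), 8000 (8 C hysteresis)
```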

The top-level thermal-zones status reads "disabled". Does this configuration still take effect?

Which configuration file sets this in the Orin NX JetPack 5.1.5 release?

DTS file:

out.dts.txt (408.9 KB)

Please try running the following command instead.

echo performance | tee /sys/class/devfreq/17000000.gpu/governor

Please simply update the temperature property in this device node instead of disabling it.

Please check tegra234-thermal-cooling.dtsi.

Hi, there is no 17000000.gpu directory.

root@tegra-ubuntu:/home/nvidia# echo performance | tee /sys/class/devfreq/17000000.gpu/governor
tee: /sys/class/devfreq/17000000.gpu/governor: No such file or directory
performance
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# ls /sys/class/devfreq/1
15340000.vic/   15480000.nvdec/ 154c0000.nvenc/ 17000000.ga10b/ 
root@tegra-ubuntu:/home/nvidia# ls /sys/class/devfreq/1^C
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# jetson_release 
Software part of jetson-stats 4.3.2 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Orin NX Developer Kit - Jetpack 5.1.5 [L4T 35.6.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3767-0000
 - Module: NVIDIA Jetson Orin NX (16GB ram)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.216-tegra
jtop:
 - Version: 4.3.2
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 8.5.2.2
 - VPI: 2.4.8
 - OpenCV: 4.5.4 - with CUDA: NO

I have not modified any thermal-related configuration. These are the thermal zones read back after flashing, and the default status is disabled.

Here is Linux_for_Tegra/source/public/hardware/nvidia/platform/t23x/common/kernel-dts/t234-common-cvm/tegra234-thermal-cooling.dtsi:

tegra234-thermal-cooling.dtsi.txt (6.8 KB)

CPU-therm {
			trips {
				cpu_sw_shutdown: cpu-sw-shutdown {
					temperature = <104500>;
					type = "critical";
					hysteresis = <0>;
				};
				cpu_sw_throttle: cpu-sw-throttle {
					temperature = <99000>;
					type = "passive";
					hysteresis = <0>;
				};
				cpu_hot_surface: cpu-hot-surface {
					temperature = <70000>;
					type = "active";
					hysteresis = <8000>;
				};
			};
			cooling-maps {
				map0 {
					trip = <&cpu_sw_throttle>;
					cooling-device = <&cl0_0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cl1_0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cl2_0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&tegra_ga10b THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

		GPU-therm {
			trips {
				gpu_sw_shutdown: gpu-sw-shutdown {
					temperature = <104500>;
					type = "critical";
					hysteresis = <0>;
				};
				gpu_sw_throttle: gpu-sw-throttle {
					temperature = <99000>;
					type = "passive";
					hysteresis = <0>;
				};
				gpu_hot_surface: gpu-hot-surface {
					temperature = <70000>;
					type = "active";
					hysteresis = <8000>;
				};
			};
			cooling-maps {
				map0 {
					trip = <&gpu_sw_throttle>;
					cooling-device = <&cl0_0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cl1_0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cl2_0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&tegra_ga10b THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
			thermal-zone-params {
				governor-name = "step_wise";
			};
		};

In the CPU/GPU configuration, downshifting is triggered only when the temperature exceeds 99 °C, yet in actual testing the CPU already downshifts at 60 °C.

From the current test results, whenever a GPU stress test runs, the CPU downshifts directly, regardless of CPU/GPU temperature. I need to identify the cause of the downshift and fix it. The thermal-zones configuration shows a throttle threshold of 99 °C, so it should not be related to that. What else could cause downshifting while the GPU test is running?

Hi KevinFFF,

Based on the current test results, the frequency drops during GPU stress testing. Is there a way to investigate the cause of the drop, or a way to lock the frequency?

There is no 17000000.gpu directory. I tried setting 17000000.ga10b instead, but it did not help:

root@tegra-ubuntu:/home/nvidia# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# echo performance | tee /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/governor
performance

On JetPack 5.x (r35.x), the Orin NX GPU devfreq node is 17000000.ga10b, so the path is /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/.

Please check whether the following commands help in your case.

# jetson_clocks
# cd /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/
# cat available_frequencies
# echo <MAX freq> | sudo tee min_freq
# echo <MAX freq> | sudo tee max_freq

Hi KevinFFF,

root@tegra-ubuntu:/home/nvidia# nvpmodel -q
NV Power Mode: MAXN
0
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# jetson_clocks
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
root@tegra-ubuntu:/home/nvidia# 
root@tegra-ubuntu:/home/nvidia#
root@tegra-ubuntu:/home/nvidia# cd /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# 
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# cat available_frequencies
306000000 408000000 510000000 612000000 714000000 816000000 918000000
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# pwd
/sys/devices/17000000.ga10b/devfreq/17000000.ga10b
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# echo "918000000" | tee /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/min_freq
918000000
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# echo "918000000" | tee /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/max_freq
918000000
root@tegra-ubuntu:/home/nvidia/stress# cat /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/min_freq
918000000
root@tegra-ubuntu:/home/nvidia/stress# cat /sys/devices/17000000.ga10b/devfreq/17000000.ga10b/max_freq 
918000000
root@tegra-ubuntu:/home/nvidia/stress#

Then run the test commands:

root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# stress-ng: info:  [4007] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info:  [4007] dispatching hogs: 8 cpu

root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# 
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# 
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# stress-ng --vm 6 --vm-bytes 500M --vm-method all &
[2] 4041
root@tegra-ubuntu:/sys/devices/17000000.ga10b/devfreq/17000000.ga10b# stress-ng: info:  [4041] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info:  [4041] dispatching hogs: 6 vm

root@tegra-ubuntu:/home/nvidia/stress# ./matrixMulCUBLAS --sizemult=8 &
[3] 4101
root@tegra-ubuntu:/home/nvidia/stress# [Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Ampere" with compute capability 8.7

GPU Device 0: "Orin" with compute capability 8.7

MatrixA(8192,8192), MatrixB(8192,8192), MatrixC(8192,8192)

root@tegra-ubuntu:/home/nvidia/stress# 

Frequency locking still failed:

01-01-1970 00:09:28 RAM 2942/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1984,100%@258,100%@699,100%@273,100%@1614,100%@282,100%@372,100%@1984] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@69.656C CPU@74.031C SOC2@69.875C SOC0@67.187C CV1@68.218C GPU@75.718C tj@75.718C SOC1@69.687C CV2@66.125C VDD_IN 24343mW/24094mW VDD_CPU_GPU_CV 12910mW/12871mW VDD_SOC 4944mW/4814mW
01-01-1970 00:09:29 RAM 2942/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@250,100%@1984,100%@260,100%@1984,100%@324,100%@1521,100%@253,100%@1117] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.687C CPU@74.125C SOC2@69.968C SOC0@67.25C CV1@68.187C GPU@75.875C tj@75.875C SOC1@69.75C CV2@66.125C VDD_IN 24343mW/24095mW VDD_CPU_GPU_CV 13047mW/12872mW VDD_SOC 4807mW/4814mW
01-01-1970 00:09:30 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@247,100%@1984,100%@257,100%@853,100%@789,100%@323,100%@1984,100%@1137] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.75C CPU@74.218C SOC2@69.937C SOC0@67.25C CV1@68.156C GPU@75.843C tj@75.843C SOC1@69.656C CV2@66.187C VDD_IN 24343mW/24097mW VDD_CPU_GPU_CV 12910mW/12872mW VDD_SOC 4944mW/4815mW
01-01-1970 00:09:31 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@794,100%@1850,100%@286,100%@294,100%@1984,100%@254,100%@1984,100%@275] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.375C CPU@74C SOC2@69.937C SOC0@67.375C CV1@68.062C GPU@75.875C tj@75.875C SOC1@69.656C CV2@66.218C VDD_IN 24343mW/24099mW VDD_CPU_GPU_CV 12910mW/12872mW VDD_SOC 4944mW/4815mW
01-01-1970 00:09:32 RAM 2942/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1984,100%@263,100%@1984,100%@812,100%@309,100%@1984,100%@271,100%@1984] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@69.593C CPU@74.218C SOC2@69.937C SOC0@67.375C CV1@68.125C GPU@76.062C tj@76.062C SOC1@69.625C CV2@66.218C VDD_IN 24343mW/24100mW VDD_CPU_GPU_CV 13047mW/12874mW VDD_SOC 4807mW/4815mW
01-01-1970 00:09:33 RAM 2942/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1984,100%@621,100%@1984,100%@291,100%@1984,100%@303,100%@408,100%@372] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@69.75C CPU@74.218C SOC2@69.937C SOC0@67.312C CV1@68.156C GPU@76.031C tj@76.031C SOC1@69.687C CV2@66.187C VDD_IN 24343mW/24102mW VDD_CPU_GPU_CV 13047mW/12875mW VDD_SOC 4944mW/4816mW
01-01-1970 00:09:34 RAM 2942/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1687,100%@392,100%@1984,100%@256,100%@1984,100%@723,100%@1452,100%@912] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@69.625C CPU@74.187C SOC2@69.968C SOC0@67.312C CV1@68.281C GPU@76.031C tj@76.031C SOC1@69.812C CV2@66.218C VDD_IN 24343mW/24103mW VDD_CPU_GPU_CV 13047mW/12876mW VDD_SOC 4809mW/4816mW
01-01-1970 00:09:35 RAM 2942/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1272,100%@384,100%@257,100%@1862,100%@1414,100%@1848,100%@324,100%@1584] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.687C CPU@74.156C SOC2@70C SOC0@67.437C CV1@68.125C GPU@76C tj@76C SOC1@69.875C CV2@66.187C VDD_IN 24343mW/24105mW VDD_CPU_GPU_CV 12910mW/12876mW VDD_SOC 4944mW/4817mW
01-01-1970 00:09:36 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1391,100%@313,100%@1984,100%@254,100%@1984,100%@633,100%@456,100%@535] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@70C CPU@74.281C SOC2@70.031C SOC0@67.406C CV1@68.156C GPU@76.125C tj@76.125C SOC1@69.687C CV2@66.25C VDD_IN 24343mW/24106mW VDD_CPU_GPU_CV 12910mW/12876mW VDD_SOC 4944mW/4818mW
01-01-1970 00:09:37 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1781,100%@269,100%@1984,100%@809,100%@1141,100%@577,100%@1984,100%@256] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.687C CPU@74.125C SOC2@70C SOC0@67.468C CV1@68.156C GPU@76.062C tj@76.062C SOC1@69.718C CV2@66.218C VDD_IN 24343mW/24107mW VDD_CPU_GPU_CV 13047mW/12877mW VDD_SOC 4807mW/4818mW
01-01-1970 00:09:38 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@264,100%@1984,100%@255,100%@1984,100%@1356,100%@830,100%@1984,100%@1984] EMC_FREQ 39%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.75C CPU@74.25C SOC2@70C SOC0@67.468C CV1@68.25C GPU@76.125C tj@76.125C SOC1@69.843C CV2@66.406C VDD_IN 24343mW/24109mW VDD_CPU_GPU_CV 12910mW/12877mW VDD_SOC 4944mW/4818mW
01-01-1970 00:09:39 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@271,100%@1984,100%@418,100%@1984,100%@275,100%@1296,100%@1984,100%@1440] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@69.593C CPU@74.406C SOC2@69.968C SOC0@67.375C CV1@68.187C GPU@75.937C tj@75.937C SOC1@69.875C CV2@66.187C VDD_IN 24343mW/24110mW VDD_CPU_GPU_CV 13047mW/12878mW VDD_SOC 4807mW/4818mW
01-01-1970 00:09:40 RAM 2943/15503MB (lfb 2839x4MB) SWAP 1/7752MB (cached 0MB) CPU [100%@1984,100%@256,100%@495,100%@333,100%@262,100%@1984,100%@257,100%@1984] EMC_FREQ 38%@3199 GR3D_FREQ 99%@[918,0] VIC_FREQ 729 APE 174 CV0@69.656C CPU@74.343C SOC2@69.968C SOC0@67.406C CV1@68.187C GPU@76.031C tj@76.031C SOC1@69.718C CV2@66.343C VDD_IN 24343mW/24112mW VDD_CPU_GPU_CV 13047mW/12879mW VDD_SOC 4807mW/4818mW

I have completed all the recommended configuration to lock the CPU/GPU frequencies and disable dynamic throttling, including:

  1. Modified /etc/nvpmodel.conf to lock both CPU A78 clusters (all 8 cores) to 1984000 kHz (min freq = max freq = 1.984 GHz), and set the GPU min/max freq to 918000000 Hz (918 MHz);

Here is the relevant section of my /etc/nvpmodel.conf:

# MAXN is the NONE power model to release all constraints
< POWER_MODEL ID=0 NAME=MAXN >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_ONLINE CORE_6 1
CPU_ONLINE CORE_7 1
FBP_POWER_GATING FBP_PG_MASK 2
TPC_POWER_GATING TPC_PG_MASK 240
GPU_POWER_CONTROL_ENABLE GPU_PWR_CNTL_EN on
CPU_A78_0 MIN_FREQ 1984000
CPU_A78_0 MAX_FREQ 1984000
CPU_A78_1 MIN_FREQ 1984000
CPU_A78_1 MAX_FREQ 1984000
GPU MIN_FREQ 0
GPU MAX_FREQ -1
GPU_POWER_CONTROL_DISABLE GPU_PWR_CNTL_DIS auto
EMC MAX_FREQ 0
DLA0_CORE MAX_FREQ -1
DLA1_CORE MAX_FREQ -1
DLA0_FALCON MAX_FREQ -1
DLA1_FALCON MAX_FREQ -1
PVA0_VPS MAX_FREQ -1
PVA0_AXI MAX_FREQ -1

  2. Switched all 8 CPU cores to the performance cpufreq governor (disabled schedutil dynamic scaling);

  3. Executed sudo jetson_clocks --force to force-lock all CPU/GPU/EMC frequencies, and enabled full fan speed (PWM=255) for sufficient cooling;
  4. Confirmed the EMC (memory controller) is locked to its maximum frequency of 3199MHz;
  5. No thermal trigger: the CPU/GPU temperatures are far below the passive trip point (CPU ~69℃, GPU ~82℃, DTB passive trip is 99℃).
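Before each run, the lock can also be sanity-checked straight from sysfs. A minimal sketch (the helper name is mine; it assumes the standard cpufreq sysfs layout, and takes an optional root argument so it can be dry-run against a copied tree):

```shell
#!/bin/sh
# check_cpu_lock: print every CPU whose scaling limits or governor show it
# is NOT actually pinned; no output means the lock looks good.
check_cpu_lock() {
    root="${1:-/sys}"
    for dir in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        min=$(cat "$dir/scaling_min_freq")
        max=$(cat "$dir/scaling_max_freq")
        gov=$(cat "$dir/scaling_governor")
        if [ "$min" != "$max" ] || [ "$gov" != "performance" ]; then
            echo "NOT LOCKED: $dir (gov=$gov min=$min max=$max)"
        fi
    done
}
```

On the device, calling check_cpu_lock with no argument walks /sys; it should print nothing when every core has min = max = 1984000 and the performance governor.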

Symptom Details:

  1. Initial stage (1-2 minutes): All 8 CPU cores run stably at 1984MHz (full frequency), GPU at 918MHz, EMC at 3199MHz, total power ~22-23W (safe range);

  2. After running for a few minutes: the CPU cores start throttling frequently and at random (frequencies drop to anywhere between ~200MHz and ~1500MHz, e.g. 276MHz/846MHz/1169MHz), while the GPU always stays at its full 918MHz (no throttling at all);

  3. No overheating: CPU temp ~69℃, GPU temp ~82℃, SOC temp ~68℃ (all far below 99℃ passive trip point in device tree);

  4. No power limit: Total power consumption is ~23W (MAXN mode, no power budget cap), EMC runs at 3199MHz stably (41-42% utilization);

  5. No system errors: No kernel panic, no dmesg thermal/power error logs, the system runs normally except CPU throttling.

Why does the CPU throttle frequently under the combined load, even though the temperature is far below the thermal trip point and the GPU remains at full frequency? Is there a hidden power/thermal constraint for the CPU on the Orin NX 16GB that I have missed?

Okay, so the frequency-locking failure is specific to the CPU rather than the GPU.

Is an OC (over-current) event happening in your case?
Please simply run the following command to check the OC event counts.

# grep "" /sys/class/hwmon/hwmon*/oc*
root@tegra-ubuntu:/home/nvidia/stress# grep "" /sys/class/hwmon/hwmon*/oc*
/sys/class/hwmon/hwmon1/oc1_event_cnt:4535220
/sys/class/hwmon/hwmon1/oc1_throt_en:1
/sys/class/hwmon/hwmon1/oc2_event_cnt:0
/sys/class/hwmon/hwmon1/oc2_throt_en:1
/sys/class/hwmon/hwmon1/oc3_event_cnt:214913
/sys/class/hwmon/hwmon1/oc3_throt_en:1

root@tegra-ubuntu:/sys/class/hwmon# grep "" /sys/class/hwmon/hwmon*/oc*_throt_en
/sys/class/hwmon/hwmon1/oc1_throt_en:1
/sys/class/hwmon/hwmon1/oc2_throt_en:1
/sys/class/hwmon/hwmon1/oc3_throt_en:1
root@tegra-ubuntu:/sys/class/hwmon# grep "" /sys/class/hwmon/hwmon*/oc*_event_cnt
/sys/class/hwmon/hwmon1/oc1_event_cnt:17084494
/sys/class/hwmon/hwmon1/oc2_event_cnt:334285
/sys/class/hwmon/hwmon1/oc3_event_cnt:425380

What do OC1/OC2/OC3 represent as OC events?

What should I do to fix these OC events?
Can the triggering conditions for OC events be found in the device tree?

OC1: Under Voltage
OC2: VDD_IN Average Power
OC3: VDD_IN Instantaneous Power
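Given those meanings, it can help to sample the counters twice during the stress run and see how fast each one grows: a steadily climbing OC1/OC3 count while the CPU dips points at under-voltage/instantaneous-power throttling rather than thermals. A minimal sketch (the function name is mine, and the hwmon1 default path is an assumption based on your output above):

```shell
#!/bin/sh
# oc_event_delta: read every oc*_event_cnt counter twice, N seconds
# apart, and print how much each one grew in that window.
oc_event_delta() {
    dir="${1:-/sys/class/hwmon/hwmon1}"   # hwmon1 as in the output above
    wait_s="${2:-5}"
    for f in "$dir"/oc*_event_cnt; do
        [ -f "$f" ] || continue
        eval "start_$(basename "$f" _event_cnt)=$(cat "$f")"
    done
    sleep "$wait_s"
    for f in "$dir"/oc*_event_cnt; do
        [ -f "$f" ] || continue
        name=$(basename "$f" _event_cnt)
        eval "before=\$start_$name"
        echo "$name: +$(( $(cat "$f") - before )) events in ${wait_s}s"
    done
}
```

Running oc_event_delta while the stress test is active shows which OC source is firing during the frequency dips.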

Please refer to the Jetson Orin Nano Series, Jetson Orin NX Series and Jetson AGX Orin Series documentation in the NVIDIA Jetson Linux Developer Guide for details.

We don’t suggest modifying their thresholds, as they are the mechanism that protects the module.
Please create a custom power mode configuration that matches your power usage, to prevent the OC events from causing throttling.
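For example, a custom mode can be added to /etc/nvpmodel.conf by cloning the MAXN entry with a lower CPU frequency cap, so the worst-case combined CPU+GPU load stays below the level that trips OC1/OC3. The ID, name, and the 1728000 kHz cap below are illustrative only; pick a frequency from scaling_available_frequencies that keeps your measured VDD_IN inside budget:

```
# Hypothetical custom mode: like MAXN, but with the CPU capped lower
# so the combined load stays under the OC thresholds.
< POWER_MODEL ID=4 NAME=CUSTOM_OC_SAFE >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_ONLINE CORE_4 1
CPU_ONLINE CORE_5 1
CPU_ONLINE CORE_6 1
CPU_ONLINE CORE_7 1
CPU_A78_0 MIN_FREQ 1728000
CPU_A78_0 MAX_FREQ 1728000
CPU_A78_1 MIN_FREQ 1728000
CPU_A78_1 MAX_FREQ 1728000
GPU MIN_FREQ 0
GPU MAX_FREQ -1
EMC MAX_FREQ 0
```

Then select it with sudo nvpmodel -m 4 and re-run the stress test while watching the OC event counters.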

Hi KevinFFF,

Do you have a recommended GPU stress testing tool?

I want to run the GPU to 95% for testing.

Running matrixMulCUBLAS is the suggested method for GPU stress testing.

You can refer to the following elinux page for details.
Jetson/L4T/Power - eLinux.org

Hi KevinFFF,

I am running matrixMulCUBLAS for GPU stress testing.

I originally ran the command:

 ./matrixMulCUBLAS --sizemult=8 &

which performs a full stress test, and the CPU quickly starts dropping frequency once it runs.

Now I have modified the command to:

 ./matrixMulCUBLAS --sizemult=7 &

and did not run the CPU stress test processes at all. However, there are still instances of CPU frequency reduction:

01-01-1970 01:09:01 RAM 2019/15504MB (lfb 3013x4MB) SWAP 0/7752MB (cached 0MB) CPU [0%@252,0%@1984,0%@1851,100%@299,1%@1721,0%@1984,0%@1984,0%@569] EMC_FREQ 40%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.031C CPU@58.531C SOC2@59.093C SOC0@58.156C CV1@53.593C GPU@74.625C tj@74.625C SOC1@59.531C CV2@56C VDD_IN 23555mW/23453mW VDD_CPU_GPU_CV 13163mW/13116mW VDD_SOC 4343mW/4346mW
01-01-1970 01:09:02 RAM 2019/15504MB (lfb 3013x4MB) SWAP 0/7752MB (cached 0MB) CPU [0%@1984,2%@1984,0%@1668,100%@679,1%@1984,0%@1984,0%@1330,0%@247] EMC_FREQ 40%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.125C CPU@58.781C SOC2@59.343C SOC0@58.25C CV1@54.031C GPU@74.468C tj@74.468C SOC1@59.781C CV2@56.218C VDD_IN 23592mW/23460mW VDD_CPU_GPU_CV 13163mW/13118mW VDD_SOC 4343mW/4345mW
01-01-1970 01:09:03 RAM 2019/15504MB (lfb 3013x4MB) SWAP 0/7752MB (cached 0MB) CPU [0%@1984,0%@307,0%@1984,100%@1984,1%@246,0%@1984,0%@1984,0%@537] EMC_FREQ 40%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.375C CPU@58.875C SOC2@59.562C SOC0@58.5C CV1@54.031C GPU@75C tj@75C SOC1@59.937C CV2@56.312C VDD_IN 23592mW/23466mW VDD_CPU_GPU_CV 13200mW/13122mW VDD_SOC 4380mW/4347mW
01-01-1970 01:09:04 RAM 2019/15504MB (lfb 3013x4MB) SWAP 0/7752MB (cached 0MB) CPU [0%@263,0%@357,0%@1984,100%@1984,1%@1706,0%@1984,0%@1984,0%@1984] EMC_FREQ 40%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.531C CPU@59.062C SOC2@59.656C SOC0@58.718C CV1@54.343C GPU@74.843C tj@74.843C SOC1@60.343C CV2@56.531C VDD_IN 23555mW/23471mW VDD_CPU_GPU_CV 13163mW/13124mW VDD_SOC 4343mW/4347mW
01-01-1970 01:09:05 RAM 2019/15504MB (lfb 3013x4MB) SWAP 0/7752MB (cached 0MB) CPU [0%@538,0%@1984,0%@1984,100%@1984,1%@252,0%@1984,0%@1984,0%@1984] EMC_FREQ 40%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.625C CPU@59.187C SOC2@59.875C SOC0@58.937C CV1@54.531C GPU@75.062C tj@75.062C SOC1@60.375C CV2@56.718C VDD_IN 23555mW/23474mW VDD_CPU_GPU_CV 13200mW/13128mW VDD_SOC 4380mW/4348mW
01-01-1970 01:09:06 RAM 2019/15504MB (lfb 3013x4MB) SWAP 0/7752MB (cached 0MB) CPU [0%@1984,0%@1984,1%@1333,100%@1414,0%@1984,0%@1984,1%@641,0%@1155] EMC_FREQ 40%@3199 GR3D_FREQ 99%@[917,0] VIC_FREQ 729 APE 174 CV0@56.75C CPU@59.312C SOC2@60.062C SOC0@59.156C CV1@54.625C GPU@75.343C tj@75.343C SOC1@60.531C CV2@56.875C VDD_IN 23592mW/23480mW VDD_CPU_GPU_CV 13200mW/13131mW VDD_SOC 4380mW/4350mW
root@tegra-ubuntu:/home/nvidia/stress# grep "" /sys/class/hwmon/hwmon*/oc*_event_cnt
/sys/class/hwmon/hwmon1/oc1_event_cnt:746
/sys/class/hwmon/hwmon1/oc2_event_cnt:9651
/sys/class/hwmon/hwmon1/oc3_event_cnt:169090

How can I check the OC settings on the device?

In which dts file are the OC settings located in the device tree?
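In the meantime I tried to locate them by decompiling the device tree the board actually booted with and searching it broadly (the grep patterns are my guesses at likely node/property names, not exact ones):

```shell
#!/bin/sh
# Step 1 (on the Jetson): decompile the booted device tree.
# dtc comes from the device-tree-compiler package.
if command -v dtc >/dev/null && [ -d /proc/device-tree ]; then
    dtc -I fs -O dts /proc/device-tree -o /tmp/booted.dts 2>/dev/null
fi

# Step 2: list lines that look OC/throttle related. The patterns are
# guesses -- search broadly rather than for one exact property name.
find_oc_lines() {
    grep -n -i -E "oc[0-9]|soctherm|throttle" "${1:-/tmp/booted.dts}"
}
```

Searching the decompiled tree at least shows which nodes carry OC/throttle properties, even if the source .dts file name still has to be matched up in the BSP sources.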