OC ALARM while doing inference

Pinout21 · June 12, 2023, 8:43am

Hello,

We are currently working on a product based on the Jetson TX2 NX platform.
I recently noticed the following messages in dmesg:

[ 4333.277276] soctherm: OC ALARM 0x00000011
[ 4334.421116] soctherm: OC ALARM 0x00000011
[ 4335.993122] soctherm: OC ALARM 0x00000001
[ 4337.439201] soctherm: OC ALARM 0x00000011
[ 4338.580276] soctherm: OC ALARM 0x00000001
[ 4339.707316] soctherm: OC ALARM 0x00000001
[ 4340.713568] soctherm: OC ALARM 0x00000010

This happens on all our TX2 NX modules, and only while doing inference. Our Jetsons are in MAXN (0) mode.

After searching the kernel source code and this forum, this seems to be related to a thermal or electrical issue.
Our engineers have checked, but they haven’t seen anything exceeding the Jetson specifications for power supply or thermal configuration.

We would like to understand these error codes and to be sure that these alarms won’t affect the performance of our algorithm by throttling CPU or GPU speed.

Thanks for your help.
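
For readers trying to interpret the values in the dmesg excerpt above: the numbers after OC ALARM look like hexadecimal bitmasks (0x11 is 0x01 and 0x10 combined). The sketch below, in Python, only parses such lines and lists which bit positions are set. The thread does not say which over-current source each bit corresponds to, so no such mapping is assumed here. `parse_alarms` and `set_bits` are hypothetical helper names introduced for illustration.

```python
import re
from collections import Counter

# Matches lines such as "[ 4333.277276] soctherm: OC ALARM 0x00000011"
ALARM_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+soctherm: OC ALARM (0x[0-9a-fA-F]+)")

# The excerpt quoted in the post above.
SAMPLE = """\
[ 4333.277276] soctherm: OC ALARM 0x00000011
[ 4334.421116] soctherm: OC ALARM 0x00000011
[ 4335.993122] soctherm: OC ALARM 0x00000001
[ 4337.439201] soctherm: OC ALARM 0x00000011
[ 4338.580276] soctherm: OC ALARM 0x00000001
[ 4339.707316] soctherm: OC ALARM 0x00000001
[ 4340.713568] soctherm: OC ALARM 0x00000010
"""


def parse_alarms(dmesg_text):
    """Return (timestamp_seconds, mask) for every OC ALARM line found."""
    return [(float(ts), int(mask, 16)) for ts, mask in ALARM_RE.findall(dmesg_text)]


def set_bits(mask):
    """Return the positions of the bits set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    alarms = parse_alarms(SAMPLE)
    for ts, mask in alarms:
        print(f"{ts:12.6f}  0x{mask:08x}  bits set: {set_bits(mask)}")
    # How often each bit position appears across all alarms.
    counts = Counter(bit for _, mask in alarms for bit in set_bits(mask))
    print(dict(counts))
```

On the seven lines quoted above, bit 0 is set in six alarms and bit 4 in four. To run it on a live system, the output of `dmesg` can be passed to `parse_alarms` instead of `SAMPLE`.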

JerryChang · June 13, 2023, 6:48am

hello Pinout21,

please also see Topic 188504.
for testing purposes, you may revise current-critical-limit-ma to avoid these warning alarms.

Pinout21 · June 13, 2023, 6:58am

Hello JerryChang,

We have already tried that in our build, but the warning is still there.

Of course, we have also tried the official carrier board with different power supplies rated above the recommended specs (a 5V/10A lab power supply, for example), but this message still appears.
For information, we have checked our 5V rail with an oscilloscope and even tried a higher voltage (5.2V, for example).

The main concern is: does this affect performance?

JerryChang · June 13, 2023, 7:47am

this happens because the module is reaching the hardware spec, and it is trying to protect the hardware.
you should also check whether the CPU/GPU frequency drops after an OC event happens.
anyway, the actual solution is to use the PowerEstimator to create a custom power mode.
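
One way to act on the suggestion to check for frequency drops after an OC event is to line up alarm timestamps with frequency samples. The Python sketch below assumes you already have a time-sorted list of `(time_seconds, frequency)` samples recorded on the same clock as the dmesg timestamps (seconds since boot); how closely a userspace clock matches the kernel log clock is an assumption to verify on your system. `freq_after_alarms` is a hypothetical helper, and the sample frequencies in the demo are synthetic values for illustration only.

```python
def freq_after_alarms(alarm_times, samples, window=2.0):
    """Compare frequency before and shortly after each alarm.

    alarm_times: alarm timestamps in seconds.
    samples: list of (time_seconds, frequency), sorted by time.
    window: how many seconds after the alarm to inspect.

    Returns (alarm_time, last_freq_before, min_freq_after) tuples.
    Alarms without samples on both sides are skipped.
    """
    results = []
    for t in alarm_times:
        before = [f for ts, f in samples if ts <= t]
        after = [f for ts, f in samples if t < ts <= t + window]
        if not before or not after:
            continue
        results.append((t, before[-1], min(after)))
    return results


if __name__ == "__main__":
    # Alarm time taken from the dmesg excerpt; frequencies are made up.
    alarms = [4333.277276]
    samples = [(4333.0, 2000000), (4333.5, 1200000), (4334.0, 2000000)]
    for t, f_before, f_after in freq_after_alarms(alarms, samples):
        change = "drop" if f_after < f_before else "no drop"
        print(f"alarm at {t:.3f}s: {f_before} -> {f_after} ({change})")
```

A consistent drop right after each alarm would suggest throttling; no change in the window would suggest the alarm is not reducing clocks at that moment.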

Pinout21 · June 13, 2023, 12:57pm

How can we check the CPU/GPU frequency without using tegrastats?
Unfortunately, the PowerEstimator doesn’t work with the Jetson TX2.

JerryChang · June 14, 2023, 2:54am

you may use the commands below to monitor CPU/GPU frequency.
CPU freq: $ watch -n 0.1 cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq
GPU freq: $ watch -n 0.1 cat /sys/kernel/debug/bpmp/debug/clk/nafll_gpu/pto_counter
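
If you want a log rather than a live `watch` view, the same two files can be polled from a script. This is a minimal Python sketch, assuming the two paths given above exist on your image and that each file contains a single integer; reading the debugfs path typically requires root. `read_value`, `monitor` and `summarize` are hypothetical helper names, not part of any NVIDIA tool.

```python
import time

# Paths taken from the commands above.
CPU_FREQ_PATH = "/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq"
GPU_FREQ_PATH = "/sys/kernel/debug/bpmp/debug/clk/nafll_gpu/pto_counter"


def read_value(path):
    """Read one integer from a sysfs/debugfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def monitor(paths, interval=0.1, duration=5.0, reader=read_value):
    """Poll every path and return a list of (monotonic_time, {name: value})."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        values = {name: reader(path) for name, path in paths.items()}
        samples.append((time.monotonic(), values))
        time.sleep(interval)
    return samples


def summarize(samples):
    """Return {name: (min, max)} over all samples that could be read."""
    stats = {}
    for _, values in samples:
        for name, value in values.items():
            if value is None:
                continue
            lo, hi = stats.get(name, (value, value))
            stats[name] = (min(lo, value), max(hi, value))
    return stats


if __name__ == "__main__":
    samples = monitor({"cpu": CPU_FREQ_PATH, "gpu": GPU_FREQ_PATH})
    for name, (lo, hi) in summarize(samples).items():
        print(f"{name}: min={lo} max={hi}")
```

Run it while the inference workload is active; a minimum well below the maximum over the run would be a hint that the clocks are being reduced.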

Pinout21 · June 21, 2023, 12:05pm

Thanks.
Here is what I understand:
MAXN mode is some kind of overclocking and should not be used in production. The caveat is that the other modes lose performance.

