Jetson OC Alarm Values

Hello,

We are debugging a potential power issue on a custom carrier board with a Xavier NX SOM in power mode 20W 6Core, running JetPack 5.1.1. Under high CPU and GPU load, it produces three different OC Alarm values: 0x00000001, 0x00000004, and 0x00000005.

I have found references to OC1 through OC4 in the kernel source for these messages, so it looks like we are getting OC1 and OC4 (0x1 | 0x4 = 0x5).

What are OC1 and OC4? Is this a power envelope issue, or an issue with our power supply? Or a known bug with Jetpack 5.1.1/Xavier?

I’ve found plenty of other posts about these alarms. In one post, it’s shared that OC1 might mean VDD drops below 4.5V, but this seems to contradict our jtop output:


Is OC1 simply a reflection of exceeding the VDD_IN current warning threshold? That would be consistent with what we see in /etc/nvpmodel.conf:

[Image: Screenshot 2023-12-15 at 16.38.41]

It’s beyond frustrating that developers do not have any documentation for these statuses.

What specifically do the OC1 and OC4 codes mean? Which measurements, on which rails, against which thresholds?

Relevant posts:

HW throttling (OC alarm) would happen when input voltage drops below safe operating voltage ~4.5V. Please make sure you have a stable input 5V@4A power supply.
soctherm: OC ALARM 0x00000001 - #8 by WayneWWW

The OC alarm on jetson nano, in most cases, indicate you have under voltage issue. Please try with other power adapters.
Jetson Nano - soctherm OC Alarm

Those are handled by our BPMP firmware and it is not public
Xavier NX: soctherm: OC ALARM 0x00000002

The OC throttling is due to the hardware limitation of the NX module. We currently set the OC limit to 3.6A to protect the NX hardware.
The error message "System throttling due to over-current" appears when running YOLOv4 - #27 by WayneWWW

Thanks in advance

We actually only have 3 kinds of OC event.

https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/PlatformPowerAndPerformance/JetsonOrinNanoSeriesJetsonOrinNxSeriesAndJetsonAgxOrinSeries.html#overcurrent-event-status

And the code in use is this one.

kernel/nvidia/drivers/thermal/tegra19x_oc_event.c

Xavier NX:
OC1: instant power
OC2: under voltage
OC3: average power

0x00000001 is OC1 (0b0001)
0x00000002 is OC2 (0b0010)
0x00000004 is OC3 (0b0100)
0x00000005 is OC1 + OC3 (0b0101)

You hit OC1 and OC3 at the same time.
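
For anyone decoding these values by hand, the bitmask reading above can be sketched in Python. This is only an illustration, not anything taken from the kernel source: the function names are hypothetical, the bit-to-event mapping is copied from the Xavier NX list above, and the log line format is assumed from the message quoted in the related post titles ("soctherm: OC ALARM 0x...").

```python
import re

# Bit-to-event mapping for Xavier NX, copied from the list above.
OC_EVENTS = {
    0x1: "OC1 (instant power)",
    0x2: "OC2 (under voltage)",
    0x4: "OC3 (average power)",
}

# Assumed message format, e.g. "soctherm: OC ALARM 0x00000005".
ALARM_RE = re.compile(r"OC ALARM (0x[0-9a-fA-F]+)")


def decode_oc_alarm(value):
    """Return the names of the OC events whose bits are set in value."""
    return [name for bit, name in sorted(OC_EVENTS.items()) if value & bit]


def decode_log_line(line):
    """Extract the alarm value from a log line and decode it, or return None."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    return decode_oc_alarm(int(match.group(1), 16))


if __name__ == "__main__":
    # The three values reported in this thread.
    for value in (0x1, 0x4, 0x5):
        print(f"0x{value:08x}: {', '.join(decode_oc_alarm(value))}")
```

Piping kernel log lines through decode_log_line would show which events fire together. Bits outside the three listed events are simply ignored by this sketch.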

Thank you for your reply. I will take these OC Alarm messages to indicate that we only have a power envelope problem to be resolved with a power mode in /etc/nvpmodel.conf and the power estimator (NVIDIA), and NOT an issue with our power supply.

Thanks!
