Very Frequent OC3 and OC1 Alarms Being Thrown When Current Limit Isn't Reached

We are using L4T 35.2.1 on an Orin NX 16GB with the MAXN power mode.

We have implemented a test that tracks the current consumption on the VDD_IN line of our Orin NX on our custom carrier board.

Our data shows a peak current of 5.68A on the VDD_IN line to the Orin NX; however, the module reported many increments of oc1_event_cnt and oc3_event_cnt.

The table below shows the reported OC event counters on the Orin NX:

2024-01-18 11:27:30 /sys/class/hwmon/hwmon1/oc2_throt_en:1
2024-01-18 11:27:30 /sys/class/hwmon/hwmon1/oc3_event_cnt:14426537
2024-01-18 11:27:30 /sys/class/hwmon/hwmon1/oc3_throt_en:1
2024-01-18 11:27:31 /sys/class/hwmon/hwmon1/oc1_event_cnt:32446689
2024-01-18 11:27:31 /sys/class/hwmon/hwmon1/oc1_throt_en:1
2024-01-18 11:27:31 /sys/class/hwmon/hwmon1/oc2_event_cnt:0
2024-01-18 11:27:31 /sys/class/hwmon/hwmon1/oc2_throt_en:1
2024-01-18 11:27:31 /sys/class/hwmon/hwmon1/oc3_event_cnt:14426621
2024-01-18 11:27:31 /sys/class/hwmon/hwmon1/oc3_throt_en:1
2024-01-18 11:27:32 /sys/class/hwmon/hwmon1/oc1_event_cnt:32446906
2024-01-18 11:27:32 /sys/class/hwmon/hwmon1/oc1_throt_en:1
2024-01-18 11:27:32 /sys/class/hwmon/hwmon1/oc2_event_cnt:0
2024-01-18 11:27:32 /sys/class/hwmon/hwmon1/oc2_throt_en:1
2024-01-18 11:27:32 /sys/class/hwmon/hwmon1/oc3_event_cnt:14426702
2024-01-18 11:27:32 /sys/class/hwmon/hwmon1/oc3_throt_en:1
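
For reference, a counter dump like the one above can be produced with a simple polling loop along these lines (hwmon1 is the index we see on our unit; it can differ between boots or L4T releases):

# Poll the OC event counters and throttle-enable flags once per second.
# The hwmon1 index is an assumption based on our unit; adjust as needed.
while true; do
    for f in /sys/class/hwmon/hwmon1/oc?_event_cnt /sys/class/hwmon/hwmon1/oc?_throt_en; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') ${f}:$(cat ${f})"
    done
    sleep 1
done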

Below are the measurements taken from the power supply:

Date VDD_IN (V) Current (A) Power (W)
2024-01-18 11:27:30.129127 4.98 3.87 19.2726
2024-01-18 11:27:30.176468 4.98 3.86 19.2228
2024-01-18 11:27:30.224727 4.98 3.86 19.2228
2024-01-18 11:27:30.273574 4.98 3.87 19.2726
2024-01-18 11:27:30.321503 4.98 3.89 19.3722
2024-01-18 11:27:30.369119 4.98 3.91 19.4718
2024-01-18 11:27:30.416665 4.98 3.94 19.6212
2024-01-18 11:27:30.464791 5 3.97 19.85
2024-01-18 11:27:30.512831 5 4 20
2024-01-18 11:27:30.560402 5 4.03 20.15
2024-01-18 11:27:30.608907 4.98 4.08 20.3184
2024-01-18 11:27:30.656748 4.98 4.13 20.5674
2024-01-18 11:27:30.705740 4.98 4.16 20.7168
2024-01-18 11:27:30.754140 4.98 4.21 20.9658
2024-01-18 11:27:30.801967 4.98 4.23 21.0654
2024-01-18 11:27:30.849750 4.98 4.24 21.1152
2024-01-18 11:27:30.897054 4.98 4.26 21.2148
2024-01-18 11:27:30.944789 4.98 4.26 21.2148
2024-01-18 11:27:30.993117 4.98 4.24 21.1152
2024-01-18 11:27:31.040657 4.98 4.21 20.9658
2024-01-18 11:27:31.089352 4.97 4.21 20.9237
2024-01-18 11:27:31.137044 4.97 4.23 21.0231
2024-01-18 11:27:31.184739 4.97 4.23 21.0231
2024-01-18 11:27:31.232992 4.97 4.23 21.0231
2024-01-18 11:27:31.280726 4.97 4.23 21.0231
2024-01-18 11:27:31.329116 4.97 4.21 20.9237
2024-01-18 11:27:31.376754 4.98 4.23 21.0654
2024-01-18 11:27:31.424872 4.98 4.26 21.2148
2024-01-18 11:27:31.473319 4.98 4.24 21.1152
2024-01-18 11:27:31.521088 4.98 4.23 21.0654
2024-01-18 11:27:31.568950 4.98 4.21 20.9658
2024-01-18 11:27:31.616798 4.98 4.19 20.8662
2024-01-18 11:27:31.664974 4.98 4.16 20.7168
2024-01-18 11:27:31.712957 4.98 4.15 20.667
2024-01-18 11:27:31.760752 5 4.13 20.65
2024-01-18 11:27:31.809079 5 4.1 20.5
2024-01-18 11:27:31.856953 4.98 4.05 20.169
2024-01-18 11:27:31.905601 4.98 4.03 20.0694
2024-01-18 11:27:31.953788 4.98 4.05 20.169
2024-01-18 11:27:32.001399 4.98 4.07 20.2686
2024-01-18 11:27:32.049635 4.98 4.1 20.418
2024-01-18 11:27:32.097263 4.98 4.16 20.7168
2024-01-18 11:27:32.144772 4.98 4.21 20.9658
2024-01-18 11:27:32.192985 4.98 4.26 21.2148
2024-01-18 11:27:32.240565 4.98 4.29 21.3642
2024-01-18 11:27:32.288982 4.98 4.35 21.663
2024-01-18 11:27:32.336795 4.98 4.42 22.0116
2024-01-18 11:27:32.384527 4.98 4.43 22.0614
2024-01-18 11:27:32.432892 4.98 4.43 22.0614
2024-01-18 11:27:32.480964 4.98 4.43 22.0614
2024-01-18 11:27:32.528731 4.98 4.42 22.0116
2024-01-18 11:27:32.576676 4.98 4.43 22.0614
2024-01-18 11:27:32.624702 4.98 4.43 22.0614
2024-01-18 11:27:32.673352 4.98 4.4 21.912
2024-01-18 11:27:32.720865 4.98 4.37 21.7626
2024-01-18 11:27:32.768821 4.98 4.35 21.663
2024-01-18 11:27:32.816846 4.98 4.31 21.4638
2024-01-18 11:27:32.864561 4.98 4.27 21.2646
2024-01-18 11:27:32.912675 4.98 4.24 21.1152
2024-01-18 11:27:32.960527 4.98 4.21 20.9658

As you can see from the data above, both the oc1 and oc3 counters are increasing dramatically during this time period.

In the NVIDIA developer documentation, the documented behavior of the OC alarms is as follows:

Module: Jetson Orin NX 16GB    Module TDP Budget: 25W

Limits                               SOCTHERM_OC PIN    Throttling Level
VDD_IN Average Power: 25W            OC2                CPU: 50%,   GPU: 50%
VDD_IN Instantaneous Power: 30W      OC3                CPU: 87.5%, GPU: 87.5%
Under Voltage: approx. 4.5V          OC1                CPU: 87.5%, GPU: 87.5%

My questions are as follows:

  • Why is the behavior we’re seeing on this Orin NX module not consistent with the table above?

  • Is our module being throttled? How can we verify this?

  • And finally, how can we reduce the frequency of these OC events?

Thanks!

Hi lphillips,

Are you running any application or test that puts a heavy load on the board?

Please refer to the following table for the thresholds of the OC events on the Orin NX 16GB:

Orin NX 16GB
OC1 (under voltage)                4.5V
OC2 (VDD_IN average power)         25W
OC3 (VDD_IN instantaneous power)   30W

It seems you hit OC1 because the voltage dropped below 4.5V, and OC3 because of instantaneous high power (these are instantaneous events, so you may not be able to detect them with your measurements).

Yes, it seems your module is being throttled, because the OC event counts are increasing.

OC throttling is a mechanism to protect the board from damage due to unexpected power usage.
We would suggest not using MAXN mode, since it is an unconstrained power mode. You could check the Power GUI log and adjust the nvpmodel configuration to create a custom power mode for your use case.
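
As a quick check, the standard L4T tools can show both the active power mode and whether the clocks are currently being capped (the mode ID used for 25W below is an assumption; the IDs are listed in /etc/nvpmodel.conf):

# Show current and maximum clock frequencies; sustained throttling shows up
# as current frequencies sitting well below the configured maximums.
sudo jetson_clocks --show

# Query the active power mode, and optionally switch to a predefined mode
# such as 25W for comparison (ID 3 is an assumption; verify it locally).
sudo nvpmodel -q
sudo nvpmodel -m 3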

@KevinFFF thank you for the reply.

We are putting a high load on both the CPU and GPU using a synthetic CPU stress test and CUDA applications.

We take 20 samples per second of the voltage and current consumption, but we never see the voltage drop below 4.9V, and the current always stays below 6A.

We are trying to verify that this is not a bug in the voltage/current monitoring on the Orin NX. Are we not taking enough samples of the voltage and current to detect when these over-current and under-voltage events are occurring?

We have also tried raising the instantaneous current limit to 7A and even 10A, with no change in the frequency of oc3 and oc1 events.

According to the logs in my prior post, it seems that the oc3 event is happening nearly 100 times per second. Is it possible that the duration of these events is so short that we are not detecting them with our power supply measurements?
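
For reference, that per-second estimate comes from simply differencing the counter over one second (hwmon1 index assumed, as above):

# Estimate the oc3 event rate from two counter reads one second apart.
a=$(cat /sys/class/hwmon/hwmon1/oc3_event_cnt)
sleep 1
b=$(cat /sys/class/hwmon/hwmon1/oc3_event_cnt)
echo "oc3 events per second: $((b - a))"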

Yes, they may just be instantaneous events.

How do you increase the instantaneous current limit?

Could you verify with another power mode, such as 25W, in your case?

We run the following commands as the root user to change the instantaneous current limit. Can you confirm this is correct?

To change the limit to 7A, the following command is run:

echo 7000 > /sys/bus/i2c/drivers/ina3221/1-0040/hwmon/hwmon3/curr1_crit

To change the limit to 10A, the following command is run:

echo 10000 > /sys/bus/i2c/drivers/ina3221/1-0040/hwmon/hwmon3/curr1_crit
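
As a sanity check, we also read the channel label and the limit back after writing it (the 1-0040 address and hwmon3 index are what we see on our unit and may differ):

# Confirm that channel 1 of this INA3221 is the VDD_IN rail, then read the
# critical current limit back to verify the write took effect.
cat /sys/bus/i2c/drivers/ina3221/1-0040/hwmon/hwmon3/in1_label
cat /sys/bus/i2c/drivers/ina3221/1-0040/hwmon/hwmon3/curr1_crit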

We will try other power modes and post the frequency of the OC events. For now, we have identified a large performance decrease in our applications when using the 25W power mode. It is likely that we will need to create a custom power mode that fits our use case. I am trying to understand these OC events and how to avoid them before we dive into that customization.

Okay, that is the correct way to set the instantaneous current limit for channel 1 (VDD_IN).

Yes, you would need to create a custom power mode for your use case.
Please check the Power GUI log and use the Power Estimator to create a custom power mode configuration.
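
A typical workflow for a custom mode is to back up the board-specific configuration file, add a new power mode block based on an existing one, and then select it by ID (the paths and the new mode ID below are assumptions; check what /etc/nvpmodel.conf links to on your unit):

# Back up the active nvpmodel configuration before editing it.
ls -l /etc/nvpmodel.conf
sudo cp -L /etc/nvpmodel.conf /etc/nvpmodel.conf.bak

# Edit the conf file to add a new < POWER_MODEL ID=... NAME=... > block by
# copying an existing mode and lowering its CPU/GPU/EMC caps, then select it.
sudo nvpmodel -m 4    # assuming the new block was given ID=4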

@KevinFFF are you able to clarify why we still see OC3 and OC1 alarms even after we set the instantaneous current limit to 7A or 10A?

It depends on your power usage, and we strongly recommend against using sysfs to increase curr*_crit.
