Xavier NX Over-current JP 5.1.1

We have a heavy TensorRT workload running on a Xavier NX with JetPack 5.1.1 and consistently run into the "System throttled due to Over-current" warning. We are running power mode 8 (20W 6CORE) with jetson_clocks enabled.

I see many topics about this issue on JP4 that were resolved by raising the limit from 3600 mA to 5000 mA. JP 5.1.1 appears to default to 5000 mA, but while monitoring the current value in /sys/class/hwmon/hwmon5/curr1_input I see evidence that the OC event still triggers at around 3600 mA: we consistently get the OC warning just under 3600 mA, not at 5000 mA as expected.

The related topic "System throttled due to over-current?" (post #58 by JerryChang) references the node /sys/devices/c250000.i2c/i2c-7/7-0040/iio\:device0/crit_current_limit_0, which does not exist on JP 5.1.1.

I have also tried changing the WARN parameter in /etc/nvpmodel.conf:
VDDIN_OC_LIMIT WARN 4900
VDDIN_OC_LIMIT CRIT 5000
but the limit still appears to be around 3600 mA.

What is the correct way to confirm and set the OC limit on JP 5.1.1?

Or is this simply a hardware limitation: in this power mode, with 5 V in and a 20 W maximum, are we limited to 4 A?

I have read and understand that pushing beyond 5000 mA is not advisable. My interest is in making sure that OC throttling isn't being triggered prematurely.

It is 5000 mA by default. You should read curr1_crit, not curr1_input (curr1_input is the live measurement, not the limit).

I see that curr1_crit is reporting the default at 5000.
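A minimal sketch of reading the limits alongside the live value, assuming the VDD_IN monitor is exposed by the mainline-style ina3221 hwmon driver, where curr1_crit is the critical limit and curr1_max the warning limit, both in mA. Because the hwmonN index (hwmon5 above) can change between boots and releases, it looks the device up by its name file instead. The function names are hypothetical, and curr1_max may not exist on every unit, in which case it is reported as None.

```python
from pathlib import Path
from typing import Dict, Optional


def find_hwmon(root: Path = Path("/sys/class/hwmon"),
               chip: str = "ina3221") -> Optional[Path]:
    """Return the first hwmonN directory whose 'name' file equals chip.

    Assumption: the VDD_IN monitor registers under the name "ina3221".
    """
    for d in sorted(root.glob("hwmon*")):
        name_file = d / "name"
        if name_file.is_file() and name_file.read_text().strip() == chip:
            return d
    return None


def read_limits(hwmon: Path, ch: int = 1) -> Dict[str, Optional[int]]:
    """Read live current plus warning (max) and critical (crit) limits, in mA.

    curr<ch>_input is a measurement, not a limit. Attributes that are not
    present on this kernel/driver are returned as None.
    """
    result: Dict[str, Optional[int]] = {}
    for key, suffix in (("input_mA", "input"), ("warn_mA", "max"), ("crit_mA", "crit")):
        p = hwmon / f"curr{ch}_{suffix}"
        result[key] = int(p.read_text().strip()) if p.is_file() else None
    return result


if __name__ == "__main__":
    dev = find_hwmon()
    if dev is None:
        print("no ina3221 hwmon device found")
    else:
        print(dev, read_limits(dev))
```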

But then why do I get hundreds of OC events (OC_1/event_cnt) while my VDD_IN current wavers between 3200 and 3600 mA, as in this session screen capture:

I'm having exactly the same issue on JP 5.1.1, with OC alarms whenever the VDD_IN current reading exceeds 3600 mA. It seems like the settings in nvpmodel.conf are somehow not being respected, or something else is triggering the OC alarms.

UPDATE: my syslog is full of these lines every 10 seconds, even when the system is not under load and the OC alarms are not firing:

Jun  5 12:10:16 ubuntu nvpmodel_indicator.desktop[6970]: NVPOWER WARN (read_str:912): Failed to open empty file! (Bad address)
Jun  5 12:10:16 ubuntu nvpmodel_indicator.desktop[6970]: message repeated 58 times: [ NVPOWER WARN (read_str:912): Failed to open empty file! (Bad address)]

I have been doing more testing and found that with a synthetic load (stress on the CPU and a CUDA sample running the GPU at 100%), I can draw more than 3600 mA and still get no OC alarms.

The difference from our main workload is that the VIC is not running in this scenario. Is there something else that alerts on VIC power draw?

@sam.henderson1 does your workload also use the VIC?

@yangls3pl Yes, our workload includes VIC activity spiking to around 70%, along with near-max-rate NVENC encoding. We are not seeing the NVPOWER WARN messages you mentioned above.

Hi,

First, what is the output of the command below?

grep "" /sys/class/hwmon/hwmon*/oc*

Second, there is a difference between average power and instantaneous current. tegrastats may not catch instantaneous current, since tools like it poll the monitor only on a timescale of seconds.
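To illustrate the gap, here is a sketch with purely hypothetical numbers (3000 mA baseline, 5500 mA spikes lasting 10 ms, one spike per frame at 15 fps). The 1 s averages stay below 3600 mA even though every frame briefly exceeds 5000 mA, which is the kind of peak a once-per-second average cannot show. None of these values are measurements from this thread, and the function names are made up for the example.

```python
from typing import List


def synthetic_current_trace(base_mA: int, spike_mA: int, spike_ms: int,
                            fps: float, seconds: int) -> List[int]:
    """Hypothetical VDD_IN current at 1 ms resolution: base_mA, plus a
    spike_mA plateau lasting spike_ms at the start of every frame."""
    period_ms = 1000.0 / fps
    return [spike_mA if (t % period_ms) < spike_ms else base_mA
            for t in range(seconds * 1000)]


def one_second_averages(trace_1ms: List[int]) -> List[float]:
    """Mean of each complete 1000-sample (1 s) window, roughly what a
    once-per-second averaging monitor would report."""
    return [sum(trace_1ms[i:i + 1000]) / 1000
            for i in range(0, len(trace_1ms) - 999, 1000)]


# Hypothetical values, not measurements from this thread.
trace = synthetic_current_trace(base_mA=3000, spike_mA=5500, spike_ms=10,
                                fps=15, seconds=5)
print("peak current (mA):", max(trace))
print("1 s averages (mA):", [round(a) for a in one_second_averages(trace)])
```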

The NVPOWER WARN was not relevant; it was just the Jetson Power GUI failing to read something.

In our case we've narrowed it down to power transients, after putting an oscilloscope on the VDD line. The GPU's instantaneous power draw when processing video frames seems to be very high, and it causes 10-20 ms dips in VDD_IN on our board, down to about 4.65 V. The dips occur at the same frequency as the frame rate we are running from our camera.

We were able to stop the OC alarms by lowering the GPU clock slightly, from 1.1 GHz to 1 GHz, in /etc/nvpmodel.conf, while leaving every other clock in the 20W 6CORE power mode unchanged.
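If you cap the GPU clock the same way, it is probably safest to use a value from the GPU's supported frequency table rather than an arbitrary round number. A small sketch of picking the highest supported frequency at or below a target; the devfreq path (17000000.gv11b) is an assumption about Xavier NX that should be checked on your own unit, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

# Assumption: GPU devfreq node on Xavier NX; verify this path on your unit.
GPU_AVAIL = Path("/sys/devices/17000000.gv11b/devfreq/17000000.gv11b/"
                 "available_frequencies")


def parse_freqs(text: str) -> List[int]:
    """Parse a whitespace-separated list of frequencies (Hz), sorted ascending."""
    return sorted(int(tok) for tok in text.split())


def highest_at_or_below(freqs: List[int], target_hz: int) -> int:
    """Return the largest supported frequency that does not exceed target_hz."""
    candidates = [f for f in freqs if f <= target_hz]
    if not candidates:
        raise ValueError(f"no available frequency at or below {target_hz} Hz")
    return max(candidates)


if __name__ == "__main__":
    if GPU_AVAIL.is_file():
        freqs = parse_freqs(GPU_AVAIL.read_text())
        print("GPU cap to use (Hz):", highest_at_or_below(freqs, 1_000_000_000))
    else:
        print("GPU devfreq node not found at", GPU_AVAIL)
```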

@sam.henderson1 If you're using a custom carrier board, you might want to check whether your VDD_IN has enough bulk capacitance. We think adding some to ours should fix the issue.
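For a rough sense of scale, the usual first-order estimate for the bulk capacitance needed to carry a current step on its own is C = I·Δt/ΔV. The sketch below applies it with hypothetical numbers (a 1.5 A step, 1 ms before the upstream supply responds, 0.25 V allowed droop, giving 6000 µF). It ignores ESR, ESL and regulator behaviour, so treat it only as a starting point; the function name is made up for the example.

```python
def bulk_capacitance_F(step_current_A: float, hold_time_s: float,
                       allowed_droop_V: float) -> float:
    """First-order estimate C = I * dt / dV: the capacitance that could supply
    step_current_A by itself for hold_time_s while the rail drops by at most
    allowed_droop_V. Ignores ESR, ESL and any regulator or upstream response."""
    if allowed_droop_V <= 0 or hold_time_s < 0 or step_current_A < 0:
        raise ValueError("expected non-negative current/time and positive droop")
    return step_current_A * hold_time_s / allowed_droop_V


# Hypothetical numbers: 1.5 A step, 1 ms until the supply catches up, 0.25 V droop.
c = bulk_capacitance_F(1.5, 0.001, 0.25)
print(f"~{c * 1e6:.0f} uF")
```

Carrying the full 10-20 ms dips measured above with capacitance alone would take roughly 10 to 20 times as much at the same current and droop, so the supply's own transient response matters as well.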

@WayneWWW

Results when idle, before any OC event:

/sys/class/hwmon/hwmon1/oc1_cpu_throttle_ctrl:0
/sys/class/hwmon/hwmon1/oc1_gpu_throttle_ctrl:1073741824
/sys/class/hwmon/hwmon1/oc1_irq_cnt:0
/sys/class/hwmon/hwmon1/oc1_priority:0
/sys/class/hwmon/hwmon1/oc2_cpu_throttle_ctrl:2147485455
/sys/class/hwmon/hwmon1/oc2_gpu_throttle_ctrl:3221684224
/sys/class/hwmon/hwmon1/oc2_irq_cnt:0
/sys/class/hwmon/hwmon1/oc2_priority:100
/sys/class/hwmon/hwmon1/oc3_cpu_throttle_ctrl:2147483919
/sys/class/hwmon/hwmon1/oc3_gpu_throttle_ctrl:3221291008
/sys/class/hwmon/hwmon1/oc3_irq_cnt:0
/sys/class/hwmon/hwmon1/oc3_priority:200
/sys/class/hwmon/hwmon1/oc4_cpu_throttle_ctrl:2147485455
/sys/class/hwmon/hwmon1/oc4_gpu_throttle_ctrl:3221684224
/sys/class/hwmon/hwmon1/oc4_irq_cnt:0
/sys/class/hwmon/hwmon1/oc4_priority:255
/sys/class/hwmon/hwmon1/oc5_cpu_throttle_ctrl:2147483663
/sys/class/hwmon/hwmon1/oc5_gpu_throttle_ctrl:3221291008
/sys/class/hwmon/hwmon1/oc5_irq_cnt:0
/sys/class/hwmon/hwmon1/oc5_priority:100
/sys/class/hwmon/hwmon1/oc6_cpu_throttle_ctrl:0
/sys/class/hwmon/hwmon1/oc6_gpu_throttle_ctrl:1073741824
/sys/class/hwmon/hwmon1/oc6_irq_cnt:0
/sys/class/hwmon/hwmon1/oc6_priority:0

After the first OC event:

/sys/class/hwmon/hwmon1/oc1_cpu_throttle_ctrl:0
/sys/class/hwmon/hwmon1/oc1_gpu_throttle_ctrl:1073741824
/sys/class/hwmon/hwmon1/oc1_irq_cnt:4
/sys/class/hwmon/hwmon1/oc1_priority:0
/sys/class/hwmon/hwmon1/oc2_cpu_throttle_ctrl:2147485455
/sys/class/hwmon/hwmon1/oc2_gpu_throttle_ctrl:3221684224
/sys/class/hwmon/hwmon1/oc2_irq_cnt:0
/sys/class/hwmon/hwmon1/oc2_priority:100
/sys/class/hwmon/hwmon1/oc3_cpu_throttle_ctrl:2147483919
/sys/class/hwmon/hwmon1/oc3_gpu_throttle_ctrl:3221291008
/sys/class/hwmon/hwmon1/oc3_irq_cnt:0
/sys/class/hwmon/hwmon1/oc3_priority:200
/sys/class/hwmon/hwmon1/oc4_cpu_throttle_ctrl:2147485455
/sys/class/hwmon/hwmon1/oc4_gpu_throttle_ctrl:3221684224
/sys/class/hwmon/hwmon1/oc4_irq_cnt:0
/sys/class/hwmon/hwmon1/oc4_priority:255
/sys/class/hwmon/hwmon1/oc5_cpu_throttle_ctrl:2147483663
/sys/class/hwmon/hwmon1/oc5_gpu_throttle_ctrl:3221291008
/sys/class/hwmon/hwmon1/oc5_irq_cnt:0
/sys/class/hwmon/hwmon1/oc5_priority:100
/sys/class/hwmon/hwmon1/oc6_cpu_throttle_ctrl:0
/sys/class/hwmon/hwmon1/oc6_gpu_throttle_ctrl:1073741824
/sys/class/hwmon/hwmon1/oc6_irq_cnt:0
/sys/class/hwmon/hwmon1/oc6_priority:0
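Comparing two such listings by eye is error-prone. A minimal sketch that parses two captures of the grep output above (one path:value pair per line, for example saved with grep "" /sys/class/hwmon/hwmon*/oc* > before.txt) and reports which *_irq_cnt counters changed; the function names are hypothetical. Applied to the two listings above, the only change is oc1_irq_cnt going from 0 to 4, while every throttle_ctrl and priority value is identical.

```python
import sys
from typing import Dict


def parse_grep_output(text: str) -> Dict[str, int]:
    """Turn `grep "" /sys/class/hwmon/hwmon*/oc*` output into {path: value}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        path, sep, value = line.rpartition(":")
        if not sep:
            continue  # not a path:value line
        try:
            values[path] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric attributes
    return values


def changed_counters(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Difference of every *_irq_cnt attribute whose value changed between captures."""
    return {
        path: after[path] - before.get(path, 0)
        for path in sorted(after)
        if path.endswith("_irq_cnt") and after[path] != before.get(path, 0)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python oc_diff.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        print(changed_counters(parse_grep_output(f_before.read()),
                               parse_grep_output(f_after.read())))
```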

@yangls3pl Great find! I reduced our GPU clock rate to 1 GHz and have had no OC alarms so far.

