Xavier NX: soctherm: OC ALARM 0x00000002

Hi, we get this OC ALARM message once a second if we reduce the supply voltage to the Xavier NX below 4.5V. A customer of ours gets similar messages with different numbers:

  • soctherm: OC ALARM 0x00000001 and
  • soctherm: OC ALARM 0x00000004

We do not know what causes those two OC ALARMs. Is there a list or description of what the number after “OC ALARM” means?

Last year there was a topic called “Xavier NX: soctherm: OC ALARM 0x00000001”. There, the “1” message was caused by a power supply that could not deliver enough power. This is a little strange, as we get a “2” message when the supply voltage drops below 4.5V.

What is the peak current draw of the Xavier NX when running in 20W 6core mode (maximum power mode)?

Have you tried changing the OC limit to 5A to see whether that improves things?

echo 5000 > /sys/devices/c250000.i2c/i2c-7/7-0040/iio:device0/crit_current_limit_0
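
Before changing the limit, it may help to record the old value so it can be restored later. Below is a minimal sketch, not an NVIDIA-provided tool. It assumes the sysfs attribute can be read as well as written (this thread does not confirm that) and that the write is done as root. `set_oc_limit` is a hypothetical helper name. The unit is presumably milliamps, going by 5000 for 5A above.

```shell
# Hypothetical helper: show the old limit, write the new one, show the result.
# $1 = path to the limit attribute, $2 = new limit value.
set_oc_limit() {
    limit_file=$1
    new_limit=$2
    echo "old limit: $(cat "$limit_file")"
    # Fail early if the attribute is not writable (for example, not root).
    echo "$new_limit" > "$limit_file" || return 1
    echo "new limit: $(cat "$limit_file")"
}

# On the device, as root, with the path from the command above:
# set_oc_limit /sys/devices/c250000.i2c/i2c-7/7-0040/iio:device0/crit_current_limit_0 5000
```

If the second value printed does not match what was written, the driver may have rejected or clamped the value.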

Also, there is a power estimator for the NX. Please use it to model your use case.

https://jetson-tools.nvidia.com/powerestimator/

Thank you for this info. But we would really like to find out what the hex number in the error messages means.

We have checked the soctherm.c file on GitHub. There, an over-current interrupt status register is read, and its 4 least significant bits are the relevant ones that get reported. But the meaning of these 4 bits is not explained. Can you please provide this info?

/**
 * soctherm_edp_isr_thread() - log an over-current interrupt request
 * @irq: OC irq number. Currently not being used. See description
 * @arg: a void pointer for callback, currently not being used
 *
 * Over-current events are handled in hardware. This function is called to log
 * and handle any OC events that happened. Additionally, it checks every
 * over-current interrupt registers for registers are set but
 * was not expected (i.e. any discrepancy in interrupt status) by the function,
 * the discrepancy will logged.
 *
 * Return: %IRQ_HANDLED
 */
static irqreturn_t soctherm_edp_isr_thread(int irq, void *arg)
{
    struct tegra_soctherm *ts = arg;
    u32 st, ex, oc1, oc2, oc3, oc4;

    st = readl(ts->regs + OC_INTR_STATUS);

    /* deliberately clear expected interrupts handled in SW */
    oc1 = st & OC_INTR_OC1_MASK;
    oc2 = st & OC_INTR_OC2_MASK;
    oc3 = st & OC_INTR_OC3_MASK;
    oc4 = st & OC_INTR_OC4_MASK;
    ex = oc1 | oc2 | oc3 | oc4;

    pr_err("soctherm: OC ALARM 0x%08x\n", ex);
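
Since the logged value is just the OR of the four masked bits, it can at least be split into the individual OCn flags when reading the log. Below is a minimal sketch, assuming that OC_INTR_OC1_MASK through OC_INTR_OC4_MASK correspond to bits 0 through 3. The excerpt above only shows that four masks are combined, not which bit is which, so please check the mask definitions in your kernel tree. `decode_oc_alarm` is a hypothetical helper, and it says nothing about what each OCn source physically monitors.

```shell
# Hypothetical helper: list which OCn flags are set in a logged OC ALARM value.
# Assumption: OC1..OC4 map to bits 0..3 of the value.
decode_oc_alarm() {
    value=$(( $1 ))          # accepts hex such as 0x00000002
    names=""
    n=1
    while [ "$n" -le 4 ]; do
        # Test bit (n - 1) and append "OCn" if it is set.
        if [ $(( (value >> (n - 1)) & 1 )) -eq 1 ]; then
            names="${names:+$names }OC$n"
        fi
        n=$(( n + 1 ))
    done
    echo "${names:-none}"
}

decode_oc_alarm 0x00000001   # prints: OC1
decode_oc_alarm 0x00000002   # prints: OC2
decode_oc_alarm 0x00000004   # prints: OC3
```

Under that assumption, the three values seen in this thread would each be a single, different OCn alarm, and a value such as 0x00000005 would mean that two alarms were pending at the same time.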

Hi,

Those are handled by our BPMP firmware, and the details are not public.

For the NX OC alarm issue, please refer to the discussion here.
