Can I find the power-off reason from the bootloader log (PMIC register info)?

Our system shuts down intermittently.
I couldn’t find any useful clues in the UART log (kernel log) or syslog.

So, can I find the power-off reason from the bootloader log (PMIC register info)?

......
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.577806] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.601122] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.612382] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.622339] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.862322] extcon-gpio-states external-connection:extcon@1: Cable state 2
Apr 23 10:53:09 tegra-ubuntu kernel: [ 208.206280] extcon-gpio-states external-connection:extcon@1: Cable state 2
==> the system suddenly powered off

Hi truemonpark,

Does this always happen with the same error message? (xhci_hcd…)

This thread lists some nodes you can check to find out why the PMIC shut down.
https://devtalk.nvidia.com/default/topic/1042139/jetson-tx2/jetson-tx2-reset-powerdown-issue/

WayneWWW,
Thanks for your reply.

1. The xhci_hcd warning is always printed while the USB 3.0 camera is running.
So I don’t think it is the root cause of the sudden power-off.
Still, I would like to get rid of this warning.
Do you have any idea how to resolve it?

2. Could you share the full register description for the power-off reason, for future debugging?
I could only find the following two values in the thread you shared.

0x10 MBLSD
Shutdown due to main battery low.

0x50 NIL_OR_MORE_THAN_1_BIT MBLSD, MBU and MBO
Shutdown due to main battery low / shutdown due to battery overvoltage lockout and undervoltage lockout.

3. I can see the reset reason in the bootloader log.
Is this reset reason the same thing as the power-off reason?

[0000.291] I> Welcome to MB2(TBoot-BPMP)(version: 01.00.160913-t186-M-00.00-mobile-175b7c7b)

[0000.316] I> Boot-device: eMMC
[0000.319] I> sdmmc bdev is already initialized
[0000.324] I> pmic: reset reason (nverc) : 0x50
[0000.328] I> Reading GPT from 512 for device 00000003
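
For future debugging, the nverc value can be pulled out of a saved bootloader log and split into its individual bits. The following is only a minimal sketch, not an NVIDIA tool: the function names `extract_nverc` and `set_bits` are made up for illustration, and the only bit name filled in is MBLSD for 0x10, taken from the linked thread. The names of the remaining bits are not given there and would have to come from the PMIC documentation.

```python
import re

# Matches the line printed by the bootloader, e.g.
# "[0000.324] I> pmic: reset reason (nverc) : 0x50"
NVERC_RE = re.compile(r"pmic: reset reason \(nverc\)\s*:\s*(0x[0-9a-fA-F]+)")

# Only 0x10 -> MBLSD is known from the linked thread. Other bit names are
# deliberately left out and must be filled in from the PMIC documentation.
KNOWN_BITS = {0x10: "MBLSD"}


def extract_nverc(log_text):
    """Return the nverc value found in a bootloader log as an int, or None."""
    match = NVERC_RE.search(log_text)
    return int(match.group(1), 16) if match else None


def set_bits(value):
    """List every set bit of an 8-bit value as (mask, name), lowest bit first."""
    result = []
    for bit in range(8):
        mask = 1 << bit
        if value & mask:
            result.append((mask, KNOWN_BITS.get(mask, "unknown")))
    return result


if __name__ == "__main__":
    log = "[0000.324] I> pmic: reset reason (nverc) : 0x50"
    value = extract_nverc(log)
    for mask, name in set_bits(value):
        print(f"0x{mask:02x} {name}")
```

For the 0x50 in the log above, this reports two set bits, 0x10 and 0x40, which fits the NIL_OR_MORE_THAN_1_BIT description: more than one reason bit is set.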

We seldom debug PMIC errors directly. May I ask a few questions here?

  1. Are you using the devkit? Which release are you using?
  2. Do you have any way to reproduce this issue? Do you hit the error if you just leave the device idle?

WayneWWW,

I found that this sudden power-off only happens on one specific carrier (I/O) board.
After swapping the carrier board while keeping the same TX2 module, the sudden power-off no longer reproduces.

To find out why the sudden power-off happens on that specific carrier board,
I think it would help to know the power on/off reason of the system.
(My guess is that the issue is related to power or heat.)
So, could you share the full description of the PMIC register that stores the power on/off reason?

Here are the answers to your questions.

  1. Are you using the devkit?
    ==> No, we are using a TX2 module + custom carrier (I/O) board.
    Which release are you using?
    ==> L4T 28.2.1 (Jetpack 3.2)

  2. Do you have any way to reproduce this issue?
    ==> Yes.
    ==> After booting, run an application that puts a heavy load on the system (see the tegrastats output below).
    ==> About 15 to 30 minutes later, the system suddenly shuts down.

    ./tegrastats
    RAM 2485/7854MB (lfb 1153x4MB) CPU [60%@2419,5%@2419,35%@2419,55%@2419,53%@2419,60%@2419] BCPU@56.5C MCPU@56.5C GPU@59C PLL@56.5C Tboard@46C Tdiode@56.5C PMIC@100C thermal@57.3C VDD_IN 17310/17258 VDD_CPU 3647/3848 VDD_GPU 7288/7074 VDD_SOC 1300/1269 VDD_WIFI 0/15 VDD_DDR 2727/2698

RAM 2486/7854MB (lfb 1153x4MB) CPU [64%@2419,7%@2419,30%@2419,55%@2419,55%@2419,53%@2419] BCPU@56.5C MCPU@56.5C GPU@61.5C PLL@56.5C Tboard@46C Tdiode@57.25C PMIC@100C thermal@57.5C VDD_IN 17000/17256 VDD_CPU 3484/3846 VDD_GPU 7137/7074 VDD_SOC 1260/1269 VDD_WIFI 0/15 VDD_DDR 2689/2698

Do you hit the error if you just leave the device idle?
==> No, the issue does not reproduce in the idle state.
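
Since the suspicion is power or heat, it may help to log the temperatures and rail readings from tegrastats continuously, so that the last values before a shutdown are on record. The following is a rough sketch that parses one tegrastats line in the format quoted above. `parse_tegrastats` is a hypothetical helper, and the two numbers after each VDD_ rail are assumed to be the instantaneous and average readings.

```python
import re

# Temperature fields look like "GPU@59C" or "Tdiode@56.5C".
# CPU load fields such as "60%@2419" do not match because of the "%".
TEMP_RE = re.compile(r"(\w+)@(\d+(?:\.\d+)?)C\b")

# Rail fields look like "VDD_IN 17310/17258" (assumed: current/average).
RAIL_RE = re.compile(r"(VDD_\w+) (\d+)/(\d+)")


def parse_tegrastats(line):
    """Return (temps, rails) dictionaries parsed from one tegrastats line."""
    temps = {name: float(value) for name, value in TEMP_RE.findall(line)}
    rails = {
        name: (int(current), int(average))
        for name, current, average in RAIL_RE.findall(line)
    }
    return temps, rails


if __name__ == "__main__":
    sample = (
        "RAM 2485/7854MB (lfb 1153x4MB) "
        "CPU [60%@2419,5%@2419,35%@2419,55%@2419,53%@2419,60%@2419] "
        "BCPU@56.5C MCPU@56.5C GPU@59C PLL@56.5C Tboard@46C Tdiode@56.5C "
        "PMIC@100C thermal@57.3C VDD_IN 17310/17258 VDD_CPU 3647/3848 "
        "VDD_GPU 7288/7074 VDD_SOC 1300/1269 VDD_WIFI 0/15 VDD_DDR 2727/2698"
    )
    temps, rails = parse_tegrastats(sample)
    hottest = max(temps, key=temps.get)
    print("hottest sensor:", hottest, temps[hottest])
    print("VDD_IN current/average:", rails["VDD_IN"])
```

Writing these values to a file with a timestamp on both carrier boards would make it possible to compare the readings just before the shutdown with those on the board that keeps running.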