Our system shuts down intermittently.
I couldn't find any useful clues in the UART log (kernel log) or syslog.
Can I find the power-off reason from the bootloader log (PMIC register info)?
......
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.577806] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.601122] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.612382] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.622339] xhci_hcd 0000:01:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
Apr 23 10:53:08 tegra-ubuntu kernel: [ 207.862322] extcon-gpio-states external-connection:extcon@1: Cable state 2
Apr 23 10:53:09 tegra-ubuntu kernel: [ 208.206280] extcon-gpio-states external-connection:extcon@1: Cable state 2
==> the system suddenly powered off
Hi truemonpark,
Does this always happen with the same error message (xhci_hcd…)?
This thread mentions some nodes you could check to find out why the PMIC shut down.
https://devtalk.nvidia.com/default/topic/1042139/jetson-tx2/jetson-tx2-reset-powerdown-issue/
WayneWWW,
Thanks for your reply.
1. The xhci_hcd warning is always printed while the USB 3.0 camera is running.
So I don't think this log is the root cause of the sudden power-off.
Still, I would like to get rid of this warning.
Do you have any idea how to resolve it?
2. Could you share the full register description of the power-off reasons for future debugging?
I could only find the following two values in the thread you shared.
0x10 MBLSD
Shutdown due to main battery low
0x50 NIL_OR_MORE_THAN_1_BIT MBLSD, MBU and MBO
Shutdown due to main battery low / battery overvoltage lockout and undervoltage lockout.
3. I can see the reset reason in the bootloader log.
Is this reset reason the same as the power-off reason?
[0000.291] I> Welcome to MB2(TBoot-BPMP)(version: 01.00.160913-t186-M-00.00-mobile-175b7c7b)
…
[0000.316] I> Boot-device: eMMC
[0000.319] I> sdmmc bdev is already initialized
[0000.324] I> pmic: reset reason (nverc) : 0x50
[0000.328] I> Reading GPT from 512 for device 00000003
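For anyone collecting these values across many boots, here is a minimal Python sketch that pulls the nverc value out of a bootloader log and looks it up. It is only an illustration: the lookup table contains just the two values quoted in this thread (everything else is reported as unknown), and the helper names `find_nverc`, `set_bits` and `describe` are made up for this example.

```python
import re

# Only the two values quoted in this thread are listed here.
# Any other value is reported as unknown instead of being guessed.
KNOWN_NVERC = {
    0x10: "shutdown due to main battery low",
    0x50: "more than one bit set: main battery low / battery "
          "overvoltage lockout and undervoltage lockout",
}

# Matches lines such as: "[0000.324] I> pmic: reset reason (nverc) : 0x50"
NVERC_RE = re.compile(r"pmic: reset reason \(nverc\)\s*:\s*(0x[0-9a-fA-F]+)")


def find_nverc(log_text):
    """Return every nverc value found in the log text, as integers."""
    return [int(m.group(1), 16) for m in NVERC_RE.finditer(log_text)]


def set_bits(value):
    """Return the positions of the bits that are set in an 8-bit value."""
    return [bit for bit in range(8) if value & (1 << bit)]


def describe(value):
    """Look the value up in the (deliberately incomplete) table above."""
    return KNOWN_NVERC.get(value, "unknown (not listed in this thread)")


SAMPLE_LOG = (
    "[0000.319] I> sdmmc bdev is already initialized\n"
    "[0000.324] I> pmic: reset reason (nverc) : 0x50\n"
    "[0000.328] I> Reading GPT from 512 for device 00000003\n"
)

if __name__ == "__main__":
    for value in find_nverc(SAMPLE_LOG):
        print(f"nverc=0x{value:02x} bits={set_bits(value)} -> {describe(value)}")
```

Saving the UART output of each boot to a file and passing its contents to `find_nverc` would show whether 0x50 appears after every sudden power-off or only some of them.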
We seldom really debug PMIC errors. May I ask a few questions here?
- Are you using the devkit? Which release are you using?
- Do you have any way to reproduce this issue? Can you hit the error if you just leave the device idle?
WayneWWW,
I found that this sudden power-off only happens with a specific carrier (I/O) board.
After swapping the carrier board while keeping the same TX2 module, the sudden power-off no longer reproduces.
To find the root cause of the sudden power-off on that specific carrier board,
I think it would be helpful to know the power-on/off reason of the system.
(I suspect this issue is related to power or heat.)
So, could you share the full description of the PMIC register that stores the power-on/off reason?
Here are the answers to your questions.
- Are you using devkit?
==> No, we are using a TX2 module with a custom carrier (I/O) board.
What release are you using?
==> L4T 28.2.1 (Jetpack 3.2)
- Do you have any way to reproduce this issue?
==> Yes.
==> After booting, run an application that puts a heavy load on the system (see the tegrastats output below).
==> About 15 to 30 minutes later, the system suddenly shuts down.
./tegrastats
RAM 2485/7854MB (lfb 1153x4MB) CPU [60%@2419,5%@2419,35%@2419,55%@2419,53%@2419,60%@2419] BCPU@56.5C MCPU@56.5C GPU@59C PLL@56.5C Tboard@46C Tdiode@56.5C PMIC@100C thermal@57.3C VDD_IN 17310/17258 VDD_CPU 3647/3848 VDD_GPU 7288/7074 VDD_SOC 1300/1269 VDD_WIFI 0/15 VDD_DDR 2727/2698
RAM 2486/7854MB (lfb 1153x4MB) CPU [64%@2419,7%@2419,30%@2419,55%@2419,55%@2419,53%@2419] BCPU@56.5C MCPU@56.5C GPU@61.5C PLL@56.5C Tboard@46C Tdiode@57.25C PMIC@100C thermal@57.5C VDD_IN 17000/17256 VDD_CPU 3484/3846 VDD_GPU 7137/7074 VDD_SOC 1260/1269 VDD_WIFI 0/15 VDD_DDR 2689/2698
Could you hit error if you just put device idle?
==> No, the issue does not reproduce in the idle state.
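Since the suspicion is power or heat, it may help to log the tegrastats readings up to the moment of shutdown and compare the two carrier boards. Below is a rough Python sketch that parses one tegrastats line of the format shown above. The function name `parse_tegrastats` is made up for this example, and reading each `VDD_*` pair as current/average is an assumption, not something confirmed in this thread.

```python
import re

# Temperature fields look like "GPU@59C" or "Tdiode@56.5C".
# CPU load fields such as "60%@2419" do not match, because "%" is not a
# word character and the value is not followed by "C".
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C")

# Power rail fields look like "VDD_IN 17310/17258".
# Assumption: the two numbers are the current and the average reading.
POWER_RE = re.compile(r"(VDD_\w+) (\d+)/(\d+)")

# First tegrastats line from the post above.
SAMPLE = (
    "RAM 2485/7854MB (lfb 1153x4MB) CPU [60%@2419,5%@2419,35%@2419,"
    "55%@2419,53%@2419,60%@2419] BCPU@56.5C MCPU@56.5C GPU@59C PLL@56.5C "
    "Tboard@46C Tdiode@56.5C PMIC@100C thermal@57.3C VDD_IN 17310/17258 "
    "VDD_CPU 3647/3848 VDD_GPU 7288/7074 VDD_SOC 1300/1269 VDD_WIFI 0/15 "
    "VDD_DDR 2727/2698"
)


def parse_tegrastats(line):
    """Return (temps, power) dictionaries parsed from one tegrastats line."""
    temps = {name: float(value) for name, value in TEMP_RE.findall(line)}
    power = {
        name: (int(first), int(second))
        for name, first, second in POWER_RE.findall(line)
    }
    return temps, power


if __name__ == "__main__":
    temps, power = parse_tegrastats(SAMPLE)
    for name, value in temps.items():
        print(f"{name}: {value} C")
    for name, (first, second) in power.items():
        print(f"{name}: {first} / {second}")
```

Redirecting tegrastats to a file on both carrier boards and running each line through `parse_tegrastats` would make it easier to spot a rail or temperature that differs just before the power-off.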