Abnormal system reset

Hi,

We have an issue that is easy to reproduce. Part of the debug output is shown below; the key line is “Power-up reason: invalid (2147483647)”.
When the issue happens, the system resets on its own and then prints this message.
What does it mean, and what conditions trigger it?

ubuntu@tegra-ubuntu:~$ [  207.928884] pgd = ffffffc09a7c2000
[  207.932295] [00000000] *pgd=000000011a7c3003, *pmd=0000000000000000
[  207.940221] Library at 0x0: 0x400000 /usr/lib/libreoffice/program/soffice.bin
[  207.947536] Library at 0x7f78e953c8: 0x7f78e0f000 /usr/lib/aarch64-linux-gnu/tegra/libGL.so.1
[  207.956680] vdso base = 0x7f7c778000
[0000.151] [TegraBoot] (version 24.00.2015.42-mobile-ec3b827e)
[0000.156] Processing in cold boot mode Bootloader 2
[0000.161] A02 Bootrom Patch rev = 63
[0000.164] Power-up reason: invalid (2147483647)
[0000.169] No Battery Present
[0000.171] RamCode = 0
[0000.174] Platform has Ddr4 type ram
[0000.177] max77620 disabling SD1 Remote Sense
[0000.181] Setting Ddr voltage to 1125mv
[0000.185] Serial Number of Pmic Max77663: 0x91da3
[0000.193] Entering ramdump check
[0000.196] Get RamDumpCarveOut = 0x0
[0000.199] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.204] Last reboot was clean, booting normally!
[0000.209] Sdram initialization is successful
[0000.213] SecureOs Carveout Base=0xff800000 Size=0x00800000
[0000.218] GSC1 Carveout Base=0xff700000 Size=0x00100000
[0000.223] GSC2 Carveout Base=0xff600000 Size=0x00100000
[0000.229] GSC3 Carveout Base=0xff500000 Size=0x00100000
[0000.234] GSC4 Carveout Base=0xff400000 Size=0x00100000
[0000.239] GSC5 Carveout Base=0xff300000 Size=0x00100000
[0000.244] BpmpFw Carveout Base=0xff2c0000 Size=0x00040000
[0000.249] Lp0 Carveout Base=0xff2bf000 Size=0x00001000
[0000.265] RamDump Carveout Base=0xff23f000 Size=0x00080000
[0000.270] Platform-DebugCarveout: 0
[0000.273] Nck Carveout Base=0xff03f000 Size=0x00200000
[0000.278] Non secure mode. Disable rollback prevention
[0000.283] AOTAG Init Done
[0000.331] Using GPT Primary to query partitions
[0000.336] Loading Tboot-CPU binary
[0000.385] Verifying bootloader in OdmNonSecureSBK mode
[0000.395] Bootloader load address is 0xa0000000, entry address is 0xa0000258
[0000.402] Bootloader downloaded successfully.
[0000.406] Downloaded Tboot-CPU binary to 0xa0000258
[0000.411] MAX77620_GPIO1 Configured.
[0000.415] MAX77620_GPIO5 Configured.
[0000.418] CPU power rail is up
[0000.421] CPU clock enabled
[0000.425] Performing RAM repair
[0000.428] Updating A64 Warmreset Address to 0xa00002e9
[0000.445] Bootloader DTB Load Address: 0x83000000
[0000.462] Kernel DTB Load Address: 0x83080000
[0000.467] Loading cboot binary
[0000.560] Verifying bootloader in OdmNonSecureSBK mode
[0000.655] Bootloader load address is 0x8010fda8, entry address is 0x80110000
[0000.662] Bootloader downloaded successfully.
[0000.666] GPT: Partition NOT found !
[0000.669] Find Partition via GPT Failed
[0000.673] Find Partition via PT Failed
[0000.677] function NvTbootGetBinaryOffsets: 0x1 error
[0000.681] Error in NvTbootLoadBinary: 0x1 !
[0000.685] Next binary entry address: 0x80110000
[0000.690] BoardId: 2180
[0000.716] NvTbootI2cProbe(): error code 0x00045100 Error while read
[0000.722] Display board id is not available
[0000.726] dram memory type is 3
[0000.730] WB0 init successful
[0000.756] Bpmp FW successfully loaded
[0000.760] Set NvDecSticky Bits
[0000.763] GSC1 address : ff700000
[0000.767] GSC2 address ff63fffc value c0edbbcc
[0000.771] GSC2 address : ff600000
[0000.775] GSC3 address : ff500000
[0000.779] GSC4 address : ff400000
[0000.782] GSC5 address : ff300000
[0000.786] GSC MC Settings done
[0000.789] TOS old plaintext Image length 65536
[0000.796] *** Secure OS image signature not verified ***
[0000.801] Loading and Validation of Secure OS Successful
[0000.806] NvTbootPackSdramParams: start.
[0000.811] NvTbootPackSdramParams: done.
[0000.814] Tegraboot started after 130000 us
[0000.818] Basic modules init took 313851 us
[0000.822] Sec Bootdevice Read Time = 194 ms, Read Size = 8464 KB
[0000.828] Sec Bootdevice Write Time = -1940251267 ms, Write Size = 343597383 KB
[0000.836] Next stage binary read took 12279 us
[0000.840] Carveout took 253097 us
[0000.843] CPU initialization took 125327 us
[0000.847] Total time taken by TegraBoot 704554 us
.................................................
.................................................

Thanks

Normally the “Power-up reason” would say something like “on button” (button pushed). It seems the Jetson doesn’t know how it was started here. I don’t know whether that has any significance.

Do you have more logs prior to this:

[  207.928884] pgd = ffffffc09a7c2000
[  207.932295] [00000000] *pgd=000000011a7c3003, *pmd=0000000000000000
[  207.940221] Library at 0x0: 0x400000 /usr/lib/libreoffice/program/soffice.bin
[  207.947536] Library at 0x7f78e953c8: 0x7f78e0f000 /usr/lib/aarch64-linux-gnu/tegra/libGL.so.1
[  207.956680] vdso base = 0x7f7c778000
[0000.151] [TegraBoot] (version 24.00.2015.42-mobile-ec3b827e)

The “[TegraBoot]” line is the first line of boot. Lines prior to this are from the prior boot. Those prior lines seem to be an error with LibreOffice and OpenGL, but the error message seems to be incomplete.
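If you capture the serial console to a file, a small script can separate the tail of the previous boot from the new one. This is only a sketch: it assumes every boot starts with a line containing “[TegraBoot]”, as in the log above, and the function names are made up for illustration.

```python
import re

# Marker copied from the log above. Treating it as the first line of
# every boot is an assumption based on this one capture.
BOOT_MARKER = re.compile(r"\[TegraBoot\]")


def split_boots(lines):
    """Group captured serial lines into one list per boot."""
    boots = [[]]
    for line in lines:
        # Start a new group at each boot marker, unless the current
        # group is still empty (capture began exactly at a boot).
        if BOOT_MARKER.search(line) and boots[-1]:
            boots.append([])
        boots[-1].append(line)
    return boots


def power_up_reasons(lines):
    """Return the text after 'Power-up reason:' for each matching line."""
    key = "Power-up reason:"
    return [l.split(key, 1)[1].strip() for l in lines if key in l]
```

The last few lines of `split_boots(...)[0]` are then whatever the kernel printed just before the reset, which is the part worth capturing in full.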

Hi linuxdev,

The earlier messages are attached below.

Ubuntu 16.04 LTS tegra-ubuntu ttyS0

tegra-ubuntu login: ubuntu (automatic login)

Last login: Wed Jul 19 18:00:09 CST 2017 on ttyS0
Welcome to Ubuntu 16.04 LTS (GNU/Linux 3.10.96-tegra aarch64)

 * Documentation:  https://help.ubuntu.com/

390 packages can be updated.
0 updates are security updates.

Thanks

Those prior lines simply show a normal startup getting to the point of a shell. Why LibreOffice and OpenGL are mentioned in the fault log I do not know. Was any command at all run before the spontaneous reboot? Was there a logged in GUI session?

hello Ray0420,

Could you please try this on the latest release, JetPack 3.1?
BTW, please also monitor the CPU temperature to check whether this is thermal related:

/sys/kernel/debug/cpu_edp/temperature
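A minimal sketch for logging that node over time with a timestamp, so the last reading before a reset is on record. It prints the file content as-is because the exact format of the node is not assumed here; the function names are illustrative, and reading debugfs will likely need root.

```python
import time

# Path quoted in the post above.
TEMP_PATH = "/sys/kernel/debug/cpu_edp/temperature"


def read_raw(path=TEMP_PATH):
    """Return the node's content unparsed (its format is not assumed)."""
    with open(path) as f:
        return f.read().strip()


def log_temperature(samples, interval_s=1.0, path=TEMP_PATH, sink=print):
    """Emit `samples` timestamped readings, one every `interval_s` seconds."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        sink(f"{stamp} {read_raw(path)}")
        time.sleep(interval_s)
```

For example, `log_temperature(600, 1.0)` would log for about ten minutes. Sending the output over the serial console or to another machine keeps the readings even if the local syslog stops.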

Hi JerryChang,

  • We use R24.2.1, with the latest kernel source, for development. Are there any relevant differences between JetPack 3.1 and R24.2.1?
  • We already monitor the CPU temperature dynamically, and the values stay under 50. Is there any concern with this value?

Thanks

hello Ray0420,

The BSP version in JetPack 3.1 is R28.1, which uses kernel 4.4.

hello Ray0420,

[0000.164] Power-up reason: invalid (2147483647)

This means the system rebooted for an unknown reason.
The value 2147483647 ==> NvTbootPoweronSource_Unknown
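As a side note, 2147483647 is 0x7FFFFFFF, the largest signed 32-bit integer. Using that value as a catch-all enum entry is a common C pattern; whether NvTbootPoweronSource_Unknown is actually defined that way is an assumption, not something confirmed in this thread. A quick check of the arithmetic (the helper names are illustrative):

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def describe(value):
    """Return the hexadecimal form of a reported reason code."""
    return hex(value)


def is_int32_max(value):
    """True when the code equals the signed 32-bit maximum."""
    return value == INT32_MAX
```

So the number itself carries no information about the cause; it only says the bootloader could not match the power-up source to a known one.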

May I have more details to narrow down the issue?
For example:

  1. Did you see this before with the original Jetson TX1?
  2. Have you designed your own carrier board and connected it to the Jetson TX1?
  3. Have you customized the kernel in a way that could have caused this failure?

Hi JerryChang,

1. Our system now combines more sensors than before, and we did not see this issue before.
2. Yes, we designed our own carrier board.
3. We added a camera driver and modified the device tree to enable SPI.

We also have another issue, which I raised in another topic at https://devtalk.nvidia.com/default/topic/995649/battery-over-current-limit-hit-warning-message-spamming-syslog/#5188608 (internally we call it the TX1 crash issue).
When this crash happens, syslog stops recording immediately, so we find no clues there. When we measure the signals, the results are as follows:
1. RESET_OUT# stays high.
2. CARRIER_PWR_ON changes to low.
3. VDD_IN stays high.

We don’t know whether the two issues are related, so I am posting both here.

Thanks

hello Ray0420,

Could you help us narrow down the issue?
For example, is it still reproducible with the original device tree?

How much current does this last added device draw? Is there any chance you could replace this with a resistor such that the current draw on that rail matches the device? If current limits go too far it would be very typical of any computer to shut down for protection…I’m just wondering if you’re hitting rail limits too hard and shutdown is forced from that. Measuring draw from the camera and replacing it with a resistor and seeing the issue remain or go away would possibly validate whether or not current draw is the issue (versus software).
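To size such a dummy load, Ohm’s law is enough: R = V / I, and the resistor must be rated for P = V × I. A small sketch follows; the 3.3 V and 0.5 A figures in the usage note are placeholders, not measurements from this board, and the function name is made up.

```python
def dummy_load(rail_volts, device_amps):
    """Return (resistance in ohms, dissipated power in watts) for a
    resistor that draws `device_amps` from a rail at `rail_volts`."""
    if rail_volts <= 0 or device_amps <= 0:
        raise ValueError("voltage and current must be positive")
    ohms = rail_volts / device_amps    # Ohm's law: R = V / I
    watts = rail_volts * device_amps   # power the resistor must handle
    return ohms, watts
```

With the placeholder values, `dummy_load(3.3, 0.5)` gives about 6.6 ohms and 1.65 W, so you would want a resistor rated comfortably above that power. Note that a resistor only reproduces the steady draw, not any inrush or switching transients from the real device.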

Hi JerryChang,

Thanks for your suggestion. Before trying the proposed steps (reproducing with the original device tree and with BSP version R28.1), we found something in the signals and want to update you.
After measuring the power sequence in more detail, we found that the crash issue is reproduced when VDD_IN (19V) drops by a larger amount, about 1V.
We are working on a ripple countermeasure.
Do you know the spec for PMIC tolerance in the power sequence, especially for VDD_IN?

Thanks

Hi Ray, it seems the current draw is too high and causes the big drop on VDD_IN. If you remove the last device, power-up is normal, right?

Hi Trumany,

We are not sure whether the drop can be eliminated entirely. We might be able to reduce it from the original 1V to 0.6V, but a small drop would still remain.
Actually, the crash issue happens in the pressure test, and it could also happen in practical use.
This is why we want to know the tolerance on VDD_IN. If a drop of 0.7V (just a value I guessed) is safe, we can work on this issue with confidence.

Thanks

Theoretically, a 20% droop on VDD_IN will trigger the shutdown process. The VIN_PWR_BAD# signal is responsible for this mechanism; you can measure it and use it as a trigger to capture the VDD_IN droop during the pressure test. Then you can find the VDD_IN droop threshold on your system.
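As a rough worked example, taking the 20% figure at face value and assuming it is measured against the 19V nominal input mentioned earlier in the thread (both are assumptions about this particular system, and the function names are illustrative):

```python
def droop_threshold(nominal_volts, droop_fraction=0.20):
    """Voltage at which a droop of `droop_fraction` below nominal is reached."""
    return nominal_volts * (1.0 - droop_fraction)


def droop_percent(nominal_volts, drop_volts):
    """Express an observed drop as a percentage of the nominal voltage."""
    return 100.0 * drop_volts / nominal_volts
```

For a 19V input, `droop_threshold(19.0)` is 15.2V, while the observed 1V drop is only about 5.3% (`droop_percent(19.0, 1.0)`). If the crash really occurs at that level, the scope capture triggered on VIN_PWR_BAD# should show whether VDD_IN briefly dips much lower than the 1V seen so far, or whether something other than this mechanism is involved.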

Hi Trumany,

Thanks very much

We will try to find the threshold.