U-Boot failure

Hi All,

I found a strange U-Boot failure.
I hope someone knows what it is.

It started after a couple of months of development work on the unit.
Reflashing with JetPack does not help.

Sometimes it boots successfully.
Pressing a key to interrupt autoboot and then typing boot works every time.

Please look at line 120 of the log (the line right after "Hit any key to stop autoboot"):

[0000.187] [TegraBoot] (version 23.00.2015.14-mobile-3a15407d)
[0000.193] Processing in cold boot mode Bootloader 2
[0000.197] A01 Bootrom Patch rev = 63
[0000.201] Power-up reason: reset button
[0000.204] No Battery Present
[0000.207] Platform has Ddr4 type ram
[0000.211] max77620 disabling SD1 Remote Sense
[0000.215] Setting Ddr voltage to 1125mv
[0000.219] Serial Number of Pmic Max77663: 0xa0ce1
[0000.227] Entering ramdump check
[0000.230] Get RamDumpCarveOut = 0x0
[0000.233] RamDumpCarveOut=0x0,  RamDumperFlag=0xe59ff3f8
[0000.238] Last reboot was clean, booting normally!
[0000.243] Sdram initialization is successful
[0000.247] SecureOs Carveout Base=0xff800000 Size=0x00800000
[0000.252] GSC1 Carveout Base=0xff700000 Size=0x00100000
[0000.278] GSC2 Carveout Base=0xff600000 Size=0x00100000
[0000.304] GSC3 Carveout Base=0xff500000 Size=0x00100000
[0000.309] GSC4 Carveout Base=0xff400000 Size=0x00100000
[0000.314] GSC5 Carveout Base=0xff300000 Size=0x00100000
[0000.320] BpmpFw Carveout Base=0xff2c0000 Size=0x00040000
[0000.325] Lp0 Carveout Base=0xff2bf000 Size=0x00001000
[0000.340] RamDump Carveout Base=0xff23f000 Size=0x00080000
[0000.346] Platform-DebugCarveout: 0
[0000.349] Nck Carveout Base=0xff03f000 Size=0x00200000
[0000.399] Using GPT Primary to query partitions
[0000.405] Loading Tboot-CPU binary
[0000.454] Verifying bootloader in OdmNonSecureSBK mode
[0000.463] Bootloader load address is 0xa0000000,                            entry address is 0xa0000258
[0000.473] Bootloader downloaded successfully.
[0000.477] Downloaded Tboot-CPU binary to 0xa0000258
[0000.482] MAX77620_GPIO1 Configured.
[0000.486] MAX77620_GPIO5 Configured.
[0000.489] CPU power rail is up
[0000.492] CPU clock enabled
[0000.496] Performing RAM repair
[0000.499] Updating A64 Warmreset Address to 0xa00002e9
[0000.516] Bootloader DTB Load Address: 0x83000000
[0000.533] Kernel DTB Load Address: 0x83080000
[0000.538] Loading cboot binary
[0000.631] Verifying bootloader in OdmNonSecureSBK mode
[0000.725] Bootloader load address is 0x8010fda8,                            entry address is 0x80110000
[0000.734] Bootloader downloaded successfully.
[0000.738] GPT: Partition NOT found !
[0000.742] Find Partition via GPT Failed
[0000.745] function NvTbootGetBinaryOffsets: 0x845208 error
[0000.750] Error in NvTbootLoadBinary: 0x845208 !
[0000.755] Next binary entry address: 0x80110000
[0000.759] BoardId: 2180
[0000.786] NvTbootI2cWrite(): error code 0x00045100 Error while starting write transaction
[0000.793] NvTbootI2cDeviceRead(): error code 0x00045001 Error while sending the offset to slave
[0000.802] NvTbootI2c: Read failed for slave 0xa2, offset 0x00 with error code 0x00045001
[0000.810] Display board id read failed
[0000.814] dram memory type is 3
[0000.817] WB0 init successful
[0000.844] Bpmp FW successfully loaded
[0000.847] Set NvDecSticky Bits
[0000.850] GSC1 address : ff700000
[0000.854] GSC2 address : ff600000
[0000.858] GSC3 address : ff500000
[0000.862] GSC4 address : ff400000
[0000.865] GSC5 address : ff300000
[0000.868] GSC MC Settings done
[0000.872] TOS old plaintext Image length 61440
[0000.878] *** Secure OS image signature not verified ***
[0000.883] Loading and Validation of Secure OS Successful
[0000.888] NvTbootPackSdramParams: start.
[0000.894] NvTbootPackSdramParams: done.
[0000.897] Tegraboot started after 166928 us
[0000.901] Basic modules init took 349622 us
[0000.905] Sec Bootdevice Read Time = 194 ms, Read Size = 8459 KB
[0000.911] Next stage binary read took 12266 us
[0000.915] Carveout took 251650 us
[0000.918] CPU initialization took 138810 us
[0000.922] Total time taken by TegraBoot 752348 us

[0000.927] Starting CPU & Halting co-processor

64b[0001.063] LPDDR4 Training: Number of tables = 10
[0001.067] EMC Training (SRC-freq: 204000; DST-freq: 408000)
[0001.073] EMC Training Successful
[0001.076] EMC Training (SRC-freq: 204000; DST-freq: 665600)
[0001.082] EMC Training Successful
[0001.085] EMC Training (SRC-freq: 204000; DST-freq: 800000)
[0001.096] EMC Training Successful
[0001.099] EMC Training (SRC-freq: 204000; DST-freq: 1065600)
[0001.122] EMC Training Successful
[0001.125] EMC Training (SRC-freq: 204000; DST-freq: 1331200)
[0001.147] EMC Training Successful
[0001.150] EMC Training (SRC-freq: 204000; DST-freq: 1600000)
[0001.169] EMC Training Successful
[0001.173] Switching to 800000 KHz Success
[0001.206] LPDDR4 Training: Number of tables = 10


U-Boot 2015.07-rc2-geea3f71 (Feb 08 2016 - 17:37:49 -0800)

TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MMC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0
Tegra210 (P2371-2180) #     2.479470] usb-vbus1: 5000 mV ; R[    2.509249] en-mdm-pwr-3v7: ; Rail OFF
Unknown command '2.479470]' - try 'help'
Unknown command 'R[' - try 'help'
Unknown command 'Rail' - try 'help'
Tegra210 (P2371-2180) # [    2.513136] en-vdd-disp-1v8: 1800 mV ; Rail ON
Unknown command '[' - try 'help'
Unknown command 'Rail' - try 'help'
Tegra210 (P2371-2180) # [    2.517392] en-vdd-cam-hv-2v8: ; Rail OFF
Unknown command '[' - try 'help'
Unknown command 'Rail' - try 'help'
Tegra210 (P2371-2180) # [    2.521114] rtl-5v0: ; Rail OFF
Unknown command '[' - try 'help'
Unknown command 'Rail' - try 'help'
Tegra210 (P2371-2180) # [    2.524192] en-usb-vbus2: ; Rail OFF
Unknown command '[' - try 'help'
Unknown command 'Rail' - try 'help'
Tegra210 (P2371-2180) # [    2.528071] en-vdd-cam-1v2: ; Rail OFF
Unknown command '[' - try 'help'
Unknown command 'Rail' - try 'help'
Tegra210 (P2371-2180) # [    2.532163] regulator-pwm 0.pwm-regulator: PWM request deferred

Hello Alex_Sharapov, thank you for reporting this. Does this occur when removing all USB devices from the TX1 and starting up again? If so, does it occur with certain devices connected and not with other devices?

Hello ctichenor,

Regarding USB devices, we are using a keyboard and a mouse connected via the USB 3.0 port of the carrier board.
Also, our PCIe device is usually connected.

However, we started with the same configuration and it was OK. We began to see this behavior after a couple of months of work, changing the kernel every day.
Perhaps the flash has been damaged, because we also sometimes see mmc errors in dmesg.

Hello, Alex_Sharapov:
From your log, it seems that the debug UART port receives input, which interrupts U-Boot's normal boot sequence.

Hit any key to stop autoboot:  0
Tegra210 (P2371-2180) #     2.479470] usb-vbus1: 5000 mV ; R[    2.509249] en-mdm-pwr-3v7: ; Rail OFF
Unknown command '2.479470]' - try 'help'

Would you please check the hardware?
Even after that happens, U-Boot should still work normally and the ‘boot’ command should also work.

br
ChenJian
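
To make the diagnosis above easier to repeat on other captures, here is a minimal Python sketch that scans a saved serial log for U-Boot prompt lines carrying kernel-style timestamps (for example `2.479470]`). The function name `find_echoed_kernel_lines` and the default prompt string are assumptions chosen for illustration; the prompt is copied from the log in this thread. It only flags the symptom and says nothing about the cause.

```python
import re

# A kernel printk timestamp such as "[    2.513136]". The opening bracket is
# optional because the first echoed line in the log above has lost it.
KERNEL_TS = re.compile(r"\d+\.\d{6}\]")

# U-Boot prompt as it appears in the log in this thread (assumed default).
PROMPT = "Tegra210 (P2371-2180) #"


def find_echoed_kernel_lines(log_text, prompt=PROMPT):
    """Return (line_number, text) for prompt lines that contain kernel output.

    Text following the prompt is what U-Boot echoed back as console input,
    so a kernel timestamp there suggests kernel-style text reached U-Boot's RX.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if not line.startswith(prompt):
            continue
        typed = line[len(prompt):]
        if KERNEL_TS.search(typed):
            hits.append((number, typed.strip()))
    return hits
```

Running it over the text of a capture file returns the suspicious lines together with their line numbers, which makes it easy to compare a good boot with a bad one.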

Hi Jachen,

Certainly, I wrote about that.
If I stop autoboot by pressing any key before

Hit any key to stop autoboot:  0

U-Boot works perfectly and continues normally after the boot command.

What kind of test can I do to check the hardware?
We are talking about a JTX1 dev kit with my PCIe module, but the JTX1 shows this behavior from time to time even when the PCIe module is not connected at all.

Thank you,
Alex

Hello, Alex:
You could check why U-Boot is receiving input. Please check the debug UART RXD.

That message comes from the kernel booting. That’s weird.

br
ChenJian

Sorry, it is unclear to me…

As far as I understand, you mean the serial console, but it is the only way to see U-Boot…
Could you give me more details on what “check the debug UART RXD” means?

Thank you,
Alex

Hello, Alex:
How do you capture the U-Boot/kernel log?

br
Chenjian

Hi Chenjian,

Via serial console. :)

Thank you,
Alex

Hello, Alex:
Is it possible that the serial console RXD is short-circuited to TXD?

br
ChenJian
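
One way to check the host side of the question above is to send a probe over the serial adapter while the Jetson is powered off and see whether the same bytes come back. The following is only a sketch under assumptions: `echoes_back` and the probe bytes are hypothetical names and values, and `port` can be any object with `write()` and `read()`, for example a pyserial `Serial` object opened with a read timeout. It cannot tell where on the cable or header a short would be.

```python
def echoes_back(port, probe=b"\x55LOOPBACK-PROBE\xaa"):
    """Write `probe` to `port` and report whether the same bytes come back.

    `port` is any object with write(bytes) and read(n) -> bytes. The target
    must be powered off for this check, because a running U-Boot prompt
    echoes typed characters on its own.
    """
    port.write(probe)
    received = port.read(len(probe))
    return received == probe
```

With pyserial installed, the port could be opened as `serial.Serial("/dev/ttyUSB0", 115200, timeout=1)`; the device path is an assumption and depends on the adapter. A `True` result with the module powered off would point to TXD and RXD being tied together somewhere between the adapter and the board.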

Hi Chenjian,

Now I see. You mean a serial console loopback?
It looks like that, but it actually is not.

I have found that the problem appears when the TX1 is hot.
We had 35 degrees C in our office for the last two weeks and the Jetson’s fan was not effective…
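
To correlate the failure with temperature, the readings can be logged from Linux just before a reboot. Below is a minimal sketch that reads the standard sysfs thermal zones, where `temp` is reported in millidegrees Celsius. The function name `read_thermal_zones` is an assumption for illustration, and which zones exist on a given TX1 image is not something this thread confirms.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_dir_name: (zone_type, degrees_celsius)} for readable zones."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones with a missing file or a non-numeric reading.
            continue
        readings[zone.name] = (zone_type, millideg / 1000.0)
    return readings


if __name__ == "__main__":
    for name, (zone_type, temp_c) in read_thermal_zones().items():
        print(f"{name} {zone_type}: {temp_c:.1f} C")
```

Saving this output next to each serial capture would show whether the bad boots really line up with a hot module.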

Hello, Alex:
How do you run into that U-Boot failure? With the ‘reboot’ command, the reset button, or an unexpected system reset? I’ve never seen such an issue on my side.

For the thermal fan issue, I remember you said it was OK in another thread (https://devtalk.nvidia.com/default/topic/952858/jetson-tx1/themal-control-system-is-unstable/post/4945713/#4945713). With the fan working, the system should run stably for a long time. If the system always runs overheated, I’m not sure what will happen.

Do you use the Jetson TX1 board, or did you build your own board?

Can you test this issue under normal conditions? (Leave SoC overheating aside for now.)

br
ChenJian

Hi Chenjian,

It happens sometimes on only one of our development units.
Once it starts, it repeats after a reboot, reset, or cold start.

We are talking about the Jetson TX1 carrier board; we are debugging our PCIe drivers on it.

Regarding the normal case, I can only say that when it is really cold, for example after being turned off all night, it starts well. But it should also start normally after some hours of work… We have now repaired the fan control loop in the kernel (it was broken due to some strange config dependency issue), so it should be OK…

It seems something happened to this module, because sometimes I can see filesystem and/or mmc errors.
Even a complete reflash with JetPack does not help.

Thank you,
Alex

Hi Chenjian,

Just to let you know.
It was a hardware failure of the JTX1 module.

Thank you,
Alex