How can I find out why the TX2 reset suddenly?

We mounted a TX2 on a custom board. Two camera sensors are connected to the TX2 via a parallel-CSI bridge. The board worked well with a TX1.

However, the TX2 often resets for unknown reasons. Is there any way to find out why the TX2 reset?

Sometimes configuring the sensors via SPI triggers the reset (not every configuration, but some with a long integration time). Sometimes the TX2 resets while streaming video out. With some sensor configurations, or with the parallel-CSI bridge in test mode, the reset does not happen. I tried adding a cooler and switching to a better power source, but neither seemed to be the cause.

Any thoughts about the problem, or a method to find out the reason?

Here are some logs from the debug UART. Sometimes the reset loops endlessly. The kmsg “tegra-i2c 3190000.i2c: i2c transfer timed out addr: 0x50” should have nothing to do with the reset:

[   39.288335] tegra-i2c 3190000.i2c: rx dma timeout
[   39.293513] tegra-i2c 3190000.i2c: i2c transfer timed out addr: 0x50
[0000.098] C> I2C command failed
[0000.101] C> block index = (4) and rail_id = (1)
[0000.105] C> Addr: Reg = [0xe8:0x07]: 336166925
[0000.221] I> Welcome to MB2(TBoot-BPMP)(version: 01.00.160913-t186-M-00.00-mobile-2c57a56c)
[0000.229] I> Default Heap @ [0xd486400 - 0xd488400]
[0000.234] I> DMA Heap @ [0x84a00000 - 0x85300000]
[0000.239] I> bit @ 0xd480000
[0000.242] I> BR-BCT relocated to 0xd7220000
[0000.246] I> Boot-device: eMMC
[0000.250] I> sdmmc bdev is already initialized
[0000.254] I> pmic: reset reason (nverc)        : 0x0
[0000.258] I> Reading GPT from 512 for device 00000003
[0000.264] I> Reading GPT from 8388096 for device 00000003
[0000.271] I> Found 6 partitions in 00000003 device
[0000.276] I> Reading GPT from 512 for device 00010003
[0000.283] I> Found 17 partitions in 00010003 device
[0000.288] W> No valid slot number is found in scratch register
[0000.294] W> Return default slot: _a
[0000.297] I> A/B: bin_type (16) slot 0
[0000.301] I> Loading partition bpmp-fw at 0xd7800000
[0000.306] I> Reading two headers - addr:0xd7800000 blocks:1
[0000.311] I> Addr: 0xd7800000, start-block: 58740229, num_blocks: 1
[0000.326] I> Binary(16) of size 528400 is loaded @ 0xd7800000
[0000.332] W> No valid slot number is found in scratch register
[0000.337] W> Return default slot: _a
[0000.341] I> A/B: bin_type (17) slot 0
[0000.344] I> Loading partition bpmp-fw-dtb at 0xd79f0000
[0000.350] I> Reading two headers - addr:0xd79f0000 blocks:1
[0000.355] I> Addr: 0xd79f0000, start-block: 58741437, num_blocks: 1
[0000.369] I> Binary(17) of size 465760 is loaded @ 0xd798e200
[0000.547] I> BPMP-FW load address = 0xd7800000
[0000.551] I> BPMP-FW DTB load address = 0x5018e200
[0000.556] I> Loading SCE-FW ...
[0000.559] W> No valid slot number is found in scratch register
[0000.564] W> Return default slot: _a
[0000.568] I> A/B: bin_type (12) slot 0
[0000.571] I> Loading partition sce-fw at 0xd7300000
[0000.576] I> Reading two headers - addr:0xd7300000 blocks:1
[0000.582] I> Addr: 0xd7300000, start-block: 58742437, num_blocks: 1
[0000.590] I> Binary(12) of size 76592 is loaded @ 0xd7300000
[0000.596] I> Init SCE
[0000.598] I> Copy BTCM section
[0000.601] W> No valid slot number is found in scratch register
[0000.607] W> Return default slot: _a
[0000.610] I> A/B: bin_type (13) slot 0
[0000.614] I> Loading partition cpu-bootloader at 0x96000000
[0000.619] I> Reading two headers - addr:0x96000000 blocks:1
[0000.625] I> Addr: 0x96000000, start-block: 58732545, num_blocks: 1
[0000.635] I> Binary(13) of size 221728 is loaded @ 0x96000000
[0000.641] W> No valid slot number is found in scratch register
[0000.647] W> Return default slot: _a
[0000.650] I> A/B: bin_type (20) slot 0
[0000.654] I> Loading partition bootloader-dtb at 0x85300000
[0000.659] I> Reading two headers - addr:0x85300000 blocks:1
[0000.665] I> Addr: 0x85300000, start-block: 58733057, num_blocks: 1
[0000.676] I> Binary(20) of size 267952 is loaded @ 0x85300000
[0000.682] I> MB2-params(VA) @ 0xd7200000
[0000.685] I> CPUBL-params(VA) @ 0xd7200000
[0000.689] I> CPUBL-params(PA) @ 0x277200000
[0000.694] I> CPU-BL loaded @ PA 0x96000000
[0000.698] I> Loading TOS ...
[0000.700] W> No valid slot number is found in scratch register
[0000.706] W> Return default slot: _a
[0000.709] I> A/B: bin_type (14) slot 0
[0000.713] I> Loading partition secure-os at 0x84a0f400
[0000.718] I> Reading two headers - addr:0x84a0f400 blocks:1
[0000.723] I> Addr: 0x84a0f400, start-block: 58734081, num_blocks: 1
[0000.732] I> Binary(14) of size 58480 is loaded @ 0x84a0f400
[0000.738] I> Copying Monitor (length: 0xe270) from 0x84a0f600 to 0x40000000
[0000.745] I> Erasing Monitor @ 0x84a0f600
[0000.749] I> Unhalting SCE
[0000.752] I> Primary Memory Start:80000000 Size:70000000
[0000.757] I> Extended Memory Start:f0110000 Size:185ef0000
[0000.763] I> Waypoint2-ACK: 0x520120b0
[0000.767] I> MB2(TBoot-BPMP) done

NOTICE:  BL31: v1.2(release):cc5fd7c
NOTICE:  BL31: Built : 00:44:34, Jul 20 2017
NOTICE:  Trusty image missing.
ERROR:   Error initializing runtime service trusty_fast
[0000.118] C> I2C command failed
[0000.121] C> block index = (4) and rail_id = (1)
[0000.125] C> Addr: Reg = [0xe8:0x07]: 336166925
[0000.241] I> Welcome to MB2(TBoot-BPMP)(version: 01.00.160913-t186-M-00.00-mobile-2c57a56c)
[0000.250] I> Default Heap @ [0xd486400 - 0xd488400]
[0000.254] I> DMA Heap @ [0x84a00000 - 0x85300000]
[0000.259] I> bit @ 0▒[0000.112] C> I2C command failed
[0000.115] C> block index = (4) and rail_id = (1)
[0000.119] C> Addr: Reg = [0xe8:0x07]: 336166925
[0000.235] I> Welcome to MB2(TBoot-BPMP)(version: 01.00.160913-t186-M-00.00-mobile-2c57a56c)
[0000.243] I> Default Heap @ [0xd486400 - 0xd488400]
[0000.248] I> DMA Heap @ [0x84a00000 - 0x85300000]
[0000.253] I> bit @ 0xd480000
[0000.256] I> BR-BCT relocated to 0xd7220000
[0000.260] I> Boot-device: eMMC
[0000.263] I> sdmmc bdev is already initialized
[0000.268] I> pmic: reset reason (nverc)        : 0x0
[0000.272] I> Reading GPT from 512 for device 00000003
[0000.278] I> Reading GPT from 8388096 for device 00000003
[0000.285] I> Found 6 partitions in 00000003 device
[0000.290] I> Reading GPT from 512 for device 00010003
[0000.297] I> Found 17 partitions in 00010003 device
[0000.302] W> No valid slot number is found in scratch register
[0000.307] W> Return default slot: _a
[0000.311] I> A/B: bin_type (16) slot 0
[0000.314] I> Loading partition bpmp-fw at 0xd7800000
[0000.319] I> Reading two headers - addr:0xd7800000 blocks:1
[0000.325] I> Addr: 0xd7800000, start-block: 58740229, num_blocks: 1
[0000.340] I> Binary(16) of size 528400 is loaded @ 0xd7800000
[0000.345] W> No valid slot number is found in scratch register
[0000.351] W> Return default slot: _a
[0000.355] I> A/B: bin_type (17) slot 0
[0000.358] I> Loading partition bpmp-fw-dtb at 0xd79f0000
[0000.363] I> Reading two headers - addr:0xd79f0000 blocks:1
[0000.369] I> Addr: 0xd79f0000, start-block: 58741437, num_blocks: 1
[0000.383] I> Binary(17) of size 465760 is loaded @ 0xd798e200
[0000.560] I> BPMP-FW load address = 0xd7800000
[0000.565] I> BPMP-FW DTB load address = 0x5018e200
[0000.570] I> Loading SCE-FW ...
[0000.573] W> No valid slot number is found in scratch register
[0000.578] W> Return default slot: _a
[0000.582] I> A/B: bin_type (12) slot 0
[0000.585] I> Loading partition sce-fw at 0xd7300000
[0000.590] I> Reading two headers - addr:0xd7300000 blocks:1
[0000.595] I> Addr: 0xd7300000, start-block: 58742437, num_blocks: 1
[0000.604] I> Binary(12) of size 76592 is loaded @ 0xd7300000
[0000.610] I> Init SCE
[0000.612] I> Copy BTCM section
[0000.615] W> No valid slot number is found in scratch register
[0000.621] W> Return default slot: _a
[0000.624] I> A/B: bin_type (13) slot 0
[0000.628] I> Loading partition cpu-bootloader at 0x96000000
[0000.633] I> Reading two headers - addr:0x96000000 blocks:1
[0000.639] I> Addr: 0x96000000, start-block: 58732545, num_blocks: 1
[0000.649] I> Binary(13) of size 221728 is loaded @ 0x96000000
[0000.655] W> No valid slot number is found in scratch register
[0000.661] W> Return default slot: _a
[0000.664] I> A/B: bin_type (20) slot 0
[0000.668] I> Loading partition bootloader-dtb at 0x85300000
[0000.673] I> Reading two headers - addr:0x85300000 blocks:1
[0000.679] I> Addr: 0x85300000, start-block: 58733057, num_blocks: 1
[0000.690] I> Binary(20) of size 267952 is loaded @ 0x85300000
[0000.695] I> MB2-params(VA) @ 0xd7200000
[0000.699] I> CPUBL-params(VA) @ 0xd7200000
[0000.703] I> CPUBL-params(PA) @ 0x277200000
[0000.707] I> CPU-BL loaded @ PA 0x96000000
[0000.711] I> Loading TOS ...
[0000.714] W> No valid slot number is found in scratch register
[0000.720] W> Return default slot: _a
[0000.723] I> A/B: bin_type (14) slot 0
[0000.727] I> Loading partition secure-os at 0x84a0f400
[0000.732] I> Reading two headers - addr:0x84a0f400 blocks:1
[0000.737] I> Addr: 0x84a0f400, start-block: 58734081, num_blocks: 1
[0000.746] I> Binary(14) of size 58480 is loaded @ 0x84a0f400
[0000.751] I> Copying Monitor (length: 0xe270) from 0x84a0f600 to 0x40000000
[0000.758] I> Erasing Monitor @ 0x84a0f600
[0000.763] I> Unhalting SCE
[0000.765] I> Primary Memory Start:80000000 Size:70000000
[0000.771] I> Extended Memory Start:f0110000 Size:185ef0000
[0000.777] I> Waypoint2-ACK: 0x520120b0
[0000.781] I> MB2(TBoot-BPMP) done

NOTICE:  BL31: v1.2(release):cc5fd7c
NOTICE:  BL31: Built : 00:44:34, Jul 20 2017
NOTICE:  Trusty image missing.
ERROR:   Error initializing runtime service trusty_fast
[0000.111] C> I2C command failed
[0000.114] C> block index = (4) and rail_id = (1)
[0000.118] C> Addr: Reg = [0xe8:0x07]: 336166925

Hello ZhangXin,

  1. You can see the system reset reason in the TegraBoot message.
    For example:
    [0000.154] [TegraBoot] (version 00.00.2014.50-mobile-d44d4bf0)
    [0000.168] Power-up reason: software reset

  2. You can also check syslog for more logging output:

/var/log/syslog
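
Once a boot log has been saved to a file, the first check can be partly automated. This is only a minimal sketch, not an official tool. It assumes the log contains lines worded like the “Power-up reason:” example above, or like the “pmic: reset reason (nverc)” line in the MB2 output posted earlier; other releases may word these messages differently. The function name find_reset_reasons is made up for illustration. It only extracts the values and does not interpret them.

```python
import re

# Patterns copied from log lines quoted in this thread (assumption: other
# bootloader versions may print these messages in a different form).
PMIC_RE = re.compile(r"pmic: reset reason \((\w+)\)\s*:\s*(0x[0-9a-fA-F]+)")
POWERUP_RE = re.compile(r"Power-up reason:\s*(.+?)\s*$")


def find_reset_reasons(log_text):
    """Return (line_number, kind, value) for every reset-reason line found."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        pmic = PMIC_RE.search(line)
        if pmic:
            # e.g. kind "pmic nverc", value "0x0"
            hits.append((lineno, "pmic " + pmic.group(1), pmic.group(2)))
            continue
        powerup = POWERUP_RE.search(line)
        if powerup:
            # e.g. kind "power-up", value "software reset"
            hits.append((lineno, "power-up", powerup.group(1)))
    return hits
```

To use it, read the saved log with open(path, errors="replace").read() (a reset can leave garbage bytes in the capture) and pass the text to find_reset_reasons.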

If you can keep a serial console running at all times, run “dmesg --follow” and you will see the final dmesg lines as the reboot occurs (be sure your serial console program has enough scrollback buffer, or logs to a file that you can scroll back through later). Or use “sudo tail -f /var/log/syslog” for the syslog version of this under the serial console.
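
If the serial console output is logged to a file, the lines printed just before each reboot can be pulled out afterwards. The sketch below is a heuristic and rests on assumptions taken from the logs posted in this thread: bootloader lines start with a timestamp like “[0000.098]”, kernel lines start with one like “[   39.288335]”, and a new boot is assumed whenever a bootloader timestamp follows a kernel line or goes backwards. The name lines_before_resets is hypothetical.

```python
import re
from collections import deque

# Timestamp formats as they appear in the logs posted above (assumption:
# other releases may format them differently).
BOOT_TS = re.compile(r"^\[(\d{4}\.\d{3})\]")      # e.g. [0000.098]
KERNEL_TS = re.compile(r"^\[\s*(\d+\.\d{6})\]")   # e.g. [   39.288335]


def lines_before_resets(log_text, context=5):
    """Return, for each detected reboot, the last `context` lines before it."""
    tails = []
    history = deque(maxlen=context)  # rolling window of the most recent lines
    prev_kind, prev_ts = None, 0.0
    for line in log_text.splitlines():
        boot = BOOT_TS.match(line)
        kernel = KERNEL_TS.match(line)
        if boot:
            ts = float(boot.group(1))
            # A bootloader line after kernel output, or a bootloader
            # timestamp that jumps backwards, is treated as a fresh boot.
            if prev_kind == "kernel" or (prev_kind == "boot" and ts < prev_ts):
                tails.append(list(history))
            prev_kind, prev_ts = "boot", ts
        elif kernel:
            prev_kind, prev_ts = "kernel", float(kernel.group(1))
        history.append(line)
    return tails
```

Lines without a timestamp (such as the “NOTICE:” lines) are kept in the window but do not affect detection. Comparing the tail of a failing boot with the same point in a normal reboot shows which messages stopped appearing.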

Someone earlier mentioned a watchdog timer. I’m thinking perhaps some software load caused a watchdog timer reboot. This would probably show up as “software reset”, but I am not certain. It is of course also worth looking for “watchdog” in “/var/log/syslog”.
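
The “watchdog” search can be done with grep, or with a few lines of Python if you want to combine it with other checks. This is just a sketch; find_watchdog_lines is a made-up name, and what a watchdog message actually looks like on the TX2 is not something I can confirm, so it simply matches the word regardless of case.

```python
def find_watchdog_lines(log_text):
    """Return every log line that mentions "watchdog", ignoring case."""
    return [line for line in log_text.splitlines()
            if "watchdog" in line.lower()]
```

Pass it the contents of /var/log/syslog (reading that file may need root) or of a saved serial console log.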

How can I extract the TegraBoot message?
My dmesg starts at about 0.8 seconds into the boot process.
I’ve also looked in journalctl -xb, /var/log/syslog, and /var/log/kern.log, and have not found any boot reason.

You would probably get information on the serial console that never makes its way into the log files (shutdown might disable some logging, but echo to the console would still work). A serial console log captured at the moment of reboot shows not only what the system reports as unusual; comparing it with a normal reboot also shows which “normal” messages no longer appear. This works even in the U-Boot stage, where TegraBoot applies. No log file will ever show U-Boot, T-Boot, or C-Boot output.

Unfortunately, the serial console is not available for in-system failure analysis/diagnostics.
Given that “something” knows this data, and that the kernel has a mechanism to capture early log information to the dmesg area, it’d be super helpful if that data actually made it there.