Tx1 software reset problem

Hi,
I am recently encountering a software reset problem in some of the Tx1 I have, the issue is completely random. When the SOM resets, I get the PMC reset source : Software reset in syslog.

Oct 15 07:06:12 bot-3874 systemd[1]: Reached target Timers.
Oct 15 07:06:12 bot-3874 kernel: [    0.249102] platform tegradc.1: domain=ffffffc0fa297c18 allocates as[0]=ffffffc0fa17c2b0
Oct 15 07:06:12 bot-3874 kernel: [    0.251644] platform 57000000.gpu: domain=ffffffc0fa29ef18 allocates as[0]=ffffffc0fa17c318
Oct 15 07:06:12 bot-3874 kernel: [    0.251724] platform 57000000.gpu: domain=ffffffc0fa29ef18 allocates as[1]=ffffffc0fa17c380
Oct 15 07:06:12 bot-3874 kernel: [    0.251799] platform 57000000.gpu: domain=ffffffc0fa29ef18 allocates as[2]=ffffffc0fa17c3e8
Oct 15 07:06:12 bot-3874 kernel: [    0.251870] platform 57000000.gpu: domain=ffffffc0fa29ef18 allocates as[3]=ffffffc0fa17c450
Oct 15 07:06:12 bot-3874 kernel: [    0.252684] tegra-pmc 7000e400.pmc: i2c-thermtrip node not found, emergency thermal reset disabled.
Oct 15 07:06:12 bot-3874 systemd[1]: Listening on OpenBSD Secure Shell server socket.
Oct 15 07:06:12 bot-3874 kernel: [    0.252730] tegra-pmc 7000e400.pmc: scratch reg offset dts data not present
Oct 15 07:06:12 bot-3874 kernel: [    0.252756] tegra-pmc: ### PMC reset source: Software reset
Oct 15 07:06:12 bot-3874 kernel: [    0.252778] tegra-pmc: ### PMC reset status reg: 0x3
Oct 15 07:06:12 bot-3874 kernel: [    0.296351] padctrl padctrl.0: Pad control driver tegra-pmc-padctrl registered
Oct 15 07:06:12 bot-3874 kernel: [    0.296388] tegra-pmc: Clear bootloader IO dpd settings
Oct 15 07:06:12 bot-3874 systemd[1]: Listening on GPS (Global Positioning System) Daemon Sockets.
Oct 15 07:06:12 bot-3874 kernel: [    0.296418] tegra-pmc 7000e400.pmc: IO 
Oct 15 07:06:12 bot-3874 kernel: [    0.296490] tegra-pmc 7000e400.pmc: PMC: Successfully configure bootrom reset commands
Oct 15 07:06:12 bot-3874 systemd[1]: Listening on CUPS Scheduler.
Oct 15 07:06:12 bot-3874 kernel: [    0.299636] platform 1003000.pcie-controller: domain=ffffffc0fa2ae018 allocates as[0]=ffffffc0fa17c4b8
Oct 15 07:06:12 bot-3874 kernel: [    0.301898] platform sdhci-tegra.3: domain=ffffffc0fa333118 allocates as[0]=ffffffc0fa17c520
Oct 15 07:06:12 bot-3874 kernel: [    0.302475] platform sdhci-tegra.0: domain=ffffffc0fa333518 allocates as[0]=ffffffc0fa17c588
Oct 15 07:06:12 bot-3874 systemd[1]: Started Set console keymap.
Oct 15 07:06:12 bot-3874 kernel: [    0.309355] vdd-ac-bat: 5000 mV 
Oct 15 07:06:12 bot-3874 kernel: [    0.310543] vdd-cpu-fixed: 1000 mV 

And below are the last few line in syslog just before reset happened (and the logs are same in 3/4 instances where this reset problem reproduced).

Oct 15 07:00:01 bot-3874 rsyslogd-2007: action 'action 9' suspended, next retry is Thu Oct 15 07:01:31 2020 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Oct 15 07:00:01 bot-3874 rsyslogd-2007: action 'action 26' suspended, next retry is Thu Oct 15 07:01:31 2020 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Oct 15 07:00:01 bot-3874 CRON[27139]: (root) CMD (/tmp/collectd.sh)
Oct 15 07:00:01 bot-3874 CRON[27138]: (CRON) info (No MTA installed, discarding output)
Oct 15 07:05:01 bot-3874 rsyslogd-2007: action 'action 9' suspended, next retry is Thu Oct 15 07:06:31 2020 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Oct 15 07:05:01 bot-3874 rsyslogd-2007: action 'action 26' suspended, next retry is Thu Oct 15 07:06:31 2020 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Oct 15 07:05:01 bot-3874 CRON[29064]: (root) CMD (/tmp/collectd.sh)
Oct 15 07:05:01 bot-3874 CRON[29060]: (CRON) info (No MTA installed, discarding output) 

Below attached is the complete syslog :https://drive.google.com/file/d/1n0noq9OFvm1WLerDq-OtzS09vmr6ztwx/view?usp=sharing (couldn’t upload directly here, because of some uploading error).

System Information :
Nvidia Tx1

Linux nvidia 4.4.38-tegra #1 SMP PREEMPT Thu Mar 1 20:44:58 PST 2018 aarch64 aarch64 aarch64 GNU/Linux
#R28 (release), REVISION: 2.0, GCID: 10567845, BOARD: t210ref, EABI: aarch64, DATE: Fri Mar 2 04:58:16 UTC 2018

Sorry, I could not find anything specific. What I did find is that you get this sort of error when rsyslogd tries to use a missing service. One case is the “action 9” when rsyslogd is trying to use the optional “xconsole”, and xconsole is not there. Have you changed anything in relation to system logging defaults?

I couldn’t find much information on xconsole, but I did see package “python-exconsole”, and this is related, but not installed by default. Do you have anything running in Python which might be configured to log failures, and if so, do you have installed package “python-exconsole”? I could see this making adjustments to rsyslogd configuration.

I did see a reference to the original xconsole, and it uses the “Athena” widget set, which is extraordinarily old. Now if you were using this particular method of displaying errors, then it would mean you’d need to be logged in to X for it to work.

Sorry, that is all I could find.

Are you able to check the serial console log too?

I guess there should be more logs from serial console if this is software reset.

I will try to collect serial logs next time the issue reproduces.

Reproduced the issue and captured below serial logs

247 packages can be updated.

7 updates are security updates.

New release '18.04.5 LTS' available.

Run 'do-release-upgrade' to upgrade to it.

[ 15.765129] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.818678] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.847370] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

nvidia@bot-3874:~$ [ 15.879965] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.888305] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.897261] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.923970] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.932234] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 15.951174] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 16.010019] Setting pll_a = 45158400 Hz clk_out = 11289600 Hz

[ 16.209674] IPVS: Creating netns size=1424 id=3

[ 684.493976] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_DEAUTH_IND

[ 684.501542] CFG80211-ERROR) wl_notify_connect_status : link down if wlan0 may call cfg80211_disconnected. event : 6, reason=7 from 00:35:1a:19:ff:3f

[ 684.532371] CFG80211-ERROR) wl_cfg80211_connect : Connectting with00:35:1a:19:ff:3f channel (48) ssid "gorwifi_2", len (9)

[ 684.547867] cfg80211: World regulatory domain updated:

[ 684.553606] cfg80211: DFS Master region: unset

[ 684.558751] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)

[ 684.569254] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.577691] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.585800] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.594191] cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)

[ 684.596562] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:35:1a:19:ff:3f

[ 684.613538] cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)

[ 684.620710] CFG80211-ERROR) wl_bss_connect_done : Report connect result - connection succeeded

[ 684.620718] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:35:1a:19:ff:3f

[ 684.641367] cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)

[ 684.649546] cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.657645] cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)

[ 684.671531] cfg80211: World regulatory domain updated:

[ 684.676735] cfg80211: DFS Master region: unset

[ 684.681104] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)

[ 684.690845] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.698930] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.706966] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.714984] cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)

[ 684.724467] cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)

[ 684.733960] cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)

[ 684.742090] cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.750119] cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)

[ 684.789493] cfg80211: Regulatory domain changed to country: IN

[ 684.795369] cfg80211: DFS Master region: JP

[ 684.799513] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)

[ 684.809385] cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)

[ 684.817415] cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)

[ 684.826939] cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)

[ 684.836433] cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)

[ 1690.324240] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 00:f2:8b:d2:11:2f (ch:64)

[ 1690.334434] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:f2:8b:d2:11:2f

[ 1690.346033] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[ 1712.643975] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 00:35:1a:19:ff:3f (ch:48)

[ 1712.654677] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:35:1a:19:ff:3f

[ 1712.709666] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[ 1737.530513] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 00:f2:8b:d2:11:2f (ch:64)

[ 1737.543132] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:f2:8b:d2:11:2f

[ 1737.554825] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[ 1753.996221] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 00:35:1a:19:ff:3f (ch:48)

[ 1754.010738] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:35:1a:19:ff:3f

[ 1754.071173] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[ 3171.121673] Unable to handle kernel paging request at virtual address dead000000000108

[ 3171.129653] pgd = ffffffc0ce248000

[ 3171.133075] [dead000000000108] *pgd=000000014e193003, *pud=000000014e193003, *pmd=0000000000000000

[ 3171.142119] Internal error: Oops: 96000044 [#1] PREEMPT SMP

[ 3171.147711] Modules linked in: xt_addrtype br_netfilter bcmdhd e1000e ptp pps_core bluedroid_pm

[ 3171.156597] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.38-tegra #1

[ 3171.163050] Hardware name: jetson_tx1 (DT)

[ 3171.167166] task: ffffffc0012cb240 ti: ffffffc0012b4000 task.ti: ffffffc0012b4000

[ 3171.174692] PC is at tegra_dma_tasklet+0x44/0xcc

[ 3171.179328] LR is at tegra_dma_tasklet+0x84/0xcc

[ 3171.183961] pc : [<ffffffc0004fcfe0>] lr : [<ffffffc0004fd020>] pstate: 200001c5

[ 3171.191367] sp : ffffffc0012b7be0

[ 3171.194697] x29: ffffffc0012b7be0 x28: ffffffc0012b4000

[ 3171.200054] x27: ffffffc00128eab8 x26: ffffffc0012ba000

[ 3171.205411] x25: 0000000000000101 x24: ffffffc0302cd250

[ 3171.210766] x23: ffffffc0302cd180 x22: ffffffc0302cd208

[ 3171.216118] x21: ffffffc03031d818 x20: ffffffc00054fff0

[ 3171.221470] x19: 0000000000000000 x18: 0000000000000000

[ 3171.226822] x17: 0000000000000000 x16: 0000000000000000

[ 3171.232171] x15: 0000000000000000 x14: 0000000000000001

[ 3171.237522] x13: 00000000000162c8 x12: 0000000000000000

[ 3171.242877] x11: 0000000000002c59 x10: 0000000000000024

[ 3171.248228] x9 : 0000000000004000 x8 : 0000000000000000

[ 3171.253578] x7 : 0000000000000001 x6 : dead000000000100

[ 3171.258933] x5 : dead000000000200 x4 : 0000000000000100

[ 3171.264284] x3 : 0000000000000000 x2 : ffffffc0f07ccc70

[ 3171.269638] x1 : 0000000000000140 x0 : 0000000000000140

[ 3171.274990]

[ 3171.276505] Process swapper/0 (pid: 0, stack limit = 0xffffffc0012b4020)

[ 3171.283218] Call trace:

[ 3171.285688] [<ffffffc0004fcfe0>] tegra_dma_tasklet+0x44/0xcc

[ 3171.291382] [<ffffffc0000a806c>] tasklet_action+0x8c/0xfc

[ 3171.296801] [<ffffffc0000a74c0>] __do_softirq+0x1a0/0x3f4

[ 3171.302219] [<ffffffc0000a79e4>] irq_exit+0x74/0xe0

[ 3171.307129] [<ffffffc0000f2cf4>] __handle_domain_irq+0x90/0xb0

[ 3171.312985] [<ffffffc000080f10>] gic_handle_irq+0x68/0xbc

[ 3171.318405] [<ffffffc000083f44>] el1_irq+0x84/0x100

[ 3171.323323] [<ffffffc0007fbe54>] cpuidle_enter+0x18/0x20

[ 3171.328656] [<ffffffc0000e2518>] call_cpuidle+0x48/0x54

[ 3171.333900] [<ffffffc0000e27ec>] cpu_startup_entry+0x2c8/0x394

[ 3171.339767] [<ffffffc000b1ffe4>] rest_init+0x84/0x8c

[ 3171.344768] [<ffffffc001168998>] start_kernel+0x3ac/0x3c0

[ 3171.350185] [<0000000080b26000>] 0x80b26000

[ 3171.356359] ---[ end trace 0765fd2ddef179b0 ]---

[ 3171.369388] Kernel panic - not syncing: Fatal exception in interrupt

[ 3171.375974] CPU1: stopping

[ 3171.378755] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D 4.4.38-tegra #1

[ 3171.386421] Hardware name: jetson_tx1 (DT)

[ 3171.390531] Call trace:

[ 3171.393023] [<ffffffc000088fc8>] dump_backtrace+0x0/0xf4

[ 3171.398357] [<ffffffc0000890d0>] show_stack+0x14/0x1c

[ 3171.403444] [<ffffffc0003769ac>] dump_stack+0xac/0xe4

[ 3171.408519] [<ffffffc00008e964>] handle_IPI+0x16c/0x328

[ 3171.413761] [<ffffffc000080f3c>] gic_handle_irq+0x94/0xbc

[ 3171.419175] [<ffffffc000083f44>] el1_irq+0x84/0x100

[ 3171.424083] [<ffffffc0007fbe54>] cpuidle_enter+0x18/0x20

[ 3171.429412] [<ffffffc0000e2518>] call_cpuidle+0x48/0x54

[ 3171.434651] [<ffffffc0000e27ec>] cpu_startup_entry+0x2c8/0x394

[ 3171.440500] [<ffffffc00008e514>] secondary_start_kernel+0x15c/0x168

[ 3171.446781] [<000000008008126c>] 0x8008126c

[ 3171.450981] CPU3: stopping

[ 3171.453731] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G D 4.4.38-tegra #1

[ 3171.461396] Hardware name: jetson_tx1 (DT)

[ 3171.465505] Call trace:

[ 3171.467982] [<ffffffc000088fc8>] dump_backtrace+0x0/0xf4

[ 3171.473313] [<ffffffc0000890d0>] show_stack+0x14/0x1c

[ 3171.478386] [<ffffffc0003769ac>] dump_stack+0xac/0xe4

[ 3171.483456] [<ffffffc00008e964>] handle_IPI+0x16c/0x328

[ 3171.488696] [<ffffffc000080f3c>] gic_handle_irq+0x94/0xbc

[ 3171.494111] [<ffffffc000083f44>] el1_irq+0x84/0x100

[ 3171.499010] [<ffffffc0007fbe54>] cpuidle_enter+0x18/0x20

[ 3171.504338] [<ffffffc0000e2518>] call_cpuidle+0x48/0x54

[ 3171.509578] [<ffffffc0000e27ec>] cpu_startup_entry+0x2c8/0x394

[ 3171.515429] [<ffffffc00008e514>] secondary_start_kernel+0x15c/0x168

[ 3171.521707] [<000000008008126c>] 0x8008126c

[ 3171.525905] CPU2: stopping

[ 3171.528653] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 4.4.38-tegra #1

[ 3171.536319] Hardware name: jetson_tx1 (DT)

[ 3171.540428] Call trace:

[ 3171.542903] [<ffffffc000088fc8>] dump_backtrace+0x0/0xf4

[ 3171.548235] [<ffffffc0000890d0>] show_stack+0x14/0x1c

[ 3171.553309] [<ffffffc0003769ac>] dump_stack+0xac/0xe4

[ 3171.558377] [<ffffffc00008e964>] handle_IPI+0x16c/0x328

[ 3171.563617] [<ffffffc000080f3c>] gic_handle_irq+0x94/0xbc

[ 3171.569032] [<ffffffc000083f44>] el1_irq+0x84/0x100

[ 3171.573931] [<ffffffc0007fbe54>] cpuidle_enter+0x18/0x20

[ 3171.579259] [<ffffffc0000e2518>] call_cpuidle+0x48/0x54

[ 3171.584500] [<ffffffc0000e27ec>] cpu_startup_entry+0x2c8/0x394

[ 3171.590349] [<ffffffc00008e514>] secondary_start_kernel+0x15c/0x168

[ 3171.596628] [<000000008008126c>] 0x8008126c

[ 3171.639763] Rebooting in 30 seconds..�[0000.194] [TegraBoot] (version 00.00.2014.50-mobile-6987b70e)

[0000.200] Processing in cold boot mode Bootloader 2

[0000.204] A02 Bootrom Patch rev = 1023

[0000.208] Power-up reason: software reset

[0000.212] No Battery Present

[0000.214] RamCode = 1

[0000.217] Platform has Ddr4 type ram

[0000.220] max77620 disabling SD1 Remote Sense

[0000.224] Setting Ddr voltage to 1125mv

[0000.228] Serial Number of Pmic Max77663: 0x428c6

[0000.236] Entering ramdump check

[0000.239] Get RamDumpCarveOut = 0xff23f000

[0000.243] RamDumpCarveOut=0xff23f000, RamDumperFlag=0x0

[0000.248] Last reboot was clean, booting normally!

[0000.252] Sdram initialization is successful

[0000.256] SecureOs Carveout Base=0xff800000 Size=0x00800000

[0000.262] GSC1 Carveout Base=0xff700000 Size=0x00100000

[0000.267] GSC2 Carveout Base=0xff600000 Size=0x00100000

[0000.272] GSC3 Carveout Base=0xff500000 Size=0x00100000

[0000.277] GSC4 Carveout Base=0xff400000 Size=0x00100000

[0000.282] GSC5 Carveout Base=0xff300000 Size=0x00100000

[0000.288] BpmpFw Carveout Base=0xff2c0000 Size=0x00040000

[0000.293] Lp0 Carveout Base=0xff2bf000 Size=0x00001000

[0000.308] RamDump Carveout Base=0xff23f000 Size=0x00080000

[0000.314] Platform-DebugCarveout: 0

[0000.317] Nck Carveout Base=0xff03f000 Size=0x00200000

[0000.322] Non secure mode, and RB not enabled.

[0000.366] Using GPT Primary to query partitions

[0000.371] Loading Tboot-CPU binary

[0000.420] Verifying TBC in OdmNonSecureSBK mode

[0000.429] Bootloader load address is 0xa0000000, entry address is 0xa0000258

[0000.436] Bootloader downloaded successfully.

[0000.440] Downloaded Tboot-CPU binary to 0xa0000258

[0000.445] MAX77620_GPIO1 Configured.

[0000.449] MAX77620_GPIO5 Configured.

[0000.452] CPU power rail is up

[0000.455] CPU clock enabled

[0000.459] Performing RAM repair

[0000.462] Updating A64 Warmreset Address to 0xa00002e9

[0000.467] Loading NvTbootBootloaderDTB

[0000.562] Verifying NvTbootBootloaderDTB in OdmNonSecureSBK mode

[0000.692] Bootloader DTB Load Address: 0x83000000

[0000.696] Loading NvTbootKernelDTB

[0000.791] Verifying NvTbootKernelDTB in OdmNonSecureSBK mode

[0000.921] Kernel DTB Load Address: 0x83100000

[0000.926] Loading cboot binary

[0001.020] Verifying EBT in OdmNonSecureSBK mode

[0001.056] Bootloader load address is 0x92c00000, entry address is 0x92c00258

[0001.063] Bootloader downloaded successfully.

[0001.067] GPT: Partition NOT found !

[0001.071] Find Partition via GPT Failed

[0001.074] Find Partition via PT Failed

[0001.078] function NvTbootGetBinaryOffsets: 0x1 error

[0001.083] Error in NvTbootLoadBinary: 0x1 !

[0001.087] Next binary entry address: 0x92c00258

[0001.091] BoardId: 2180

[0001.096] dram memory type is 3

[0001.100] WB0 init successful

[0001.126] Bpmp FW successfully loaded

[0001.130] Set NvDecSticky Bits

[0001.133] GSC1 address : ff700000

[0001.136] GSC2 address ff63fffc value c0edbbcc

[0001.141] GSC2 address : ff600000

[0001.145] GSC3 address : ff500000

[0001.149] GSC4 address : ff400000

[0001.152] GSC5 address : ff300000

[0001.156] GSC MC Settings done

[0001.159] TOS plaintext Image length 32848

[0001.163] Monitor size 32848

[0001.166] OS size 0

[0001.169] *** Secure OS image signature not verified ***

[0001.174] Loading and Validation of Secure OS Successful

[0001.179] NvTbootPackSdramParams: start.

[0001.185] NvTbootPackSdramParams: done.

[0001.188] Tegraboot started after 174477 us

[0001.192] Basic modules init took 951686 us

[0001.196] Sec Bootdevice Read Time = 355 ms, Read Size = 15785 KB

[0001.202] Sec Bootdevice Write Time = 0 ms, Write Size = 0 KB

[0001.208] Next stage binary read took 805 us

[0001.212] Carveout took -6718 us

[0001.215] CPU initialization took 96982 us

[0001.219] Total time taken by TegraBoot 1042755 us

[0001.224] Starting CPU & Halting co-processor

64NOTICE: BL31: v1.2(release):e1e4477

NOTICE: BL31: Built : 20:39:55, Mar 1 2018

NOTICE: Trusty image missing.

ERROR: Error initializing runtime service trusty_fast

[0001.364] RamCode = 1

[0001.386] LPDDR4 Training: Read DT: Number of tables = 10

[0001.391] EMC Training (SRC-freq: 204000; DST-freq: 408000)

[0001.397] EMC Training Successful

[0001.400] EMC Training (SRC-freq: 204000; DST-freq: 665600)

[0001.406] EMC Training Successful

[0001.409] EMC Training (SRC-freq: 204000; DST-freq: 800000)

[0001.421] EMC Training Successful

[0001.424] EMC Training (SRC-freq: 204000; DST-freq: 1065600)

[0001.447] EMC Training Successful

[0001.450] EMC Training (SRC-freq: 204000; DST-freq: 1331200)

[0001.471] EMC Training Successful

[0001.474] EMC Training (SRC-freq: 204000; DST-freq: 1600000)

[0001.494] EMC Training Successful

[0001.497] Switching to 800000 KHz Success

[0001.507] RamCode = 1

[0001.511] DT Write: emc-table@40800 succeeded

[0001.517] DT Write: emc-table@68000 succeeded

[0001.523] DT Write: emc-table@102000 succeeded

[0001.529] DT Write: emc-table@204000 succeeded

[0001.535] DT Write: emc-table@408000 succeeded

[0001.541] DT Write: emc-table@665600 succeeded

[0001.547] DT Write: emc-table@800000 succeeded

[0001.553] DT Write: emc-table@1065600 succeeded

[0001.560] DT Write: emc-table@1331200 succeeded

[0001.566] DT Write: emc-table@1600000 succeeded

[0001.570] LPDDR4 Training: Write DT: Number of tables = 10

[0001.615]

[0001.616] Debug Init done

[0001.618] Marked DTB cacheable

[0001.621] Bootloader DTB loaded at 0x83000000

[0001.626] Marked DTB cacheable

[0001.629] Kernel DTB loaded at 0x83100000

[0001.633] DeviceTree Init done

[0001.652] Pinmux applied successfully

[0001.658] gicd_base: 0x50041000

[0001.663] gicc_base: 0x50042000

[0001.666] Interrupts Init done

[0001.671] Using base:0x60005008 & irq:33 for tick-timer

[0001.676] Using base:0x60005000 for delay-timer

[0001.681] platform_init_timer: DONE

[0001.684] Timer(tick) Init done

[0001.689] osc freq = 38400 khz

[0001.695]

[0001.696] welcome to cboot

[0001.698]

[0001.700] Cboot Version: 00.00.2014.50-t210-e831cf53

[0001.704] calling constructors

[0001.707] initializing heap

[0001.710] initializing threads

[0001.713] initializing timers

[0001.716] creating bootstrap completion thread

[0001.720] top of bootstrap2()

[0001.723] CPU: ARM Cortex A57

[0001.726] CPU: MIDR: 0x411FD071, MPIDR: 0x80000000

[0001.731] initializing platform

[0001.767] config for ddr50 mode completed

[0001.771] sdmmc bdev is already initialized

[0001.775] Enable APE clock

[0001.777] Un-powergate APE partition

[0001.781] of_register: registering tegra_udc to of_hal

[0001.786] of_register: registering inv20628-driver to of_hal

[0001.791] of_register: registering ads1015-driver to of_hal

[0001.797] of_register: registering lp8557-bl-driver to of_hal

[0001.803] of_register: registering bq2419x_charger to of_hal

[0001.808] of_register: registering cpc to of_hal

[0001.813] of_register: registering bq27441_fuel_gauge to of_hal

[0001.833] gpio framework initialized

[0001.836] of_register: registering tca9539_gpio to of_hal

[0001.841] of_register: registering tca9539_gpio to of_hal

[0001.847] of_register: registering i2c_bus_driver to of_hal

[0001.852] of_register: registering i2c_bus_driver to of_hal

[0001.858] of_register: registering i2c_bus_driver to of_hal

[0001.863] pmic framework initialized

[0001.867] of_register: registering max77620_pmic to of_hal

[0001.872] regulator framework initialized

[0001.876] of_register: registering tegra_xhci to of_hal

[0001.881] initializing target

[0001.890] gpio_driver_register: register 'tegra_gpio_driver' driver

[0001.902] fixed regulator driver initialized

[0001.980] initializing OF layer

[0001.983] NCK carveout not present

[0001.987] Skipping dts_overrides

[0001.991] of_children_init: Ops found for compatible string nvidia,tegra210-xhci

[0001.999] of_children_init: Ops found for compatible string nvidia,tegra210-i2c

[0002.031] I2C Bus Init done

[0002.033] of_children_init: Ops found for compatible string nvidia,tegra210-i2c

[0002.048] I2C Bus Init done

[0002.051] of_children_init: Ops found for compatible string ti,tca9539

[0002.065] tca9539_init: i2c bus: 2, slave addr: 0xe8

[0002.070] I2C slave not started

[0002.073] I2C write failed

[0002.076] i2c write failed

[0002.078] Slave: 0xe8, Register 0x4 Instance 2, Buffer Size: 0

[0002.084] tca9539_device_init: failed to write polar reg

[0002.089] tca9539_init: failed to init device!

[0002.094] of_children_init: Ops found for compatible string ti,tca9539

[0002.108] tca9539_init: i2c bus: 2, slave addr: 0xee

[0002.113] I2C slave not started

[0002.116] I2C write failed

[0002.119] i2c write failed

[0002.121] Slave: 0xee, Register 0x4 Instance 2, Buffer Size: 0

[0002.127] tca9539_device_init: failed to write polar reg

[0002.132] tca9539_init: failed to init device!

[0002.137] of_children_init: Ops found for compatible string nvidia,tegra210-i2c

[0002.152] I2C Bus Init done

[0002.154] of_children_init: Ops found for compatible string nvidia,tegra210-i2c

[0002.169] I2C Bus Init done

[0002.172] of_children_init: Ops found for compatible string nvidia,tegra210-i2c

[0002.187] I2C Bus Init done

[0002.190] of_children_init: Ops found for compatible string maxim,max77620

[0002.205] max77620_init using irq 118

[0002.210] register 'maxim,max77620' pmic

[0002.215] gpio_driver_register: register 'max77620-gpio' driver

[0002.221] of_children_init: Ops found for compatible string nvidia,tegra210-i2c

[0002.236] I2C Bus Init done

[0002.241] NCK carveout not present

[0002.245] shim_invoke: No NCT, Calling dts updates

[0002.266] Find /i2c@7000c000's alias i2c0

[0002.270] get eeprom at 1-a2, size 256, type 0

[0002.274] get eeprom at 1-ae, size 256, type 0

[0002.289] Find /i2c@7000c400's alias i2c1

[0002.293] get eeprom at 2-a0, size 256, type 0

[0002.309] Find /i2c@7000c500's alias i2c2

[0002.313] get eeprom at 3-a0, size 256, type 0

[0002.317] get eeprom at 3-ae, size 256, type 0

[0002.329] Find /host1x/i2c@546c0000's alias i2c6

[0002.333] get eeprom at 7-a8, size 256, type 0

[0002.338] pm_ids_update: Updating 1,a2, size 256, type 0

[0002.343] I2C slave not started

[0002.346] I2C write failed

[0002.349] Writing offset failed

[0002.352] eeprom_init: EEPROM read failed

[0002.356] pm_ids_update: eeprom init failed

[0002.360] pm_ids_update: Updating 1,ae, size 256, type 0

[0002.365] I2C slave not started

[0002.368] I2C write failed

[0002.371] Writing offset failed

[0002.374] eeprom_init: EEPROM read failed

[0002.378] pm_ids_update: eeprom init failed

[0002.382] pm_ids_update: Updating 2,a0, size 256, type 0

[0002.387] I2C slave not started

[0002.390] I2C write failed

[0002.393] Writing offset failed

[0002.396] eeprom_init: EEPROM read failed

[0002.400] pm_ids_update: eeprom init failed

[0002.404] pm_ids_update: Updating 3,a0, size 256, type 0

[0002.434] pm_ids_update: The pm board id is 2180-1000-401

[0002.443] Adding plugin-manager/ids/2180-1000-401=/i2c@7000c500:module@0x50

[0002.453] pm_ids_update: pm id update successful

[0002.458] pm_ids_update: Updating 3,ae, size 256, type 0

[0002.463] I2C slave not started

[0002.466] I2C write failed

[0002.469] Writing offset failed

[0002.472] eeprom_init: EEPROM read failed

[0002.476] pm_ids_update: eeprom init failed

[0002.480] pm_ids_update: Updating 7,a8, size 256, type 0

[0002.487] GPIO driver for phandle 0xe7 could not be found

[0002.493] I2C slave not started

[0002.495] I2C write failed

[0002.498] Writing offset failed

[0002.501] eeprom_init: EEPROM read failed

[0002.505] pm_ids_update: eeprom init failed

[0002.511] GPIO driver for phandle 0xe7 could not be found

[0002.544] updating /chosen/nvidia,wifi-mac node 00:04:4b:d3:d8:e3

[0002.554] updating /chosen/nvidia,bluetooth-mac node 00:04:4b:d3:d8:e4

[0002.563] updating /chosen/nvidia,ethernet-mac node 00:04:4b:d3:d8:e5

[0002.570] Plugin Manager: Parse ODM data 0x00084000

[0002.607] Find /i2c@7000c000's alias i2c0

[0002.611] get eeprom at 1-a2, size 256, type 0

[0002.616] get eeprom at 1-ae, size 256, type 0

[0002.631] Find /i2c@7000c400's alias i2c1

[0002.635] get eeprom at 2-a0, size 256, type 0

[0002.651] Find /i2c@7000c500's alias i2c2

[0002.654] get eeprom at 3-a0, size 256, type 0

[0002.659] get eeprom at 3-ae, size 256, type 0

[0002.671] Find /host1x/i2c@546c0000's alias i2c6

[0002.675] get eeprom at 7-a8, size 256, type 0

[0002.679] pm_ids_update: Updating 1,a2, size 256, type 0

[0002.685] I2C slave not started

[0002.688] I2C write failed

[0002.690] Writing offset failed

[0002.693] eeprom_init: EEPROM read failed

[0002.697] pm_ids_update: eeprom init failed

[0002.701] pm_ids_update: Updating 1,ae, size 256, type 0

[0002.707] I2C slave not started

[0002.710] I2C write failed

[0002.712] Writing offset failed

[0002.715] eeprom_init: EEPROM read failed

[0002.719] pm_ids_update: eeprom init failed

[0002.723] pm_ids_update: Updating 2,a0, size 256, type 0

[0002.729] I2C slave not started

[0002.732] I2C write failed

[0002.734] Writing offset failed

[0002.737] eeprom_init: EEPROM read failed

[0002.741] pm_ids_update: eeprom init failed

[0002.745] pm_ids_update: Updating 3,a0, size 256, type 0

[0002.776] pm_ids_update: The pm board id is 2180-1000-401

[0002.784] Adding plugin-manager/ids/2180-1000-401=/i2c@7000c500:module@0x50

[0002.790] pm_ids_update: pm id update successful

[0002.795] pm_ids_update: Updating 3,ae, size 256, type 0

[0002.800] I2C slave not started

[0002.803] I2C write failed

[0002.806] Writing offset failed

[0002.809] eeprom_init: EEPROM read failed

[0002.813] pm_ids_update: eeprom init failed

[0002.817] pm_ids_update: Updating 7,a8, size 256, type 0

[0002.824] GPIO driver for phandle 0xe7 could not be found

[0002.830] I2C slave not started

[0002.833] I2C write failed

[0002.835] Writing offset failed

[0002.838] eeprom_init: EEPROM read failed

[0002.842] pm_ids_update: eeprom init failed

[0002.848] GPIO driver for phandle 0xe7 could not be found

[0002.881] Chip UID is 00000001643073071000000007010500

[0002.889] shim_cmdline_install: /chosen/bootargs: root=/dev/mmcblk0p1 rw rootwait console=ttyS0,115200n8 console=tty0 OS=l4t fbcon=map:0 net.ifnames=0 androidboot.modem=none androidboot.serialno=04201190103290000501 androidboot.security=non-secure

[0002.912] Add serial number:04201190103290000501 as DT property

[0002.921] calling apps_init()

[0002.929] Found 20 GPT partitions in "sdmmc3_user"

[0002.933] Proceeding to Cold Boot

[0002.936] starting app android_boot_app

[0002.940] Device state: unlocked

[0002.943] display console init

[0002.962] No valid panel node found

[0002.971] enabling 'vdd-hdmi' regulator

[0002.975] GPIO driver for phandle 0xe8 could not be found

[0002.995] DT entry for leds-pwm not found

[0003.000] hdmi cable not connected

[0003.003] is_hdmi_needed: HDMI not connected, returning false

[0003.009] set_instance_and_out_type: invalid out_type

[0003.014] display_console_init: No display init

[0003.025] Gpio keyboard init success[serial.log|attachment](upload://pp7AQW6hxf9ndW7udYa3sYRBAb4.log) (133.6 KB)

Hi,

You can see kernel panic that stalls cpu and then cause the reboot.

Such issue is generally caused by your system. Thus, you need to tell us the detail of

  1. What is your carrier board? Is it a nv devkit or custom carrier board?

  2. What is running when this error happens?

  3. Which release are you using?

  4. Does this issue happen to other TX1 module or carrier board combination too?

1. What is your carrier board? Is it a nv devkit or custom carrier board?
We are using Astro carrier board (https://connecttech.com/product/astro-carrier-for-nvidia-jetson-tx2-tx1/)

2. What is running when this error happens?
There are multiple services running on the board, so I can’t exactly point to a specific service which is causing this, rather we want to find out the same. Is there a way we can get the information that what/which service caused the kernel panic?

3. Which release are you using?

Linux nvidia 4.4.38-tegra #1 SMP PREEMPT Thu Mar 1 20:44:58 PST 2018 aarch64 aarch64 aarch64 GNU/Linux
#R28 (release), REVISION: 2.0, GCID: 10567845, BOARD: t210ref, EABI: aarch64, DATE: Fri Mar 2 04:58:16 UTC 2018

4. Does this issue happen to other TX1 module or carrier board combination too?
we have produced this issue in many tx1 boards, So I don’t think so it is one of the case.
(Note: OS image remains same across all the boards)

The one that causes error is “tegra_dma_tasklet”. Do you have any service that uses this is driver?

I am curious, the release is a bit dated, so I am guessing you have a reason to use this release. Is there some particular software or peripheral or driver which is preventing you from trying a newer release?

Also, does this have serial console available? I realize there is no micro-USB, but perhaps it has header pins.

Hi suhail.m,

Any update? Is this still an issue to support?

Hi @kayccc, Yes we are still facing the issue, and trying to trace back the services recently we changed which may be causing it. So far No luck but an observation I have whenever the issue comes, the serial logs just before the kernel panic have following error almost in all the cases I have captured, which is related to CFG80211-ERROR and the last line is always the same as below

[14258.212049] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

Complete Serial logs

[14111.699621] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 38:0e:4d:d5:4b:8f (ch:56)

[14111.711393] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 38:0e:4d:d5:4b:8f

[14111.798806] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[14140.944196] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 00:f2:8b:d2:11:2f (ch:60)

[14140.955326] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 00:f2:8b:d2:11:2f

[14140.967082] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[14224.442830] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to 38:0e:4d:d5:4b:8f (ch:56)

[14224.454493] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with 38:0e:4d:d5:4b:8f

[14224.542558] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

[14258.167150] CFG80211-ERROR) wl_bss_roaming_done : wl_bss_roaming_done succeeded to ec:bd:1d:52:fe:bf (ch:40)

[14258.183547] CFG80211-ERROR) wl_notify_connect_status : wl_bss_connect_done succeeded with ec:bd:1d:52:fe:bf

[14258.212049] CFG80211-ERROR) wl_bss_roaming_done : BSS already present, Skipping roamed event to upper layer

 [14258.774682] Unable to handle kernel paging request at virtual address dead000000000108

[14258.782622] pgd = ffffffc0f24aa000

[14258.786028] [dead000000000108] *pgd=00000001660f3003, *pud=00000001660f3003, *pmd=0000000000000000

[14258.795003] Internal error: Oops: 96000044 [#1] PREEMPT SMP

[14258.800565] Modules linked in: xt_addrtype br_netfilter bcmdhd e1000e ptp pps_core bluedroid_pm

[14258.809310] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.38-tegra #1

[14258.815737] Hardware name: jetson_tx1 (DT)

[14258.819825] task: ffffffc0012cb240 ti: ffffffc0012b4000 task.ti: ffffffc0012b4000

[14258.827298] PC is at tegra_dma_tasklet+0x44/0xcc

[14258.831912] LR is at tegra_dma_tasklet+0x84/0xcc

[14258.836526] pc : [<ffffffc0004fcfe0>] lr : [<ffffffc0004fd020>] pstate: 200001c5

[14258.843906] sp : ffffffc0012b7be0
[14258.847210] x29: ffffffc0012b7be0 x28: ffffffc0012b4000

[14258.852521] x27: ffffffc00128eab8 x26: ffffffc0012ba000

[14258.857831] x25: 0000000000000101 x24: ffffffc0302cd250

[14258.863142] x23: ffffffc0302cd180 x22: ffffffc0302cd208

[14258.868452] x21: ffffffc03031d818 x20: ffffffc00054fff0

[14258.873761] x19: 0000000000000000 x18: 0000000000000000

[14258.879070] x17: 0000000000000000 x16: 0000000000000000

[14258.884379] x15: 0000000000000000 x14: 0000000000000001

[14258.889689] x13: 000000000000a5e0 x12: 0000000000000000

[14258.895000] x11: 00000000000014bc x10: 0000000000000024

[14258.900308] x9 : 0000000000004000 x8 : 0000000000000000

[14258.905619] x7 : 0000000000000001 x6 : dead000000000100

[14258.910928] x5 : dead000000000200 x4 : 0000000000000100

[14258.916238] x3 : 0000000000000000 x2 : ffffffc0e7765a70

[14258.921549] x1 : 0000000000000140 x0 : 0000000000000140

[14258.926858]

[14258.928345] Process swapper/0 (pid: 0, stack limit = 0xffffffc0012b4020)

[14258.935028] Call trace:

[14258.937467] [<ffffffc0004fcfe0>] tegra_dma_tasklet+0x44/0xcc

[14258.943116] [<ffffffc0000a806c>] tasklet_action+0x8c/0xfc

[14258.948501] [<ffffffc0000a74c0>] __do_softirq+0x1a0/0x3f4

[14258.953886] [<ffffffc0000a79e4>] irq_exit+0x74/0xe0

[14258.958753] [<ffffffc0000f2cf4>] __handle_domain_irq+0x90/0xb0

[14258.964573] [<ffffffc000080f10>] gic_handle_irq+0x68/0xbc

[14258.969958] [<ffffffc000083f44>] el1_irq+0x84/0x100

[14258.974827] [<ffffffc0007fbe54>] cpuidle_enter+0x18/0x20

[14258.980127] [<ffffffc0000e2518>] call_cpuidle+0x48/0x54

[14258.985339] [<ffffffc0000e27ec>] cpu_startup_entry+0x2c8/0x394

[14258.991160] [<ffffffc000b1ffe4>] rest_init+0x84/0x8c

[14258.996114] [<ffffffc001168998>] start_kernel+0x3ac/0x3c0

[14259.001498] [<0000000080b26000>] 0x80b26000

[14259.006011] ---[ end trace c6f32708a76740d7 ]---

[14259.012354] Kernel panic - not syncing: Fatal exception in interrupt

4-nov_serial_console.log (248.3 KB)

Not sure what the meaning is, but I am surprised your WiFi has two MAC addresses. Are you using two WiFi devices simultaneously? That interaction might be related to the problem.

@linuxdev there are multiple Access Point with the same SSID. The tegra board is mounted on a moving mobile robot, So the wifi access points would keep on changing based on the range it is nearer to.

Can you then confirm that there are multiple WiFi devices running at the same time? I’m not so much thinking about SSID as I am having multiple WiFi devices with different MAC addresses showing up (the physical devices have their own MAC addresses in the logs). These devices seem to show up their setup just before the kernel error, and I wondered if perhaps there was a conflict between them. You might want to closely describe the WiFi devices (such as why there are 2) and whether there is any kind of bridging or other interaction intended. If you get the chance, then prior to the error you might include the output from:

ifconfig
rfkill list
route

After some search I found the following thread which also have the similar crash logs. Solution for the same is marked by a patch . Can some point me after applying the patch and compiling the kernel module again, what would be files needs to changed for current system to work with applied changes? I found following directories which has the relevant files at

/sys/bus/platform/drivers/tegra-apbdma/

any lead would be helpful regarding this

Unfortunately that URL about the patch does not say which kernel version this goes with. The comments there are actually from 2016, and so this is very old. I would be surprised if this is not already in the kernel code for the more recent releases. However, it looks like this was sent to NVIDIA. Perhaps someone from NVIDIA could comment? The email address of an NVIDIA employee is given in the git name, so it might not be too hard to find out.

Hi,
You can download the kernel source code from
Linux for Tegra R28.2 | NVIDIA Developer
Jetson TX1 Sources

I have built the kernel with patch suggested above and we are observing the issue with updated kernel Image. I will be monitoring the issue for couple of weeks and would update if the patch has actually fixed it.

Also I checked in the latest L4T releases and found till Jetpack 4.3 ( L4T 32.3.1]) the changes has not been merged, however in Jetpack version 4.4 ( L4T 32.4.2) onwards changes are present in the source code.