Reboot stress testing with R28.1 on Jetson TX1 devkit

We ran an overnight reboot stress test with R28.1 on the Jetson TX1 development kit and encountered the following system crash related to the Bluetooth module (hci_uart):

Is there any solution to this?

[  141.422643] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[  196.918673] Unable to handle kernel NULL pointer dereference at virtual address 0000001c
[  196.926757] pgd = ffffffc001528000
[  196.926762] [0000001c] *pgd=000000017b1d8003, *pud=000000017b1d8003, *pmd=000000017b1d9003, *pte=00e8000050041707
[  196.926766] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  196.926777] Modules linked in: bnep hci_uart bluetooth bcmdhd bluedroid_pm
[  196.926781] CPU: 3 PID: 220 Comm: kworker/3:2 Not tainted 4.4.38-tegra #1
[  196.926782] Hardware name: jetson_tx1 (DT)
[  196.926794] Workqueue: events hci_uart_write_work [hci_uart]
[  196.926796] task: ffffffc0f4d6f080 ti: ffffffc0f4a64000 task.ti: ffffffc0f4a64000
[  196.926801] PC is at _raw_spin_lock_irqsave+0x34/0x70
[  196.926804] LR is at _raw_spin_lock_irqsave+0x1c/0x70
[  196.926806] pc : [<ffffffc000b356a0>] lr : [<ffffffc000b35688>] pstate: 600001c5
[  196.926807] sp : ffffffc0f4a67cc0
[  196.926810] x29: ffffffc0f4a67cc0 x28: ffffffc0f4b05500 
[  196.926813] x27: 0000000000000002 x26: ffffffc0f4b05578 
[  196.926815] x25: ffffffc0f4b05578 x24: ffffffc0f4b058e0 
[  196.926817] x23: 0000000000000005 x22: ffffffc07b448000 
[  196.926820] x21: 000000000000001c x20: 0000000000000140 
[  196.926822] x19: 000000000000001c x18: ffffffc0c72a7b14 
[  196.926824] x17: ffffffc000bc04b0 x16: 000000000000000e 
[  196.926826] x15: 0000000000000007 x14: 0000007fd28a8b28 
[  196.926828] x13: 0000000000000019 x12: 0000000000000000 
[  196.926831] x11: 0000000000000017 x10: 0000000000000860 
[  196.926833] x9 : ffffffc0f4a67b10 x8 : ffffffc0f4d6f940 
[  196.926835] x7 : ffffffc0ffe79618 x6 : ffffffc000fb1cb8 
[  196.926837] x5 : ffffffc0f4a67d70 x4 : ffffffc0ffe7a280 
[  196.926839] x3 : 0000000000000004 x2 : 0000000000000002 
[  196.926841] x1 : ffffffc0f4a64000 x0 : 0000000000000001 
[  196.926842] 
[  196.926844] Process kworker/3:2 (pid: 220, stack limit = 0xffffffc0f4a64020)
[  196.926844] Call trace:
[  196.926848] [<ffffffc000b356a0>] _raw_spin_lock_irqsave+0x34/0x70
[  196.926852] [<ffffffc00095c400>] skb_dequeue+0x20/0x7c
[  196.926859] [<ffffffbffd020074>] h4_dequeue+0x14/0x1c [hci_uart]
[  196.926865] [<ffffffbffd01f7f0>] hci_uart_write_work+0x108/0x14c [hci_uart]
[  196.926870] [<ffffffc0000bd9cc>] process_one_work+0x25c/0x438
[  196.926873] [<ffffffc0000be038>] worker_thread+0x168/0x290
[  196.926875] [<ffffffc0000c3b2c>] kthread+0xfc/0x104
[  196.926879] [<ffffffc000084790>] ret_from_fork+0x10/0x40
[  196.926882] ---[ end trace 0cf247bfcf0a8e62 ]---
[  196.928385] note: kworker/3:2[220] exited with preempt_count 1
[  196.928394] ------------[ cut here ]------------
[  196.928395] WARNING: at /dvs/git/dirty/git-master_linux/kernel/kernel-4.4/kernel/softirq.c:150
[  196.928401] Modules linked in: bnep hci_uart bluetooth bcmdhd bluedroid_pm
[  196.928401] 
[  196.928405] CPU: 3 PID: 220 Comm: kworker/3:2 Tainted: G      D         4.4.38-tegra #1
[  196.928406] Hardware name: jetson_tx1 (DT)
[  196.928412] task: ffffffc0f4d6f080 ti: ffffffc0f4a64000 task.ti: ffffffc0f4a64000
[  196.928417] PC is at __local_bh_enable_ip+0x3c/0xf4
[  196.928421] LR is at _raw_spin_unlock_bh+0x20/0x28
[  196.928422] pc : [<ffffffc0000a9aa8>] lr : [<ffffffc000b35618>] pstate: 400001c5
[  196.928423] sp : ffffffc0f4a67960
[  196.928426] x29: ffffffc0f4a67960 x28: ffffffc0f4a64000 
[  196.928428] x27: 0000000000000002 x26: ffffffc0f4b05578 
[  196.928430] x25: ffffffc0f4b05578 x24: 0000000000000025 
[  196.928433] x23: 00000000000001c0 x22: ffffffc000fad830 
[  196.928435] x21: ffffffc0014d25b8 x20: ffffffc000131410 
[  196.928437] x19: 0000000000000201 x18: ffffffc0c72a7b14 
[  196.928439] x17: ffffffc000bc04b0 x16: 000000000000000e 
[  196.928441] x15: 0000000000000007 x14: 0ffffffffffffffe 
[  196.928443] x13: 0000000000000018 x12: 0101010101010101 
[  196.928445] x11: 7f7f7f7f7f7f7f7f x10: fefefefeff313932 
[  196.928448] x9 : 7f7f7f7f7f7f7f7f x8 : ffffffc00149f220 
[  196.928450] x7 : 0000000000000000 x6 : ffffffc000fb4218 
[  196.928452] x5 : 0000000000000061 x4 : ffffffc0012e8958 
[  196.928454] x3 : ffffffc0fb2d13f0 x2 : ffffffc0012e88d0 
[  196.928456] x1 : 0000000000000201 x0 : 0000000000000000 
[  196.928456] 
[  196.928458] ---[ end trace 0cf247bfcf0a8e63 ]---
[  196.928459] Call trace:
[  196.928462] [<ffffffc0000a9aa8>] __local_bh_enable_ip+0x3c/0xf4
[  196.928464] [<ffffffc000b35618>] _raw_spin_unlock_bh+0x20/0x28
[  196.928467] [<ffffffc000131410>] cgroup_exit+0x5c/0xf8
[  196.928470] [<ffffffc0000a7c50>] do_exit+0x2e4/0x3d0
[  196.928474] [<ffffffc000089968>] die+0x108/0x11c
[  196.928477] [<ffffffc00009a6d4>] __do_kernel_fault+0x80/0xa0
[  196.928480] [<ffffffc00009aa08>] do_page_fault+0x314/0x3d8
[  196.928482] [<ffffffc00009ab68>] do_translation_fault+0x34/0x48
[  196.928484] [<ffffffc000080958>] do_mem_abort+0x3c/0x98
[  196.928487] [<ffffffc000083d40>] el1_da+0x18/0x78
[  196.928490] [<ffffffc00095c400>] skb_dequeue+0x20/0x7c
[  196.928497] [<ffffffbffd020074>] h4_dequeue+0x14/0x1c [hci_uart]
[  196.928503] [<ffffffbffd01f7f0>] hci_uart_write_work+0x108/0x14c [hci_uart]
[  196.928507] [<ffffffc0000bd9cc>] process_one_work+0x25c/0x438
[  196.928510] [<ffffffc0000be038>] worker_thread+0x168/0x290
[  196.928513] [<ffffffc0000c3b2c>] kthread+0xfc/0x104
[  196.928515] [<ffffffc000084790>] ret_from_fork+0x10/0x40
[  196.933636] Unable to handle kernel paging request at virtual address ffffffffffffffd8
[  196.933638] pgd = ffffffc0c31d4000
[  196.933642] [ffffffffffffffd8] *pgd=00000001431c0003, *pud=00000001431c0003, *pmd=0000000000000000
[  196.933645] Internal error: Oops: 96000005 [#2] PREEMPT SMP
[  196.933652] Modules linked in: bnep hci_uart bluetooth bcmdhd bluedroid_pm
[  196.933656] CPU: 3 PID: 220 Comm: kworker/3:2 Tainted: G      D W       4.4.38-tegra #1
[  196.933657] Hardware name: jetson_tx1 (DT)
[  196.933663] task: ffffffc0f4d6f080 ti: ffffffc0f4a64000 task.ti: ffffffc0f4a64000
[  196.933669] PC is at kthread_data+0x4/0xc
[  196.933672] LR is at wq_worker_sleeping+0x14/0xd8
[  196.933674] pc : [<ffffffc0000c3dbc>] lr : [<ffffffc0000bef90>] pstate: 600001c5
[  196.933675] sp : ffffffc0f4a67940
[  196.933677] x29: ffffffc0f4a67940 x28: ffffffc0f4a64000 
[  196.933680] x27: 0000000000000002 x26: ffffffc0f4b05578 
[  196.933682] x25: ffffffc0ffe79580 x24: ffffffc000b31a9c 
[  196.933685] x23: 0000000000000000 x22: ffffffc0f4d6f5f0 
[  196.933687] x21: 0000000000000003 x20: ffffffc0f4d6f080 
[  196.933689] x19: 0000000000000003 x18: ffffffc0f4a678d8 
[  196.933691] x17: ffffffc000bc04b0 x16: 000000000000000e 
[  196.933693] x15: 0000000000000007 x14: 0000000000000001 
[  196.933695] x13: 0000000000000007 x12: 000000000000000e 
[  196.933698] x11: 0000000000000013 x10: 000000000000001a 
[  196.933700] x9 : 0000000000000000 x8 : 000000004db26158 
[  196.933702] x7 : 0000000048a9c0e8 x6 : ffffffc0ffe79580 
[  196.933704] x5 : 00000000001271c4 x4 : 00000000001d0620 
[  196.933707] x3 : 0000000000000003 x2 : 000000000000af3b 
[  196.933709] x1 : 0000000000000003 x0 : 0000000000000000 
[  196.933709] 
[  196.933711] Process kworker/3:2 (pid: 220, stack limit = 0xffffffc0f4a64020)
[  196.933712] Call trace:
[  196.933715] [<ffffffc0000c3dbc>] kthread_data+0x4/0xc
[  196.933719] [<ffffffc000b316c0>] __schedule+0x150/0x4b0
[  196.933721] [<ffffffc000b31a9c>] schedule+0x7c/0xac
[  196.933725] [<ffffffc0000a7d28>] do_exit+0x3bc/0x3d0
[  196.933728] [<ffffffc000089968>] die+0x108/0x11c
[  196.933732] [<ffffffc00009a6d4>] __do_kernel_fault+0x80/0xa0
[  196.933735] [<ffffffc00009aa08>] do_page_fault+0x314/0x3d8
[  196.933738] [<ffffffc00009ab68>] do_translation_fault+0x34/0x48
[  196.933740] [<ffffffc000080958>] do_mem_abort+0x3c/0x98
[  196.933742] [<ffffffc000083d40>] el1_da+0x18/0x78
[  196.933745] [<ffffffc00095c400>] skb_dequeue+0x20/0x7c
[  196.933754] [<ffffffbffd020074>] h4_dequeue+0x14/0x1c [hci_uart]
[  196.933760] [<ffffffbffd01f7f0>] hci_uart_write_work+0x108/0x14c [hci_uart]
[  196.933763] [<ffffffc0000bd9cc>] process_one_work+0x25c/0x438
[  196.933766] [<ffffffc0000be038>] worker_thread+0x168/0x290
[  196.933768] [<ffffffc0000c3b2c>] kthread+0xfc/0x104
[  196.933771] [<ffffffc000084790>] ret_from_fork+0x10/0x40
[  196.933774] ---[ end trace 0cf247bfcf0a8e64 ]---
[  196.935383] Fixing recursive fault but reboot is needed!
[  216.025225] Watchdog detected hard LOCKUP on cpu 1
[  216.029840] ------------[ cut here ]------------
[  216.034627] WARNING: at /dvs/git/dirty/git-master_linux/kernel/kernel-4.4/kernel/watchdog.c:352
[  216.043304] Modules linked in: bnep hci_uart bluetooth bcmdhd bluedroid_pm
[  216.050213] 
[  216.051700] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W       4.4.38-tegra #1
[  216.059337] Hardware name: jetson_tx1 (DT)
[  216.063421] task: ffffffc0012d5ac0 ti: ffffffc0012c0000 task.ti: ffffffc0012c0000
[  216.070891] PC is at watchdog_check_hardlockup_other_cpu+0xec/0x13c
[  216.077143] LR is at watchdog_check_hardlockup_other_cpu+0xec/0x13c
[  216.083394] pc : [<ffffffc00013fb0c>] lr : [<ffffffc00013fb0c>] pstate: 600001c5
[  216.090770] sp : ffffffc0012c3aa0
[  216.094073] x29: ffffffc0012c3aa0 x28: 0000000000000000 
[  216.099386] x27: 7fffffffffffffff x26: ffffffc0ffe41c98 
[  216.104698] x25: ffffffc0ffe41cd0 x24: ffffffc0012c3d70 
[  216.110012] x23: 0000000000000000 x22: 00000000000000c5 
[  216.115326] x21: ffffffc00129d238 x20: ffffffc0012c8fb0 
[  216.120640] x19: 0000000000000001 x18: 0000000000000000 
[  216.125954] x17: 0000000000000000 x16: 0000000000000000 
[  216.131267] x15: 0000000000000000 x14: 0000000000000000 
[  216.136579] x13: 0000000034d5d91d x12: 0000000001000000 
[  216.141892] x11: 0000000000000000 x10: 0000000000001000 
[  216.147205] x9 : ffffffc000083000 x8 : ffffffc0002f061c 
[  216.152517] x7 : 0000000000000000 x6 : 0000000000000035 
[  216.157830] x5 : 0000000000000000 x4 : 0000000000000000 
[  216.163142] x3 : 0000000000000000 x2 : 0000000000003d77 
[  216.168453] x1 : 0000000000010001 x0 : 0000000000000026 
[  216.173766] 
[  216.175250] ---[ end trace 0cf247bfcf0a8e65 ]---
[  216.179854] Call trace:
[  216.182292] [<ffffffc00013fb0c>] watchdog_check_hardlockup_other_cpu+0xec/0x13c
[  216.189585] [<ffffffc00013fc50>] watchdog_timer_fn+0x4c/0x23c
[  216.195317] [<ffffffc00010b4ac>] __run_hrtimer+0x1d8/0x2f8
[  216.200788] [<ffffffc00010b650>] __hrtimer_run_queues+0x84/0xb0
[  216.206692] [<ffffffc00010b95c>] hrtimer_interrupt+0xac/0x1c8
[  216.212426] [<ffffffc0008466d4>] tegra210_timer_isr+0x28/0x34
[  216.218157] [<ffffffc0000f76dc>] handle_irq_event_percpu+0xf8/0x26c
[  216.224408] [<ffffffc0000f7898>] handle_irq_event+0x48/0x78
[  216.229966] [<ffffffc0000faf88>] handle_fasteoi_irq+0xb0/0xf4
[  216.235697] [<ffffffc0000f6b34>] generic_handle_irq+0x18/0x2c
[  216.241428] [<ffffffc0000f7010>] __handle_domain_irq+0x80/0xac
[  216.247245] [<ffffffc000080ba0>] gic_handle_irq+0x6c/0xb8
[  216.252629] [<ffffffc000083f44>] el1_irq+0x84/0x100
[  216.257494] [<ffffffc00080d704>] cpuidle_enter+0x18/0x20
[  216.262793] [<ffffffc0000e6758>] call_cpuidle+0x4c/0x58
[  216.268004] [<ffffffc0000e6830>] cpuidle_idle_call+0xcc/0x120
[  216.273735] [<ffffffc0000e6ae0>] cpu_idle_loop+0x25c/0x27c
[  216.279206] [<ffffffc0000e6b10>] convert_prio+0x0/0x3c
[  216.284333] [<ffffffc000b303b4>] rest_init+0x8c/0x98
[  216.289286] [<ffffffc001175a7c>] start_kernel+0x2dc/0x2e4
[  216.294670] [<0000000080b36000>] 0x80b36000
[ 4015.918375] CFG80211-ERROR) wl_is_linkdown : Link down Reason : WLC_E_DEAUTH_IND
[ 4015.925771] CFG80211-ERROR) wl_notify_connect_status : link down if wlan0 may call cfg80211_disconnected. event : 6, reason=6 from 48:ee:0c:2b:55:a6
[ 4015.939559] WLDEV-ERROR) wldev_set_country : wldev_set_country: set country for (null) as XR rev 122 failed
[ 4015.949289] CFG80211-ERROR) wl_notify_connect_status : wl_notify_connect_status: failed to reset ccode (-23)
[ 3204.839389] irq 14: nobody cared (try booting with the "irqpoll" option)
[ 3204.846085] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W       4.4.38-tegra #1
[ 3204.853722] Hardware name: jetson_tx1 (DT)
[ 3204.857806] Call trace:
[ 3204.860249] [<ffffffc0000893ac>] dump_backtrace+0x0/0xe8
[ 3204.865548] [<ffffffc000089858>] show_stack+0x14/0x1c
[ 3204.870587] [<ffffffc00036cde0>] __dump_stack+0x20/0x28
[ 3204.875798] [<ffffffc00036ce84>] dump_stack+0x9c/0xd8
[ 3204.880837] [<ffffffc0000fa05c>] __report_bad_irq+0x48/0xd8
[ 3204.886396] [<ffffffc0000fa340>] note_interrupt+0x158/0x1ec
[ 3204.891955] [<ffffffc0000f7820>] handle_irq_event_percpu+0x23c/0x26c
[ 3204.898294] [<ffffffc0000f7898>] handle_irq_event+0x48/0x78
[ 3204.903851] [<ffffffc0000faf88>] handle_fasteoi_irq+0xb0/0xf4
[ 3204.909581] [<ffffffc0000f6b34>] generic_handle_irq+0x18/0x2c
[ 3204.915312] [<ffffffc0000f7010>] __handle_domain_irq+0x80/0xac
[ 3204.921129] [<ffffffc000080ba0>] gic_handle_irq+0x6c/0xb8
[ 3204.926513] [<ffffffc000083f44>] el1_irq+0x84/0x100
[ 3204.931379] [<ffffffc00080d704>] cpuidle_enter+0x18/0x20
[ 3204.936677] [<ffffffc0000e6758>] call_cpuidle+0x4c/0x58
[ 3204.941889] [<ffffffc0000e6830>] cpuidle_idle_call+0xcc/0x120
[ 3204.947619] [<ffffffc0000e6ae0>] cpu_idle_loop+0x25c/0x27c
[ 3204.953091] [<ffffffc0000e6b10>] convert_prio+0x0/0x3c
[ 3204.958216] [<ffffffc000b303b4>] rest_init+0x8c/0x98
[ 3204.963169] [<ffffffc001175a7c>] start_kernel+0x2dc/0x2e4
[ 3204.968554] [<0000000080b36000>] 0x80b36000
[ 3204.972723] handlers:
[ 3204.974987] [<ffffffc0008aa794>] actmon_dev_isr threaded [<ffffffc0008aa6b8>] actmon_dev_fn
[ 3204.983336] Disabling IRQ #14
[ 3087.284202] CFG80211-ERROR) wl_cfg80211_get_station : NOT assoc, error -17
[ 3087.291444] CFG80211-ERROR) wl_cfg80211_get_station : NOT assoc, error -17
[ 3483.745883] CFG80211-ERROR) wl_cfg80211_get_station : NOT assoc, error -17
[ 3483.752984] CFG80211-ERROR) wl_cfg80211_get_station : NOT assoc, error -17

PS: Please refer to the attachment for the full log.

putty-1211-1.zip (2.79 MB)
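When the full console capture is several megabytes, as with the attachment above, it can help to pull out only the crash signatures and the top of each call trace. The following is a minimal bash sketch, not an NVIDIA tool; the function names `list_crashes` and `trace_heads` and the chosen patterns are assumptions based only on the messages visible in the excerpt above.

```bash
#!/bin/bash
# Hypothetical helper: summarize kernel crash signatures in a saved
# serial-console log (e.g. a PuTTY capture). Assumes plain-text dmesg lines.
set -euo pipefail

# Patterns copied from messages that appear in the log excerpt.
CRASH_RE='Internal error: Oops|Unable to handle kernel|Watchdog detected hard LOCKUP|nobody cared'

# Print every line matching a crash signature, prefixed with its line number.
list_crashes() {
    grep -nE "$CRASH_RE" "$1" || true
}

# For each "Call trace:" block, print the first N function names on one line,
# innermost frame first, e.g. "skb_dequeue <- h4_dequeue <- ...".
trace_heads() {
    local log="$1" depth="${2:-5}"
    awk -v depth="$depth" '
        function emit() { if (n) print out; n = 0; out = ""; collecting = 0 }
        /Call trace:/ { emit(); collecting = 1; next }
        collecting && /\[<[0-9a-f]+>]/ {
            if (n < depth) {
                f = $0
                sub(/.*>] /, "", f)      # drop timestamp and address
                sub(/\+0x.*/, "", f)     # drop "+offset/size [module]"
                out = (n ? out " <- " : "") f
                n++
            }
            next
        }
        collecting { emit() }            # first non-frame line ends the trace
        END { emit() }
    ' "$log"
}

# Only act when given a log file, so the functions can also be sourced.
if [[ $# -ge 1 ]]; then
    list_crashes "$1"
    echo "--- call trace heads ---"
    trace_heads "$1" "${2:-5}"
fi
```

Run against the excerpt above with the default depth of 5, the first trace would come out as `_raw_spin_lock_irqsave <- skb_dequeue <- h4_dequeue <- hci_uart_write_work <- process_one_work`, which makes it easy to see that every Oops in the capture shares the same hci_uart path.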

hello jjc_0115,

we will look into this issue.
However, we would like your feedback on a few questions first:

  1. What are the criteria of your reboot test? For example, how many hours of testing?
  2. Are there any other devices connected to the Jetson development kit?

Hi JerryChang,

  1. We run a test script automatically after Ubuntu starts and let it loop overnight, but I believe you can also reproduce this issue by restarting Ubuntu manually.
    Here is the script:

        #!/bin/bash

        echo "Display all PCI devices..."
        lspci -vv
        echo "Display network information..."
        ifconfig
        echo "Display Bluetooth MAC address..."
        hcitool dev
        sleep 180
        reboot

  2. No other devices are connected to the Jetson TX1, but make sure the Bluetooth module is enabled.
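For an unattended overnight run it can help if the loop records how many cycles completed and stops itself once a crash is seen, instead of rebooting over the evidence. Below is a minimal sketch building on the script above, assuming it runs as root at boot (for example from a systemd unit or rc.local, neither shown); the `/var/log/reboot-test` location, the `--run` flag and the function names are hypothetical choices.

```bash
#!/bin/bash
# Hypothetical variant of the reboot test: counts cycles, checks that a
# Bluetooth adapter is present, and saves the kernel log before each reboot,
# stopping the loop instead of rebooting if a crash signature is found.
set -euo pipefail

STATE_DIR="${STATE_DIR:-/var/log/reboot-test}"   # assumed location
CRASH_RE='Internal error: Oops|Unable to handle kernel|Watchdog detected hard LOCKUP'

# Increment the persistent cycle counter stored in file $1 and print it.
bump_counter() {
    local file="$1" count=0
    if [[ -r "$file" ]]; then
        count=$(<"$file")
    fi
    count=$((count + 1))
    echo "$count" > "$file"
    echo "$count"
}

# Succeed if the text on stdin contains a known kernel crash signature.
has_crash() {
    grep -qE "$CRASH_RE"
}

# Succeed if `hcitool dev` output on stdin lists at least one hciN adapter.
bt_present() {
    grep -qE '^[[:space:]]*hci[0-9]+'
}

# Print a message and exit without rebooting, so the board stays up.
stop() {
    echo "reboot-test: $*" >&2
    exit 1
}

run_cycle() {
    local n log bt
    mkdir -p "$STATE_DIR"
    n=$(bump_counter "$STATE_DIR/count")

    # The crash in this thread is reported with Bluetooth enabled.
    bt=$(hcitool dev 2>/dev/null || true)
    bt_present <<<"$bt" || stop "cycle $n: no Bluetooth adapter listed"

    sleep 180

    # Keep this boot's kernel log and check it before rebooting.
    log="$STATE_DIR/boot-$n.dmesg"
    dmesg > "$log"
    if has_crash < "$log"; then
        stop "cycle $n: crash signature found in $log, not rebooting"
    fi
    reboot
}

# Only run the loop when asked explicitly, so the functions can be sourced.
if [[ "${1:-}" == "--run" ]]; then
    run_cycle
fi
```

The dmesg output is written to a file before grepping it, so `grep -q` exiting early cannot make a pipeline fail under `pipefail`. A crash that hangs the board outright, as the log above suggests happened here (the same tainted kernel keeps logging for more than an hour after the Oops), will still only be visible on the serial console; the counter then shows how many cycles completed before it.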

hello jjc_0115,

FYI,
we are able to reproduce this issue on our side.
We will update once we have some findings.
Thanks

Hi JerryChang/jjc_0115,

We are having the same issue with L4T R28.1.

JerryChang,

Could you give us an update on this?

hello mohanprasath_12,

we are still looking into this internally.
We will post the result here.
Thanks

hi all,

we have fixed the Bluetooth-related crash seen during the reboot stress test internally.
This fix will be included in the next formal public release.
Thanks

Hi JerryChang,

Could you share the solution here? We may not be able to wait for the next kernel update.

hello jjc_0115,

here are the patches for you to update the kernel image.
Thanks
topic_1027440.tar.gz (2.08 KB)
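The thread does not describe what the archive contains. Assuming it unpacks to ordinary unified diffs named `*.patch` that apply with `-p1` from the top of the R28.1 kernel source tree (an assumption, not confirmed above), a cautious way to apply them is to dry-run each patch right before applying it, in name order, so a patch that does not fit leaves no partial hunks behind. The `apply_patches` function and its arguments are placeholders; rebuilding and flashing the kernel image afterwards is not shown.

```bash
#!/bin/bash
# Hypothetical helper: apply every *.patch in PATCH_DIR, in name order,
# to the source tree SRC_DIR. Assumes -p1 unified diffs (GNU patch).
set -euo pipefail

apply_patches() {
    local patch_dir="$1" src_dir="$2" p
    shopt -s nullglob
    local patches=("$patch_dir"/*.patch)
    shopt -u nullglob
    if (( ${#patches[@]} == 0 )); then
        echo "no *.patch files in $patch_dir" >&2
        return 1
    fi
    for p in "${patches[@]}"; do
        # Check first so a patch that does not fit leaves no partial hunks.
        if ! patch -d "$src_dir" -p1 --forward --dry-run --silent < "$p"; then
            echo "does not apply cleanly: $p" >&2
            return 1
        fi
        patch -d "$src_dir" -p1 --forward --silent < "$p"
        echo "applied: $(basename "$p")"
    done
}

# Usage: apply-patches.sh PATCH_DIR SRC_DIR (both paths are placeholders).
if [[ $# -eq 2 ]]; then
    apply_patches "$1" "$2"
fi
```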

Hi JerryChang,

Thanks for the patches! They solved the issue.