L4T R28.1 TX2 kernel crashes after enabling watchdog

Hi,

Our Jetson TX2 board running the L4T R28.1 image reboots continuously after we enable the tegra18x watchdog module.

I enabled the watchdog by opening the /dev/watchdog device node:

cat /dev/watchdog

After the watchdog timeout expired, I got this kernel crash message:

[   72.188295] Bad mode in Synchronous Abort handler detected, code 0x86000006 -- IABT (current EL)
[   72.188299] Bad mode in Synchronous Abort handler detected, code 0x86000006 -- IABT (current EL)
[   72.188304] Bad mode in Synchronous Abort handler detected, code 0x86000006 -- IABT (current EL)
[   72.188307] Bad mode in Synchronous Abort handler detected, code 0x86000006 -- IABT (current EL)
[   72.188311] Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
[   72.188322] Modules linked in: bnep bluetooth bcmdhd
[   72.188328] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.38-l4t-r28.1+g79e4600 #2
[   72.188330] Hardware name: quill (DT)
[   72.188333] task: ffffffc00122ce40 ti: ffffffc00121c000 task.ti: ffffffc00121c000
[   72.188336] PC is at 0x0
[   72.188344] LR is at t18x_a57_enter_state+0x1c/0xc8
[   72.188347] pc : [<0000000000000000>] lr : [<ffffffc00091f33c>] pstate: 800000c5
[   72.188348] sp : ffffffc00121fe80
[   72.188353] x29: ffffffc00121fe80 x28: ffffffc00121c000 
[   72.188356] x27: ffffffc00121ff20 x26: ffffffc000b48000 
[   72.188360] x25: 00000010ca363960 x24: 0000000000000000 
[   72.188362] x23: ffffffc00132afd8 x22: ffffffc00139ec88 
[   72.188365] x21: ffffffc1edb9f498 x20: ffffffc0014662b8 
[   72.188367] x19: 0000000000000000 x18: 0000000000000000 
[   72.188369] x17: 0000007fa2e6ce00 x16: ffffffc0001ddae8 
[   72.188371] x15: 0000000000005b10 x14: 0000000000005b10 
[   72.188373] x13: 0000000000005b10 x12: ffffffc000b49610 
[   72.188375] x11: 0000000000000000 x10: 00000000000008a0 
[   72.188376] x9 : ffffffc00121fe70 x8 : 00000000ffff20ab 
[   72.188379] x7 : 7fffffffffffffff x6 : 00000000964ebe2d 
[   72.188380] x5 : 00ffffffffffffff x4 : 000000003b9aca00 
[   72.188382] x3 : 0000000011a43d00 x2 : 0000000011a43d00 
[   72.188384] x1 : 0000000000000000 x0 : 0000000000000000 
[   72.188384] 
[   72.188386] Process swapper/0 (pid: 0, stack limit = 0xffffffc00121c020)
[   72.188387] Call trace:
[   72.188389] [<          (null)>]           (null)
[   72.188394] [<ffffffc0007e04e4>] cpuidle_enter_state+0x10c/0x350
[   72.188396] [<ffffffc0007e0760>] cpuidle_enter+0x18/0x20
[   72.188399] [<ffffffc0000e82fc>] call_cpuidle+0x24/0x50
[   72.188401] [<ffffffc0000e8598>] cpu_startup_entry+0x270/0x340
[   72.188405] [<ffffffc000b38b40>] rest_init+0x88/0x98
[   72.188409] [<ffffffc0010db95c>] start_kernel+0x38c/0x3a0
[   72.188410] [<0000000080b3f000>] 0x80b3f000
[   72.188413] ---[ end trace a4b51e6a466923a1 ]---
[   72.188414] Internal error: Oops - bad mode: 0 [#2] PREEMPT SMP
[   72.189340] Modules linked in: bnep bluetooth bcmdhd
[   72.189342] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D         4.4.38-l4t-r28.1+g79e4600 #2
[   72.189343] Hardware name: quill (DT)
[   72.189345] task: ffffffc1ecaca580 ti: ffffffc1ecadc000 task.ti: ffffffc1ecadc000
[   72.189346] PC is at 0x0
[   72.189349] LR is at t18x_a57_enter_state+0x1c/0xc8
[   72.189351] pc : [<0000000000000000>] lr : [<ffffffc00091f33c>] pstate: 800000c5
[   72.189351] sp : ffffffc1ecadfed0
[   72.189353] x29: ffffffc1ecadfed0 x28: ffffffc1ecadc000 
[   72.189355] x27: ffffffc1ecadff70 x26: ffffffc000b48000 
[   72.189357] x25: 00000010c6e07800 x24: 0000000000000000 
[   72.189359] x23: ffffffc00132afd8 x22: ffffffc00139ec88 
[   72.189360] x21: ffffffc1edbd2498 x20: ffffffc0014662b8 
[   72.189362] x19: 0000000000000000 x18: 0000000000000000 
[   72.189364] x17: 0000007f7dd70fc0 x16: ffffffc0001ca3d8 
[   72.189365] x15: 000000000000aaf8 x14: 00000000000098c0 
[   72.189367] x13: 000000000000aaf8 x12: 000000000000ab07 
[   72.189368] x11: 0000000000003a2b x10: 00000000000008a0 
[   72.189370] x9 : ffffffc1ecadfec0 x8 : 00000000ffff2166 
[   72.189372] x7 : 00000000000002f0 x6 : 0000000096340a3d 
[   72.189373] x5 : 00ffffffffffffff x4 : 000000003b9aca00 
[   72.189375] x3 : 00000000057ab140 x2 : 00000000057ab140 
[   72.189376] x1 : 0000000000000000 x0 : 0000000000000000 
[   72.189377] 
[   72.189378] Process swapper/3 (pid: 0, stack limit = 0xffffffc1ecadc020)
[   72.189379] Call trace:
[   72.189380] [<          (null)>]           (null)
[   72.189383] [<ffffffc0007e04e4>] cpuidle_enter_state+0x10c/0x350
[   72.189385] [<ffffffc0007e0760>] cpuidle_enter+0x18/0x20
[   72.189387] [<ffffffc0000e82fc>] call_cpuidle+0x24/0x50
[   72.189388] [<ffffffc0000e8598>] cpu_startup_entry+0x270/0x340
[   72.189391] [<ffffffc00008e16c>] secondary_start_kernel+0x12c/0x168
[   72.189392] [<000000008008192c>] 0x8008192c
[   72.189395] ---[ end trace a4b51e6a466923a2 ]---
[   72.189396] Internal error: Oops - bad mode: 0 [#3] PREEMPT SMP
[   72.189400] Modules linked in: bnep bluetooth bcmdhd
[   72.189402] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G      D         4.4.38-l4t-r28.1+g79e4600 #2
[   72.189403] Hardware name: quill (DT)
[   72.189405] task: ffffffc1ecacb200 ti: ffffffc1ecae0000 task.ti: ffffffc1ecae0000
[   72.189406] PC is at 0x0
[   72.189409] LR is at t18x_a57_enter_state+0x1c/0xc8
[   72.189410] pc : [<0000000000000000>] lr : [<ffffffc00091f33c>] pstate: 800000c5
[   72.189411] sp : ffffffc1ecae3ed0
[   72.189413] x29: ffffffc1ecae3ed0 x28: ffffffc1ecae0000 
[   72.189415] x27: ffffffc1ecae3f70 x26: ffffffc000b48000 
[   72.189417] x25: 000000107ee12960 x24: 0000000000000000 
[   72.189419] x23: ffffffc00132afd8 x22: ffffffc00139ec88 
[   72.189420] x21: ffffffc1edbe3498 x20: ffffffc0014662b8 
[   72.189422] x19: 0000000000000000 x18: 0000000000000000 
[   72.189423] x17: 0000007fa83a4e00 x16: ffffffc0001ddae8 
[   72.189425] x15: 00000000000060f0 x14: 00000000000060f0 
[   72.189426] x13: 00000000000060f0 x12: 00000000000060f0 
[   72.189428] x11: 00000000328dc7e0 x10: 00000000000008a0 
[   72.189430] x9 : ffffffc1ecae3ec0 x8 : 00000000ffff1f39 
[   72.189431] x7 : ffffffc1edbe0260 x6 : 0000000093f4054d 
[   72.189433] x5 : 00ffffffffffffff x4 : 000000003b9aca00 
[   72.189435] x3 : 0000000012965500 x2 : 0000000012965500 
[   72.189436] x1 : 0000000000000000 x0 : 0000000000000000 
[   72.189436] 
[   72.189438] Process swapper/4 (pid: 0, stack limit = 0xffffffc1ecae0020)
[   72.189438] Call trace:
[   72.189439] [<          (null)>]           (null)
[   72.189442] [<ffffffc0007e04e4>] cpuidle_enter_state+0x10c/0x350
[   72.189444] [<ffffffc0007e0760>] cpuidle_enter+0x18/0x20
[   72.189446] [<ffffffc0000e82fc>] call_cpuidle+0x24/0x50
[   72.189448] [<ffffffc0000e8598>] cpu_startup_entry+0x270/0x340
[   72.189450] [<ffffffc00008e16c>] secondary_start_kernel+0x12c/0x168
[   72.189451] [<000000008008192c>] 0x8008192c
[   72.189453] ---[ end trace a4b51e6a466923a3 ]---
[   72.189459] Kernel panic - not syncing: Attempted to kill the idle task!
[   72.189471] CPU1: stopping
[   72.189478] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.4.38-l4t-r28.1+g79e4600 #2
[   72.189480] Hardware name: quill (DT)
[   72.189482] Call trace:
[   72.189495] [<ffffffc000089348>] dump_backtrace+0x0/0xe8
[   72.189499] [<ffffffc000089444>] show_stack+0x14/0x20
[   72.189525] [<ffffffc000320000>] dump_stack+0xa0/0xc8
[   72.189528] [<ffffffc00008e7a4>] handle_IPI+0x304/0x338
[   72.189531] [<ffffffc00008160c>] gic_handle_irq+0xa4/0xc0
[   72.189534] [<ffffffc000084740>] el1_irq+0x80/0xf8
[   72.189539] [<ffffffc0007e0760>] cpuidle_enter+0x18/0x20
[   72.189544] [<ffffffc0000e82fc>] call_cpuidle+0x24/0x50
[   72.189546] [<ffffffc0000e8598>] cpu_startup_entry+0x270/0x340
[   72.189549] [<ffffffc00008e16c>] secondary_start_kernel+0x12c/0x168
[   72.189552] [<000000008008192c>] 0x8008192c
[   72.189556] CPU2: stopping
[   72.189560] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D         4.4.38-l4t-r28.1+g79e4600 #2
[   72.189562] Hardware name: quill (DT)
[   72.189564] Call trace:
[   72.189569] [<ffffffc000089348>] dump_backtrace+0x0/0xe8
[   72.189574] [<ffffffc000089444>] show_stack+0x14/0x20
[   72.189577] [<ffffffc000320000>] dump_stack+0xa0/0xc8
[   72.189580] [<ffffffc00008e7a4>] handle_IPI+0x304/0x338
[   72.189582] [<ffffffc00008160c>] gic_handle_irq+0xa4/0xc0
[   72.189585] [<ffffffc000084740>] el1_irq+0x80/0xf8
[   72.189588] [<ffffffc0007e0760>] cpuidle_enter+0x18/0x20
[   72.189591] [<ffffffc0000e82fc>] call_cpuidle+0x24/0x50
[   72.189593] [<ffffffc0000e8598>] cpu_startup_entry+0x270/0x340
[   72.189596] [<ffffffc00008e16c>] secondary_start_kernel+0x12c/0x168
[   72.189599] [<000000008008192c>] 0x8008192c
[   72.925696] Internal error: Oops - bad mode: 0 [#4] PREEMPT SMP
[   72.931601] Modules linked in: bnep bluetooth bcmdhd
[   72.936597] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G      D         4.4.38-l4t-r28.1+g79e4600 #2
[   72.945363] Hardware name: quill (DT)
[   72.949017] task: ffffffc1ecacbe80 ti: ffffffc1ecae4000 task.ti: ffffffc1ecae4000
[   72.956484] PC is at 0x0
[   72.959010] LR is at t18x_a57_enter_state+0x1c/0xc8
[   72.963875] pc : [<0000000000000000>] lr : [<ffffffc00091f33c>] pstate: 800000c5
[   72.971254] sp : ffffffc1ecae7ed0
[   72.974560] x29: ffffffc1ecae7ed0 x28: ffffffc1ecae4000 
[   72.979881] x27: ffffffc1ecae7f70 x26: ffffffc000b48000 
[   72.985200] x25: 0000001082543620 x24: 0000000000000000 
[   72.990518] x23: ffffffc00132afd8 x22: ffffffc00139ec88 
[   72.995837] x21: ffffffc1edbf4498 x20: ffffffc0014662b8 
[   73.001156] x19: 0000000000000000 x18: 0000000000000000 
[   73.006476] x17: 0000007f97fbb9d8 x16: ffffffc0001dd2d0 
[   73.011794] x15: 0000000000003db0 x14: 0000000000003db0 
[   73.017114] x13: 0000000000003db0 x12: 0000000000003db0 
[   73.022432] x11: 00000000000b33a8 x10: 00000000000008a0 
[   73.027751] x9 : ffffffc1ecae7ec0 x8 : 00000000ffff13bb 
[   73.033069] x7 : 00000000000002f0 x6 : 00000000940eb93d 
[   73.038387] x5 : 00ffffffffffffff x4 : 000000003b9aca00 
[   73.043706] x3 : 000000000fda6640 x2 : 000000000fda6640 
[   73.049026] x1 : 0000000000000000 x0 : 0000000000000000 
[   73.054345] 
[   73.055830] Process swapper/5 (pid: 0, stack limit = 0xffffffc1ecae4020)
[   73.062515] Call trace:
[   73.064953] [<          (null)>]           (null)
[   73.069647] [<ffffffc0007e04e4>] cpuidle_enter_state+0x10c/0x350
[   73.075641] [<ffffffc0007e0760>] cpuidle_enter+0x18/0x20
[   73.080940] [<ffffffc0000e82fc>] call_cpuidle+0x24/0x50
[   73.086151] [<ffffffc0000e8598>] cpu_startup_entry+0x270/0x340
[   73.091970] [<ffffffc00008e16c>] secondary_start_kernel+0x12c/0x168
[   73.098222] [<000000008008192c>] 0x8008192c
[   73.102395] ---[ end trace a4b51e6a466923a4 ]---
[   73.233197] SMP: failed to stop secondary CPUs
[   73.242687] Rebooting in 5 seconds..
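
For anyone reading the "Bad mode" lines above, the code 0x86000006 is an AArch64 exception syndrome (ESR) value and can be split into its fields by hand. The sketch below does that with shell arithmetic; the helper name decode_esr is made up for illustration, and the field positions (EC in bits 31:26, IL in bit 25, fault status code in bits 5:0) are taken from the ARMv8 architecture definition of the ESR, not from anything specific to L4T.

```shell
#!/bin/sh
# decode_esr VALUE  (hypothetical helper, not part of L4T)
# Splits an AArch64 ESR value into exception class, instruction
# length bit, and the low six bits of the syndrome (fault status).
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))   # exception class, bits [31:26]
    il=$(( (esr >> 25) & 0x1 ))    # instruction length bit, bit [25]
    ifsc=$(( esr & 0x3f ))         # fault status code, bits [5:0]
    printf 'EC=0x%02x IL=%d IFSC=0x%02x\n' "$ec" "$il" "$ifsc"
}

decode_esr 0x86000006
```

EC 0x21 is an instruction abort taken without a change in exception level, which matches the "IABT (current EL)" text in the log, and fault status 0x06 is a level 2 translation fault. Together with "PC is at 0x0", this is consistent with the CPU branching to a null address from t18x_a57_enter_state, although it does not by itself explain why.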

The kernel then reboots, and the last PMC reset source is reported as “Software Reset”.

  1. Why do I get this kernel crash after enabling the watchdog?

  2. Why doesn’t the watchdog reboot occur normally?

Can you help me with this?

Thanks.

I’m using 28.2 pre-release, so this won’t necessarily be a good test. However, try this and see if you are able to keep the system up without crash or reboot. If that succeeds, then we can move on to figuring out the shutdown case.

I created this file (named “watch.sh”), made it executable with chmod, and ran it as root (sudo):

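#!/bin/bash

# Print a timestamp and write to the watchdog device every 15 seconds.
while true; do
        date '+%a-%d-%b-%Y_%H-%M-%S'
        # Any write to /dev/watchdog resets the watchdog timer.
        echo "1" | tee -a /dev/watchdog
        sleep 15
done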

Following this I run “sudo cat /dev/watchdog” and monitor “dmesg --follow” on the serial console. This works for me. When I then kill the watch.sh script, the system reboots some time later. Check whether you get an error while the system is still up.
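
As a variation on the script above, here is a minimal sketch that keeps a single file descriptor open for the whole run instead of reopening the device on every write. The function name feed_watchdog and its three arguments are hypothetical. It also writes the character 'V' before closing, which the generic Linux watchdog API treats as a "magic close" request to disarm the timer; whether the tegra18x driver honors that (and whether nowayout is set) is an assumption I have not verified on R28.1.

```shell
#!/bin/sh
# feed_watchdog DEVICE INTERVAL COUNT  (hypothetical helper)
# Opens DEVICE once on file descriptor 3, writes to it COUNT times
# with INTERVAL seconds between writes, then requests a magic close.
feed_watchdog() {
    dev=$1
    interval=$2
    count=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '1' >&3       # any write resets the watchdog timer
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3           # magic close: ask the driver to disarm
    } 3>"$dev"
}

# Example (run as root): feed every 15 seconds, 20 times
#   feed_watchdog /dev/watchdog 15 20
```

Note that this disarms the watchdog on a clean exit, so to test the reboot path you would still kill the script partway through, as described above.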

What files do you see from:

ls /dev/watchdog*

What do you see from:

zcat /proc/config.gz | grep WATCHDOG