Sporadic systemd hang during boot on Jetson TX2 (32.4.4 / 32.5.1)

I’m seeing sporadic hangs during boot on L4T 32.4.4 AND L4T 32.5.1, these always have the exact same stack trace, and the board boots normally with no filesystem errors when rebooted. The stack traces are identical and look like this:

1:

[  OK  ] Created slice system-getty.slice.
[  OK  ] Created slice system-serial\x2dgetty.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Forward Password R��…uests to Wall Directory Watch.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
         Mounting Huge Pages File System...
         Mounting POSIX Message Queue File System...
         Mounting Kernel Debug File System...
         Starting Create list of st��…odes for the current kernel...
         Mounting Kernel Configuration File System...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
         Starting udev Coldplug all Devices...
         Starting Setup Virtual Console...
         Starting Set the system clock from the build timestamp...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Started Create list of sta��… nodes for the current kernel.
[  OK  ] Started Rollback clear boo��…ltboot crumb is still present.
[  OK  ] Mounted Kernel Configuration File System.
[  OK  ] Started Remount Root and Kernel File Systems.
[  242.822653] INFO: task systemd:1 blocked for more than 120 seconds.
[  242.829027]       Not tainted 4.9.140-l4t-r32.4 #1
[  242.833876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.841755] systemd         D    0     1      0 0x00000000
[  242.847341] Call trace:
[  242.849836] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  242.855044] [<ffffff8008ef41fc>] __schedule+0x22c/0x570
[  242.860330] [<ffffff8008ef4580>] schedule+0x40/0xa8
[  242.865265] [<ffffff8008ef7b00>] schedule_timeout+0x328/0x428
[  242.871066] [<ffffff8008ef3f70>] io_schedule_timeout+0x90/0xf0
[  242.876946] [<ffffff8008ef5020>] bit_wait_io+0x20/0x60
[  242.882141] [<ffffff8008ef4b50>] __wait_on_bit+0x78/0xe8
[  242.887507] [<ffffff8008ef4cf4>] out_of_line_wait_on_bit+0x84/0xb0
[  242.893740] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  242.899369] [<ffffff80083169f8>] ext4_find_entry+0x310/0x5d0
[  242.905078] [<ffffff8008316d44>] ext4_lookup+0x8c/0x220
[  242.910364] [<ffffff800826b24c>] lookup_slow+0xa4/0x170
[  242.915720] [<ffffff800826c4cc>] lookup_one_len_unlocked+0x11c/0x148
[  242.922133] [<ffffff8008385edc>] ovl_lookup+0xa4/0x500
[  242.927316] [<ffffff800826efe0>] lookup_open+0x208/0x540
[  242.932683] [<ffffff80082706cc>] do_last+0x614/0x770
[  242.937668] [<ffffff80082708c8>] path_openat+0xa0/0x308
[  242.942909] [<ffffff8008271d28>] do_filp_open+0x88/0x108
[  242.948240] [<ffffff800825cda0>] do_sys_open+0x180/0x270
[  242.953568] [<ffffff800825cf14>] SyS_openat+0x3c/0x50
[  242.958638] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  242.963966] Kernel panic - not syncing: hung_task: blocked tasks
[  242.969975] CPU: 5 PID: 47 Comm: khungtaskd Not tainted 4.9.140-l4t-r32.4 #1
[  242.977020] Hardware name: quill (DT)
[  242.980682] Call trace:
[  242.983136] [<ffffff800808c678>] dump_backtrace+0x0/0x1a8
[  242.988535] [<ffffff800808c844>] show_stack+0x24/0x30
[  242.993593] [<ffffff800841e7a8>] dump_stack+0x94/0xbc
[  242.998649] [<ffffff80081c3db0>] panic+0x128/0x28c
[  243.003443] [<ffffff8008178190>] watchdog+0x2f8/0x398
[  243.008496] [<ffffff80080dcd60>] kthread+0x100/0x108
[  243.013459] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.018775] SMP: stopping secondary CPUs
[  243.022747] Kernel Offset: disabled
[  243.026237] Memory Limit: none
[  243.040680] Rebooting in 1 seconds..

2

[  242.849540] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  242.854746] [<ffffff8008ef41fc>] __schedule+0x22c/0x570
[  242.860031] [<ffffff8008ef4580>] schedule+0x40/0xa8
[  242.864965] [<ffffff8008ef7b00>] schedule_timeout+0x328/0x428
[  242.870764] [<ffffff8008ef3f70>] io_schedule_timeout+0x90/0xf0
[  242.876647] [<ffffff8008ef5020>] bit_wait_io+0x20/0x60
[  242.881835] [<ffffff8008ef4b50>] __wait_on_bit+0x78/0xe8
[  242.887197] [<ffffff8008ef4cf4>] out_of_line_wait_on_bit+0x84/0xb0
[  242.893435] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  242.899061] [<ffffff80083169f8>] ext4_find_entry+0x310/0x5d0
[  242.904774] [<ffffff8008316d44>] ext4_lookup+0x8c/0x220
[  242.910052] [<ffffff800826b24c>] lookup_slow+0xa4/0x170
[  242.915331] [<ffffff800826c4cc>] lookup_one_len_unlocked+0x11c/0x148
[  242.921743] [<ffffff8008385edc>] ovl_lookup+0xa4/0x500
[  242.926935] [<ffffff800826efe0>] lookup_open+0x208/0x540
[  242.932295] [<ffffff80082706cc>] do_last+0x614/0x770
[  242.937308] [<ffffff80082708c8>] path_openat+0xa0/0x308
[  242.942584] [<ffffff8008271d28>] do_filp_open+0x88/0x108
[  242.947951] [<ffffff800825cda0>] do_sys_open+0x180/0x270
[  242.953314] [<ffffff800825cf14>] SyS_openat+0x3c/0x50
[  242.958416] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  242.963769] Kernel panic - not syncing: hung_task: blocked tasks
[  242.969797] CPU: 5 PID: 47 Comm: khungtaskd Not tainted 4.9.140-l4t-r32.4 #1
[  242.976854] Hardware name: quill (DT)
[  242.980526] Call trace:
[  242.982999] [<ffffff800808c678>] dump_backtrace+0x0/0x1a8
[  242.988415] [<ffffff800808c844>] show_stack+0x24/0x30
[  242.993488] [<ffffff800841e7a8>] dump_stack+0x94/0xbc
[  242.998558] [<ffffff80081c3db0>] panic+0x128/0x28c
[  243.003368] [<ffffff8008178190>] watchdog+0x2f8/0x398
[  243.008437] [<ffffff80080dcd60>] kthread+0x100/0x108
[  243.013418] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.018751] SMP: stopping secondary CPUs
[  243.022726] Kernel Offset: disabled
[  243.026229] Memory Limit: none

3

[  242.845592] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  242.850806] [<ffffff8008ef41fc>] __schedule+0x22c/0x570
[  242.856092] [<ffffff8008ef4580>] schedule+0x40/0xa8
[  242.861029] [<ffffff8008ef7b00>] schedule_timeout+0x328/0x428
[  242.866831] [<ffffff8008ef3f70>] io_schedule_timeout+0x90/0xf0
[  242.872714] [<ffffff8008ef5020>] bit_wait_io+0x20/0x60
[  242.877904] [<ffffff8008ef4b50>] __wait_on_bit+0x78/0xe8
[  242.883267] [<ffffff8008ef4cf4>] out_of_line_wait_on_bit+0x84/0xb0
[  242.889504] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  242.895132] [<ffffff80083169f8>] ext4_find_entry+0x310/0x5d0
[  242.900842] [<ffffff8008316d44>] ext4_lookup+0x8c/0x220
[  242.906124] [<ffffff800826b24c>] lookup_slow+0xa4/0x170
[  242.911403] [<ffffff800826c4cc>] lookup_one_len_unlocked+0x11c/0x148
[  242.917813] [<ffffff8008385edc>] ovl_lookup+0xa4/0x500
[  242.923007] [<ffffff800826efe0>] lookup_open+0x208/0x540
[  242.928367] [<ffffff80082706cc>] do_last+0x614/0x770
[  242.933381] [<ffffff80082708c8>] path_openat+0xa0/0x308
[  242.938657] [<ffffff8008271d28>] do_filp_open+0x88/0x108
[  242.944024] [<ffffff800825cda0>] do_sys_open+0x180/0x270
[  242.949385] [<ffffff800825cf14>] SyS_openat+0x3c/0x50
[  242.954492] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  242.959852] Kernel panic - not syncing: hung_task: blocked tasks
[  242.965880] CPU: 5 PID: 47 Comm: khungtaskd Not tainted 4.9.140-l4t-r32.4 #1
[  242.972936] Hardware name: quill (DT)
[  242.976610] Call trace:
[  242.979080] [<ffffff800808c678>] dump_backtrace+0x0/0x1a8
[  242.984496] [<ffffff800808c844>] show_stack+0x24/0x30
[  242.989569] [<ffffff800841e7a8>] dump_stack+0x94/0xbc
[  242.994640] [<ffffff80081c3db0>] panic+0x128/0x28c
[  242.999451] [<ffffff8008178190>] watchdog+0x2f8/0x398
[  243.004520] [<ffffff80080dcd60>] kthread+0x100/0x108
[  243.009500] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.014831] SMP: stopping secondary CPUs
[  243.018807] Kernel Offset: disabled
[  243.022310] Memory Limit: none

Disabling panic on hung task just leaves the system locked. As mentioned, after rebooting the system makes it continue fine for another 50-80 boot loops.

Do you have any suggestions for some kernel fixes I could try? Thank you

Got about 6 boards running the same boot test and am seeing the same sporadic failures. Looks rather like a BSP related race-condition more than filesystem corruption, because rebooting the board once more works without problems.

I’ve backported all mmc and ext4 patches from L4T 32.5.1 to L4T 32.4.4:

26d47e3d1d6a ext4: avoid unnecessary stalls in ext4_evict_inode()
d1be3afd97e1 [4.4,029/241] ext4: zero out the unused memory region in the extent tree block
c173c3841633 ext4: avoid kernel warning when writing the superblock to a dead device
eb24a3b6bb4f ext4: fix a potential fiemap/page fault deadlock w/ inline_data
d1cfd9cba0cb ext4: make sure enough credits are reserved for dioread_nolock writes
40a19e58f205 ext4: force inode writes when nfsd calls commit_metadata()
2eb64177787d ext4: include terminating u32 in size of xattr entries when expanding inodes yyy
c72f56c5ae2a ext4: fix EXT4_IOC_GROUP_ADD ioctl
eaec103eb0c3 ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
836bf269c56f ext4: fix possible use after free in ext4_quota_enable
c6a659b054f5 mmc: core: Bypass voltage calls for fixed regulator
9875882890c1 mmc: core: Skip CMD11 for always-on slot regulator
780ae00e3fe8 mmc: cmdq: handle all pending cq interrupts
686ef4545a84 mmc: sdhci-iproc: handle mmc_of_parse() errors during probe
20f33f37b0b0 mmc: atmel-mci: do not assume idle after atmci_request_end

and this is how the stack looks like now:

[  242.822423] INFO: task systemd:1 blocked for more than 120 seconds.
[  242.828799]       Not tainted 4.9.140-l4t-r32.4 #1
[  242.833651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.841531] systemd         D    0     1      0 0x00000000
[  242.847109] Call trace:
[  242.849605] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  242.854810] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  242.860092] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  242.865020] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  242.870822] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  242.876701] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  242.881889] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  242.887249] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  242.893484] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  242.899113] [<ffffff8008309bb0>] ext4_bread+0xe8/0xf8
[  242.904217] [<ffffff8008314d50>] __ext4_read_dirblock+0x48/0x318
[  242.910270] [<ffffff8008315518>] htree_dirblock_to_tree+0x80/0x1f8
[  242.916498] [<ffffff80083163e8>] ext4_htree_fill_tree+0xb8/0x2c0
[  242.922554] [<ffffff800830161c>] ext4_readdir+0x5ac/0x770
[  242.928006] [<ffffff8008275278>] iterate_dir+0x80/0x1b8
[  242.933282] [<ffffff800838a094>] ovl_iterate+0xcc/0x1f8
[  242.938554] [<ffffff8008275364>] iterate_dir+0x16c/0x1b8
[  242.943915] [<ffffff8008276170>] SyS_getdents64+0xa0/0x200
[  242.949449] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  242.954861] INFO: task jbd2/mmcblk0p25:2525 blocked for more than 120 seconds.
[  242.962125]       Not tainted 4.9.140-l4t-r32.4 #1
[  242.966957] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.974820] jbd2/mmcblk0p25 D    0  2525      2 0x00000000
[  242.980370] Call trace:
[  242.982867] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  242.988059] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  242.993336] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  242.998263] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  243.004061] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  243.009941] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  243.015126] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  243.020485] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  243.026712] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  243.032334] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  243.038563] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  243.045491] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  243.052755] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  243.057933] [<ffffff80080dcd60>] kthread+0x100/0x108
[  243.062916] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.068256] INFO: task jbd2/mmcblk0p27:2540 blocked for more than 120 seconds.
[  243.075479]       Not tainted 4.9.140-l4t-r32.4 #1
[  243.080272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.088098] jbd2/mmcblk0p27 D    0  2540      2 0x00000000
[  243.093596] Call trace:
[  243.096048] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  243.101190] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  243.106417] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  243.111297] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  243.117043] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  243.122876] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  243.128014] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  243.133326] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  243.139506] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  243.145079] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  243.151258] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  243.158135] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  243.165268] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  243.170407] [<ffffff80080dcd60>] kthread+0x100/0x108
[  243.175373] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.180686] INFO: task jbd2/mmcblk0p28:2566 blocked for more than 120 seconds.
[  243.187905]       Not tainted 4.9.140-l4t-r32.4 #1
[  243.192695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.200519] jbd2/mmcblk0p28 D    0  2566      2 0x00000000
[  243.206016] Call trace:
[  243.208469] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  243.213611] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  243.218838] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  243.223717] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  243.229462] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  243.235295] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  243.240434] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  243.245745] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  243.251925] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  243.257497] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  243.263676] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  243.270551] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  243.277685] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  243.282826] [<ffffff80080dcd60>] kthread+0x100/0x108
[  243.287792] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  243.293106] INFO: task mount:2603 blocked for more than 120 seconds.
[  243.299458]       Not tainted 4.9.140-l4t-r32.4 #1
[  243.304248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.312071] mount           D    0  2603   2583 0x00000000
[  243.317570] Call trace:
[  243.320022] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  243.325163] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  243.330389] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  243.335271] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  243.341017] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  243.346849] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  243.351988] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  243.357323] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  243.363516] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  243.369094] [<ffffff80082a1b88>] __bread_gfp+0xf0/0x100
[  243.374544] [<ffffff800836fdc8>] fat__get_entry+0x158/0x230
[  243.380533] [<ffffff800836ff10>] fat_get_short_entry+0x70/0xd8
[  243.386588] [<ffffff8008373cc4>] fat_subdirs+0x64/0xa0
[  243.391923] [<ffffff800837884c>] fat_fill_super+0x654/0xbf8
[  243.397688] [<ffffff800837a710>] vfat_fill_super+0x40/0x50
[  243.403324] [<ffffff8008262e50>] mount_bdev+0x1c8/0x1f8
[  243.408746] [<ffffff800837a6bc>] vfat_mount+0x44/0x58
[  243.413915] [<ffffff8008263e24>] mount_fs+0x174/0x1c8
[  243.419084] [<ffffff8008285560>] vfs_kern_mount.part.0+0x70/0x158
[  243.425279] [<ffffff80082882f4>] do_mount+0x1dc/0xc48
[  243.430444] [<ffffff8008289090>] SyS_mount+0x98/0x100
[  243.435605] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  243.441019] INFO: task (dtime.sh):2607 blocked for more than 120 seconds.
[  243.447902]       Not tainted 4.9.140-l4t-r32.4 #1
[  243.452778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.460697] (dtime.sh)      D    0  2607      1 0x00000000
[  243.466400] Call trace:
[  243.468899] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  243.474164] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  243.479498] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  243.484490] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  243.490406] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  243.496350] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  243.501611] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  243.507108] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  243.513396] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  243.519075] [<ffffff8008309bb0>] ext4_bread+0xe8/0xf8
[  243.524225] [<ffffff8008314d50>] __ext4_read_dirblock+0x48/0x318
[  243.530323] [<ffffff8008316c00>] ext4_find_entry+0x4f0/0x5d0
[  243.536144] [<ffffff8008316d6c>] ext4_lookup+0x8c/0x220
[  243.541480] [<ffffff800826b24c>] lookup_slow+0xa4/0x170
[  243.546806] [<ffffff800826c4cc>] lookup_one_len_unlocked+0x11c/0x148
[  243.553279] [<ffffff800838623c>] ovl_lookup+0xa4/0x500
[  243.558527] [<ffffff800826efe0>] lookup_open+0x208/0x540
[  243.563945] [<ffffff80082706cc>] do_last+0x614/0x770
[  243.569008] [<ffffff80082708c8>] path_openat+0xa0/0x308
[  243.574328] [<ffffff8008271d28>] do_filp_open+0x88/0x108
[  243.579778] [<ffffff8008265ed0>] do_open_execat+0x88/0x210
[  243.585362] [<ffffff8008268084>] do_execveat_common.isra.0+0x1f4/0x670
[  243.591981] [<ffffff80082687b4>] SyS_execve+0x4c/0x60
[  243.597141] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  363.655547] INFO: task systemd:1 blocked for more than 120 seconds.
[  363.662063]       Not tainted 4.9.140-l4t-r32.4 #1
[  363.667082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.675010] systemd         D    0     1      0 0x00000000
[  363.680630] Call trace:
[  363.683195] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  363.688443] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  363.693767] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  363.698755] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  363.704600] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  363.710528] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  363.715761] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  363.721205] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  363.727496] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  363.733172] [<ffffff8008309bb0>] ext4_bread+0xe8/0xf8
[  363.738322] [<ffffff8008314d50>] __ext4_read_dirblock+0x48/0x318
[  363.744463] [<ffffff8008315518>] htree_dirblock_to_tree+0x80/0x1f8
[  363.750739] [<ffffff80083163e8>] ext4_htree_fill_tree+0xb8/0x2c0
[  363.756844] [<ffffff800830161c>] ext4_readdir+0x5ac/0x770
[  363.762378] [<ffffff8008275278>] iterate_dir+0x80/0x1b8
[  363.767701] [<ffffff800838a094>] ovl_iterate+0xcc/0x1f8
[  363.773021] [<ffffff8008275364>] iterate_dir+0x16c/0x1b8
[  363.778432] [<ffffff8008276170>] SyS_getdents64+0xa0/0x200
[  363.784048] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  363.789722] INFO: task jbd2/mmcblk0p25:2525 blocked for more than 120 seconds.
[  363.797041]       Not tainted 4.9.140-l4t-r32.4 #1
[  363.801913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.809829] jbd2/mmcblk0p25 D    0  2525      2 0x00000000
[  363.815500] Call trace:
[  363.817997] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  363.823253] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  363.828577] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  363.833550] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  363.839394] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  363.845319] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  363.850571] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  363.855984] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  363.862266] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  363.867939] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  363.874217] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  363.881209] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  363.888443] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  363.893682] [<ffffff80080dcd60>] kthread+0x100/0x108
[  363.898747] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  363.904149] INFO: task jbd2/mmcblk0p27:2540 blocked for more than 120 seconds.
[  363.911474]       Not tainted 4.9.140-l4t-r32.4 #1
[  363.916346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.924260] jbd2/mmcblk0p27 D    0  2540      2 0x00000000
[  363.929861] Call trace:
[  363.932405] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  363.937644] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  363.942971] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  363.947945] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  363.953785] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  363.959642] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  363.964787] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  363.970102] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  363.976285] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  363.981862] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  363.988045] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  363.994923] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  364.002059] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  364.007206] [<ffffff80080dcd60>] kthread+0x100/0x108
[  364.012177] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  364.017495] INFO: task jbd2/mmcblk0p28:2566 blocked for more than 120 seconds.
[  364.024717]       Not tainted 4.9.140-l4t-r32.4 #1
[  364.029510] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  364.037340] jbd2/mmcblk0p28 D    0  2566      2 0x00000000
[  364.042855] Call trace:
[  364.045300] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  364.050444] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  364.055674] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  364.060557] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  364.066308] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  364.072144] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  364.077287] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  364.082604] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  364.088786] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  364.094363] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  364.100545] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  364.107422] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  364.114558] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  364.119700] [<ffffff80080dcd60>] kthread+0x100/0x108
[  364.124669] [<ffffff80080838a0>] ret_from_fork+0x10/0x30

Any suggestion for other patches I could check? Thanks

Update: I’m seeing this on another board with L4T 32.5.1 too.

You should tell more about the board. For example, is it a custom board or devkit? How many modules have this issue?

I don’t think patching the kernel is the right direction for your issue. Rel-32.4.4 has been released for like 1 year and we didn’t hear any other users talking about this before.

Hi @WayneWWW , I’m seeing this with 2 Devkits, one CTI Orbitty and one Auvidea J120. All 4 modules are using the same devkit image and same devkit device-tree.

Can you share the sticker of those tx2 modules?

Sure @WayneWWW , these are the stickers:


Are there modules TX2 D02? D01? C03? or something else?

Are you referring to the sticker near the antennas @WayneWWW ? If so they are: D00 H, B02 F, D01 B, B02 C.

Can you help move to jetpack4.6.1 (32.6.1) and check if all these modules can reproduce this issue?

Also, please share the steps to reproduce this issue.

I’m using BalenaOS images, which are based on OE4T meta-tegra yocto layer. Version 2.82.11+rev8 is based on L4T 32.5.1, and 2.73.1+rev1 on 32.4.4. Regardless of OS release or distribution, be it a BalenaOS build, a OE4T one or if you will be reproducing with Ubuntu it’s necessary to:

  1. Ensure the watchdog is disabled to avoid reboots. In BalenaOS systemd is used for petting the watchdog, in this case I remove watchdog.conf.
  2. CONFIG_BOOTPARAM_HUNG_TASK_PANIC=n must be set in the kernel defconfig.
  3. After the system boots to userspace and login is prompted, trigger reboot from cmdline. This must be done repeatedly until the issue occurs.

The above steps are necessary to avoid the watchdog or hung task panic rebooting the system, and to be able to observe the logs:

[  OK  ] Started Apply Kernel Variables.
[  243.272684] INFO: task systemd:1 blocked for more than 120 seconds.
[  243.279039]       Not tainted 4.9.140-l4t-r32.4 #1
[  243.283885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.291760] systemd         D    0     1      0 0x00000000
[  243.297331] Call trace:
[  243.299814] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  243.305008] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  243.310287] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  243.315217] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  243.321012] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  243.326889] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  243.332072] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  243.337424] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  243.343654] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  243.349273] [<ffffff8008316a20>] ext4_find_entry+0x310/0x5d0
[  243.355049] [<ffffff8008316d6c>] ext4_lookup+0x8c/0x220
[  243.360327] [<ffffff800826b24c>] lookup_slow+0xa4/0x170
[  243.365591] [<ffffff800826c4cc>] lookup_one_len_unlocked+0x11c/0x148
[  243.371998] [<ffffff800838623c>] ovl_lookup+0xa4/0x500
[  243.377160] [<ffffff800826efe0>] lookup_open+0x208/0x540
[  243.382484] [<ffffff80082706cc>] do_last+0x614/0x770
[  243.387461] [<ffffff80082708c8>] path_openat+0xa0/0x308
[  243.392698] [<ffffff8008271d28>] do_filp_open+0x88/0x108
[  243.398025] [<ffffff800825cda0>] do_sys_open+0x180/0x270
[  243.403348] [<ffffff800825cf14>] SyS_openat+0x3c/0x50
[  243.408413] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  364.245892] INFO: task jbd2/mmcblk0p25:2715 blocked for more than 120 seconds.
[  364.253153]       Not tainted 4.9.140-l4t-r32.4 #1
[  364.257980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  364.265840] jbd2/mmcblk0p25 D    0  2715      2 0x00000000
[  364.271385] Call trace:
[  364.273879] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  364.279069] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  364.284342] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  364.289265] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  364.295058] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  364.300935] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  364.306118] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  364.311474] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  364.317694] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  364.323310] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  364.329536] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  364.336461] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  364.343639] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  364.348824] [<ffffff80080dcd60>] kthread+0x100/0x108
[  364.353835] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  364.359190] INFO: task jbd2/mmcblk0p27:2730 blocked for more than 120 seconds.
[  364.366461]       Not tainted 4.9.140-l4t-r32.4 #1
[  364.371359] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  364.379201] jbd2/mmcblk0p27 D    0  2730      2 0x00000000
[  364.384732] Call trace:
[  364.387175] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  364.392314] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  364.397538] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  364.402414] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  364.408158] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  364.413986] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  364.419121] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  364.424428] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  364.430603] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  364.436171] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  364.442346] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  364.449216] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  364.456345] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  364.461482] [<ffffff80080dcd60>] kthread+0x100/0x108
[  364.466445] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[  364.471753] INFO: task jbd2/mmcblk0p28:2756 blocked for more than 120 seconds.
[  364.478967]       Not tainted 4.9.140-l4t-r32.4 #1
[  364.483755] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  364.491576] jbd2/mmcblk0p28 D    0  2756      2 0x00000000
[  364.497075] Call trace:
[  364.499519] [<ffffff80080863a4>] __switch_to+0x9c/0xc0
[  364.504656] [<ffffff8008ef4664>] __schedule+0x22c/0x570
[  364.509878] [<ffffff8008ef49e8>] schedule+0x40/0xa8
[  364.514753] [<ffffff8008ef7f68>] schedule_timeout+0x328/0x428
[  364.520496] [<ffffff8008ef43d8>] io_schedule_timeout+0x90/0xf0
[  364.526323] [<ffffff8008ef5488>] bit_wait_io+0x20/0x60
[  364.531457] [<ffffff8008ef4fb8>] __wait_on_bit+0x78/0xe8
[  364.536766] [<ffffff8008ef515c>] out_of_line_wait_on_bit+0x84/0xb0
[  364.542942] [<ffffff800829dd94>] __wait_on_buffer+0x3c/0x48
[  364.548512] [<ffffff80083648b8>] jbd2_write_superblock+0x1a0/0x240
[  364.554687] [<ffffff80083653d4>] jbd2_journal_update_sb_log_tail+0x5c/0xd0
[  364.561557] [<ffffff800835d218>] jbd2_journal_commit_transaction+0x7c0/0x16f0
[  364.568686] [<ffffff8008363744>] kjournald2+0xbc/0x258
[  364.573820] [<ffffff80080dcd60>] kthread+0x100/0x108
[  364.578782] [<ffffff80080838a0>] ret_from_fork+0x10/0x30

We cannot guarantee this BalenaOS.

Please try to use pure jetpack release to reproduce this issue. Otherwise, we cannot help.

@AlexCo – Do you only see this behavior in balenaOS, or have you experienced the same thing on an Ubuntu or pure Yocto distro? I am curious if this is specific to us (I work at balena), or can be replicated on other OS’es as well, thus indicating non-balena specific.