Unable to shut down the system sometimes

We have developed a custom carrier board for Jetson Orin Nano.

Sometimes we cannot shut down the system with the command shutdown -h now.

The following messages are shown.

[ 67.887599] systemd-shutdown[1]: Waiting for process: pulseaudio
[ 147.899719] systemd-shutdown[1]: Sending SIGKILL to PID 2171 (pulseaudio).
[ 157.907460] systemd-shutdown[1]: Waiting for process: pulseaudio
[ 237.937345] [2274]: Failed to remount '/' read-only: Device or resource busy
[ 237.984076] systemd-shutdown[1]: Failed to finalize file systems, ignoring
[ 243.098712] INFO: task kworker/5:1:48 blocked for more than 120 seconds.
[ 243.105634] Tainted: G OE 5.10.192-tegra #7
[ 243.111753] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.120005] INFO: task kworker/4:2:116 blocked for more than 120 seconds.
[ 243.126998] Tainted: G OE 5.10.192-tegra #7
[ 243.133112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.141306] INFO: task pulseaudio:2171 blocked for more than 120 seconds.
[ 243.148305] Tainted: G OE 5.10.192-tegra #7
[ 243.154412] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.930707] INFO: task systemd-shutdow:1 blocked for more than 120 seconds.
[ 363.937905] Tainted: G OE 5.10.192-tegra #7
[ 363.943996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.952180] INFO: task kworker/5:1:48 blocked for more than 241 seconds.
[ 363.959084] Tainted: G OE 5.10.192-tegra #7
[ 363.965186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.973363] INFO: task kworker/4:2:116 blocked for more than 241 seconds.
[ 363.980354] Tainted: G OE 5.10.192-tegra #7
[ 363.986442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.994616] INFO: task pulseaudio:2171 blocked for more than 241 seconds.
[ 364.001604] Tainted: G OE 5.10.192-tegra #7
[ 364.007692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.762705] INFO: task systemd-shutdow:1 blocked for more than 241 seconds.
[ 484.769901] Tainted: G OE 5.10.192-tegra #7
[ 484.775999] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.784178] INFO: task kworker/5:1:48 blocked for more than 362 seconds.
[ 484.791079] Tainted: G OE 5.10.192-tegra #7
[ 484.797168] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.805336] INFO: task kworker/4:2:116 blocked for more than 362 seconds.
[ 484.812324] Tainted: G OE 5.10.192-tegra #7
[ 484.818427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

That’s a tough one since you have to know which process it is. If it is hung while the system is running normally there are a lot of ways of finding it, but during shutdown you won’t be able to run any tools. Just an idea, perhaps not useful: If you can identify a hung task prior to shutdown, then you just have to find out why that one task does this, and you might be able to run something like top or htop before shutdown to see one process churning away. It sounds like most of the time it doesn’t block, but that one time you run htop or top prior to shutdown and you notice something might be hung, pay attention to what the process is.

That said, kworker is a kernel thread. This is one of the things which might result from hardware spinning off some task, e.g., if receiving data on a network it might spin off a checksum. It is also suggesting a “tainted” kernel, so something seems modified. Has this kernel been modified? If so, what did you add to it? What hardware is associated with the modifications?
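
As a rough sketch of the "look before shutdown" idea: a task reported as "blocked for more than 120 seconds" is normally sitting in uninterruptible sleep, which ps shows as a STAT value starting with "D". The helper names below (filter_blocked, list_blocked) are made up for illustration, and the wchan:32 column width assumes the procps version of ps that ships with Ubuntu. It may well show nothing on a healthy system, so treat it as one possible check, not a guaranteed way to catch the fault.

```shell
#!/bin/sh
# Keep the header line plus every row whose STAT column (column 3)
# starts with "D", i.e. uninterruptible sleep. Reads ps output on stdin.
filter_blocked() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# Snapshot of all processes, including kernel threads such as kworker.
# WCHAN names the kernel function the task is sleeping in, when known.
list_blocked() {
    ps -eo pid,ppid,stat,wchan:32,comm | filter_blocked
}

# Usage: call list_blocked from an ssh or serial console session
# shortly before running shutdown.
```

If pulseaudio or a kworker thread already shows up here while the system is still running, the WCHAN column is worth posting along with the shutdown log.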

Hi linuxdev,
As you can see the pulseaudio process was hung during the system shutdown.

[ 363.994616] INFO: task pulseaudio:2171 blocked for more than 241 seconds.

And the kernel has not been modified.

I do not know enough about pulseaudio configuration or operation to give a good comment on this. However, this probably does involve a kernel driver, and it might be locking that driver on, blocking other operation (and if it is a hardware driver, then it would lock out CPU0, which controls other hardware). My guess is that there is a user space program which has gone wrong, and killing that would perhaps stop the shutdown stall. However, it could be a kernel driver which pulseaudio triggers and that driver could lock. I’m not sure how to debug which it is without something like a JTAG or DCELL debugger.

If this isn’t always the case, then it is really problematic to spot the cause that one time it does become a problem. A way to find some information on both pulseaudio and parent and group IDs:
ps jax --sort=uid,-ppid,+pid | grep -E '(PPID.*PID.*PGID|pulseaudio)'

By itself this is only slightly useful, but there is a way to make it more useful (it is a pain to do every time; sorry, if the failure is only occasional this might never help): exit the GUI login before shutdown. If you log out of the GUI but do not shut down, pulseaudio should stop, because the process which spawned it is gone (and the child process should go away as well). You can then go to a console (via ssh, serial console, or CTRL-ALT-F2, CTRL-ALT-F3, and so on) and run the ps command above to see whether pulseaudio is still there.
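
A small sketch of that check, to run from the console after logging out of the GUI. The function name check_leftover is hypothetical; pgrep -x matches the exact process name, and the process name "pulseaudio" is taken from the log above.

```shell
#!/bin/sh
# Report any process with the given name (default: pulseaudio) that is
# still alive. Returns 1 if at least one is found, 0 otherwise.
check_leftover() {
    name=${1:-pulseaudio}
    if pids=$(pgrep -x "$name"); then
        for pid in $pids; do
            # PID, parent PID, process group, owner, and full command line.
            ps -o pid= -o ppid= -o pgid= -o user= -o args= -p "$pid"
        done
        return 1
    fi
    echo "no $name process found"
}

# Usage: check_leftover            (checks pulseaudio)
#        check_leftover some_name  (checks another process name)
```

If pulseaudio is still listed after the GUI session has ended, note its PPID and PGID before shutting down.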

Here’s the part which is really stretching things, and you’d have to get lucky for it to help: investigate the PPID and PGID (which might be the same) to see what those are. If shutdown then fails due to pulseaudio, the problem is likely related to the parent process. If you’ve also tracked down the PPID prior to this, then you have a clue as to the origin of the fault. To see information on a specific PID (taken from that PPID), assuming the PPID is “42” (just the example PID used in the man pages):
ps -q 42 aux
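
If you want the whole chain of parents rather than one level, something like the following might help. It is only a sketch: the function name ancestry is made up, and it reads Linux /proc directly instead of calling ps, so it assumes /proc is mounted (which it is on a normal Jetson install).

```shell
#!/bin/sh
# Print one tab-separated line per ancestor, starting at the given PID:
# pid, parent pid, command line. Stops when the parent no longer exists
# in /proc (PID 1 has parent 0).
ancestry() {
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        ppid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
        # cmdline is NUL-separated; turn the separators into spaces.
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
        # Kernel threads have an empty cmdline; fall back to the short name.
        [ -n "$cmd" ] || cmd="[$(cat "/proc/$pid/comm")]"
        printf '%s\t%s\t%s\n' "$pid" "$ppid" "$cmd"
        pid=$ppid
    done
}

# Usage: ancestry 2171   (2171 was the pulseaudio PID in the log above;
#                         the PID will differ on each boot)
```

The first line is the process itself, and each following line is its parent, up to PID 1.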

It isn’t much, but it is a possible place to start. If it turns out that it never fails to shut down when logging out of the GUI, then that is a huge clue (however, since the issue is intermittent, how do you know how many times to do this to “guarantee” logging out of the GUI first stops the issue?).

Knowing about the parent PID might help others slightly as well; that earlier example for PID 42 should show the command line of the parent, and that should probably be posted here.

Someone else may need to answer, but see what info you can dig up with the above first.
