/usr/sbin/acpid and /usr/sbin/sshd are taking up 100% CPU load

I have a Jetson TX2 running a tomcat8 server with an AI application, deployed in some region in France.
I can access this APP via a browser, but every 2 or 3 days it becomes very slow and almost unresponsive.

I can access the Jetson via ssh and after killing all the APP-related processes I end up with the following:

All the CPU load on the computer is consumed by processes I’m unfamiliar with. /usr/sbin/acpid and /usr/sbin/sshd are the most constant ones.

What is even stranger is that acpid does not even seem to be installed on the Jetson:

$ acpid --help
-bash: acpid: command not found

And when manually inspecting the /usr/sbin folder I can confirm that the file does not exist.
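
Since the name shown in the process list is only what the process reports about itself, one way to see what is really running is to compare that reported command line with the executable the kernel has recorded for the PID. This is a minimal sketch, assuming a Linux /proc filesystem; the function name show_real_exe is made up for illustration, and inspecting another user's process needs root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the reported command line and the real executable of a PID.
show_real_exe() {
    local pid="$1"
    # /proc/PID/cmdline is NUL-separated and can be rewritten by the process itself.
    printf 'cmdline: '
    tr '\0' ' ' < "/proc/${pid}/cmdline"
    printf '\n'
    # /proc/PID/exe is a symlink maintained by the kernel; a trailing "(deleted)"
    # means the file was removed from disk after the process started.
    printf 'exe:     %s\n' "$(readlink "/proc/${pid}/exe")"
}

# Example: inspect the current shell. Replace "$$" with the PID shown in top.
show_real_exe "$$"
```

If the exe line points somewhere other than /usr/sbin/acpid, or ends in "(deleted)", that would explain why the file cannot be found in /usr/sbin.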

Has someone experienced something similar or has an idea on how to debug/solve this issue?

Thanks!

I am far from being able to say anything particularly useful, but some thoughts come to mind. First, acpid is an event monitoring daemon (ACPI events…power configuration), and it is not needed. ACPI itself is not necessarily an executable, since it is a kernel feature (there could be user space software associated with it). Basically the daemon (acpid) reads events from the ACPI kernel feature. To find out if ACPI is available (a kernel feature, not an executable program), what do you see from this:
zcat /proc/config.gz | grep 'ACPI'

I have no idea if this will show anything, but I am also curious what you see from:
sudo systemctl list-units | grep -i 'acpi'

FYI, if a process blocks in certain ways while trying to reach a file which is non-existent, then it might block with 0% CPU usage, or it might block (differently) with 100% CPU usage.

Other processes can similarly cause 100% CPU usage in some circumstances where they are waiting for a file read (or they could drop to 0% depending on how it is coded). I am thinking perhaps the processes showing 100% usage might just be a side effect of the real problem, and not the cause.
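
To tell a process that is really burning CPU from one that is stuck waiting, the state column of ps can help. This is a rough sketch, assuming the procps version of ps (the one shipped with Ubuntu); the function name top_cpu is only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the N processes using the most CPU, with their state.
# STAT column: R = running or runnable, S = sleeping,
#              D = uninterruptible sleep (usually waiting on I/O), Z = zombie.
top_cpu() {
    local n="${1:-10}"
    # One extra line is kept for the header row.
    ps -eo pid,user,stat,pcpu,etime,args --sort=-pcpu | head -n "$(( n + 1 ))"
}

top_cpu 10
```

A process that sits in state R near 100% is actively spinning, while one in state D is blocked inside the kernel and normally shows little CPU use.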

Are you able to run the command “df -H -T” when things are bad? It would be useful to know how much disk space you have left (a full or nearly full filesystem could account for some issues).


Hi @linuxdev,

I greatly appreciate your answer. I restarted the Jetson yesterday and today it is still working fine, but I can already answer some of your questions.

$ zcat /proc/config.gz | grep 'ACPI'
# CONFIG_ACPI is not set

$ sudo systemctl list-units | grep -i 'acpi'

I am not sure how to interpret the first one and the second command showed nothing.

I agree with you that this might be a side effect and not the cause. However, I am not sure how to proceed and I need some advice :)

Regarding the disk space: I did check it when things went bad and I did not notice anything strange, but I can take a look again once the APP is behaving slowly again. For future reference, here is the output of df -H -T when things are running fine.

$ df -H -T
Filesystem                               Type        Size  Used Avail Use% Mounted on
/dev/root                                ext4         15G  9.8G  4.2G  71% /
devtmpfs                                 devtmpfs    8.2G     0  8.2G   0% /dev
tmpfs                                    tmpfs       8.3G  246k  8.3G   1% /dev/shm
tmpfs                                    tmpfs       8.3G   23M  8.3G   1% /run
tmpfs                                    tmpfs       5.3M  4.1k  5.3M   1% /run/lock
tmpfs                                    tmpfs       8.3G     0  8.3G   0% /sys/fs/cgroup
/dev/mmcblk0p2                           ext4         11G  3.7G  6.3G  37% /home
/dev/mmcblk0p3                           ext4        4.1G  980M  2.9G  26% /home/eyesea/log
tmpfs                                    tmpfs       824M   66k  824M   1% /run/user/1001
/dev/sda1                                ext4        4.0T  3.3T  474G  88% /home/eyesea/imageSink
tmpfs                                    tmpfs       824M     0  824M   0% /run/user/1000
tmpfs                                    tmpfs       824M     0  824M   0% /run/user/1002 

One last thing. The Jetson TX2 is powered by a battery which is frequently recharged. Could it be that when this battery is low the Jetson locks itself somehow, and this manifests as processes taking up 100% CPU load?

In this case you know the kernel itself does not support ACPI. Thus no user space software can succeed if it tries to work with ACPI. An ACPI monitoring daemon would fail. I don’t know how acpid (the daemon) would behave on such a failure, but it is conceivable that it would either churn CPU cycles, or else block and go idle. Don’t know.

You have no official “services” with “acpi” in the name, which is how it should be given you have no ACPI support.

You seem to have plenty of disk space, so that is not an issue.
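
That df output was taken while things were running fine, so it might still be worth capturing the same data during a slowdown for comparison. This is just a sketch, assuming bash and the standard df and uptime tools; the function name snapshot and the /tmp output location are placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical helper: save disk usage, inode usage and load average to a timestamped file.
snapshot() {
    local dir="${1:-/tmp}"
    local out
    out="${dir}/snapshot-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== uptime =="
        uptime
        echo "== df -H -T =="
        df -H -T
        # A filesystem can also fill up by running out of inodes, which df -H does not show.
        echo "== df -i =="
        df -i
    } > "$out" 2>&1
    # Print the file name so the caller knows where the snapshot went.
    echo "$out"
}

snapshot /tmp
```

Running it once now and once when the APP is slow gives two files that can be compared with diff.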

I doubt the battery causes this. The normal behavior when a battery is not behaving well is that the unit shuts down. If you happen to have some sort of special software installed which is related to the battery and charging, then this could possibly be related. Consider that ACPI is a power management feature, and software designed to work with a battery would quite likely be interested in that power management feature. Has anything been installed to monitor the battery or modify when it charges?