Jetson TX2: kworker CPU usage at 100%

tommaso.maioli · September 24, 2018, 11:15am

Hello everybody,

I’ve been struggling with this problem for a while now, and I couldn’t find a working solution.

A kworker process constantly uses 100% of one CPU and blocks everything; I can’t even shut the NVIDIA board down.

I know this is a known problem in the community, but all the solutions I found (for example, the Ask Ubuntu question “12.04 - Why does kworker cpu usage get so high?”) seem to go through the ACPI interrupt interface, which is (apparently) not present on my board.

In my current setup, the board is connected to a CAN bus and to a LiDAR that talks over Ethernet; as far as I know, both use interrupts.
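If the interrupts are the suspect, one way to see whether the CAN or Ethernet IRQs fire abnormally often while the kworker spins is to compare two snapshots of /proc/interrupts. Below is a minimal Python sketch, assuming the usual layout (a header row of CPU columns, then one row per IRQ with one count per CPU followed by a free-form description). The function names are hypothetical, and this is not an NVIDIA tool.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: (total_count, description)}."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        # Consume up to ncpu leading integer columns (rows such as "Err:" have only one).
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        # Whatever is left (controller, hwirq, trigger type, device name) is the description.
        table[irq.strip()] = (sum(counts), " ".join(fields))
    return table


def irq_rates(before, after, seconds):
    """Return [(irq, interrupts_per_second, description)], busiest first."""
    rates = []
    for irq, (count, desc) in after.items():
        prev = before.get(irq, (0, desc))[0]
        rates.append((irq, (count - prev) / seconds, desc))
    return sorted(rates, key=lambda r: r[1], reverse=True)


def sample(seconds=1.0, path="/proc/interrupts"):
    """Read the interrupt table twice, `seconds` apart, and return per-IRQ rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_rates(before, after, seconds)
```

On the board, printing the first few entries of sample(2.0) once while things are normal and again while the kworker is at 100% would show whether any IRQ rate changes dramatically.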

Thanks in advance for any help!

Bibek · September 27, 2018, 8:43am

Hi,

The % utilization shown is wrong, though.
Can you tell me when the utilization shoots up? Right after boot, while the system is idling? Or when the CAN bus is busy or the LiDAR is talking over Ethernet? Is the issue seen with Ethernet disconnected?
I am just trying to locate where the problem is. Locally we have not seen it.

Can you also check what this kworker is doing on your system, for example using ftrace?
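For reference, the Linux kernel's workqueue documentation suggests tracing the workqueue events when kworkers use too much CPU. A Python sketch of that capture, assuming root privileges and debugfs mounted at /sys/kernel/debug; the helper name is hypothetical, and unlike the kernel documentation's example (which enables only workqueue_queue_work), it enables the whole workqueue event group so the executed work functions also show up:

```python
import itertools
import os


def capture_workqueue_trace(max_lines=1000, tracing="/sys/kernel/debug/tracing"):
    """Enable the workqueue tracepoints and return up to max_lines trace lines.

    Needs root. Reading trace_pipe blocks until events arrive, so max_lines
    bounds how much is collected on a busy system.
    """
    # "workqueue:*" enables every event in the workqueue subsystem
    # (queue_work, activate_work, execute_start, execute_end).
    # Opening set_event for writing clears previously enabled events.
    with open(os.path.join(tracing, "set_event"), "w") as f:
        f.write("workqueue:*\n")
    with open(os.path.join(tracing, "trace_pipe")) as pipe:
        return [line.rstrip("\n") for line in itertools.islice(pipe, max_lines)]
```

The tracepoints stay enabled after the function returns; writing an empty line to set_event (echo > /sys/kernel/debug/tracing/set_event) disables them again.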

thanks
Bibek

tommaso.maioli · September 28, 2018, 11:05am

Thanks for your answer.
Unfortunately, I don’t have access to the board at the moment; as soon as I can ftrace the problem, I’ll post the results here.

The problem usually appears when both CAN and Ethernet are connected and talking. Once it appears, there’s no way of stopping it, and the usage stays at that level even when the system is idle.

Please note that I’ve observed this problem with different LiDARs and CAN transceivers, so I tend to rule out a hardware/software problem on that side.

I did try this solution (https://www.linuxquestions.org/questions/linux-software-2/high-cpu-usage-by-kworker-4175563563/): the problem took longer to appear, but it appeared anyway.

Let me know if there’s something more that could help you!

Thanks!

Bibek · October 30, 2018, 11:48am

Hi,

Can you dump the backtraces of all tasks using SysRq?
https://www.kernel.org/doc/html/v4.11/admin-guide/sysrq.html
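A Python equivalent of echoing keys into /proc/sysrq-trigger, sketched under the assumption that it runs as root. Per the linked documentation, 't' dumps all current tasks and their information, 'l' shows a backtrace for all active CPUs, and 'w' dumps blocked tasks, all into the kernel log. The function name is hypothetical.

```python
def sysrq(keys="t", trigger="/proc/sysrq-trigger"):
    """Send SysRq command keys through /proc/sysrq-trigger (requires root).

    't' dumps a list of current tasks and their information,
    'l' shows a stack backtrace for all active CPUs,
    'w' dumps tasks in uninterruptible (blocked) state.
    The output goes to the kernel log, so read it with dmesg afterwards.
    """
    for key in keys:
        # One key per write, like `echo t > /proc/sysrq-trigger`.
        with open(trigger, "w") as f:
            f.write(key)
```

After calling sysrq("t"), the dump can be saved with dmesg > tasks.txt and attached here.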

tommaso.maioli · November 5, 2018, 3:23pm

Hi,

I finally managed to reproduce the error and used ftrace to log what I could.

The file out.txt corresponds to the output of:

$ cat /sys/kernel/debug/tracing/trace_pipe > out.txt

while out2.txt is the output of:

$ cat /proc/THE_OFFENDING_KWORKER/stack

where THE_OFFENDING_KWORKER is the PID of the kworker as shown in htop.
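These two commands match the approach in the kernel's workqueue documentation: trace_pipe shows what is being queued, and /proc/<pid>/stack shows where a single busy kworker is. Instead of picking the PID in htop, the busiest kworker could be found by sampling /proc/<pid>/stat twice. Below is a Python sketch, assuming the standard stat layout (utime and stime are fields 14 and 15, in clock ticks). The helper names are hypothetical, and reading the stack needs root.

```python
import os


def parse_stat(text):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so cut around the first '(' and the last ')'.
    start, end = text.index("("), text.rindex(")")
    comm = text[start + 1:end]
    rest = text[end + 2:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return comm, int(rest[11]) + int(rest[12])


def kworker_ticks(proc="/proc"):
    """Snapshot {pid: (comm, ticks)} for every kworker thread."""
    snap = {}
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                comm, ticks = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or stat was unreadable; skip it
        if comm.startswith("kworker"):
            snap[int(pid)] = (comm, ticks)
    return snap


def busiest(before, after):
    """Return [(pid, comm, tick_delta)] sorted by CPU time used between snapshots."""
    deltas = [(pid, comm, ticks - before[pid][1])
              for pid, (comm, ticks) in after.items() if pid in before]
    return sorted(deltas, key=lambda d: d[2], reverse=True)


def read_stack(pid, proc="/proc"):
    """Kernel stack of a task (root only), like `cat /proc/<pid>/stack`."""
    with open(os.path.join(proc, str(pid), "stack")) as f:
        return f.read()
```

Usage: take before = kworker_ticks(), wait a few seconds, take after = kworker_ticks(), then print read_stack(busiest(before, after)[0][0]).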

Thanks in advance!
out2.txt (153 Bytes)
out.txt (3.44 MB)

Bibek · November 9, 2018, 10:10am

out.txt shows two things:

1. There is a display-related SMMU error: an address outside the display's mapped region is being accessed, which triggers these errors. But I don’t think that is what you are concerned about.
2. CPU0 is only spewing this error and not doing any workqueue job. If CPU0 is stuck, this could be the reason.

   <idle>-0     [000] d.h1     3.156581: arm_smmu_context_fault: Unhandled context fault: iova=0x96d82e40, fsynr=0x1, cb=19, sid=9(0x9 - NVDISPLAY), pgd=0 pud=0, pmd=0, pte=0
   <idle>-0     [000] d.h1     3.156609: arm_smmu_context_fault: Unhandled context fault: iova=0x96d86740, fsynr=0x1, cb=19, sid=9(0x9 - NVDISPLAY), pgd=0 pud=0, pmd=0, pte=0
   <idle>-0     [000] d.h1     3.156644: arm_smmu_context_fault: Unhandled context fault: iova=0x96d8a000, fsynr=0x1, cb=19, sid=9(0x9 - NVDISPLAY), pgd=0 pud=0, pmd=0, pte=0
   <idle>-0     [000] d.h1     3.156671: arm_smmu_context_fault: Unhandled context fault: iova=0x96d8e7c0, fsynr=0x1, cb=19, sid=9(0x9 - NVDISPLAY), pgd=0 pud=0, pmd=0, pte=0
   <idle>-0     [000] d.h1     3.156700: arm_smmu_context_fault: Unhandled context fault: iova=0x96d91e00, fsynr=0x1, cb=19, sid=9(0x9 - NVDISPLAY), pgd=0 pud=0, pmd=0, pte=

Can you tell me which process ID was hogging the CPU this time?
kworker/0:3 is not seen in this log.
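To check which CPU and which event dominate a large trace like out.txt, and whether a given kworker (kworker/0:3 here) appears at all, the lines can be tallied. This is a Python sketch that assumes the default trace layout with the flags column shown in the excerpt above (task-pid, [cpu], flags, timestamp, event). The regular expression and the function name are illustrative choices.

```python
import re
from collections import Counter

# Default ftrace line layout, as in the excerpt:
#   <idle>-0     [000] d.h1     3.156581: arm_smmu_context_fault: ...
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):")


def summarize(lines):
    """Count events per (cpu, event) pair and per task name."""
    per_cpu_event, per_task = Counter(), Counter()
    for line in lines:
        m = TRACE_LINE.match(line)
        if not m:
            continue  # header comments, blank lines, other layouts
        per_cpu_event[(int(m.group("cpu")), m.group("event"))] += 1
        per_task[m.group("task")] += 1
    return per_cpu_event, per_task
```

Usage: with open("out.txt") as f: counts, tasks = summarize(f). After that, counts.most_common(5) and tasks["kworker/0:3"] answer both questions.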

tommaso.maioli · November 9, 2018, 2:39pm

Hi bbasu.

Thanks for your answer.

Regarding question 1, I agree with you. It seems that the display is causing problems. After reading the ftrace output, we’ve been working without any display attached, and the problem has not appeared since. Our guess is that the display is the cause. Is that a reasonable guess in your opinion? How can we solve it?

Regarding question 2, the PID was 55. I don’t think it was kworker/0:3. What I’ve seen is that the kworker hogging the CPU changes from time to time.

Thank you very much for your time!

Bibek · November 14, 2018, 8:48am

Yeah, we should fix the SMMU display issue.
Which display panel are you using, over HDMI or over DP?
Are you using the Jetson developer kit or your own custom hardware?
Can you share the boot log?

tommaso.maioli · November 14, 2018, 1:46pm

I am using HDMI with Jetson.

I’ve attached a text file with the dmesg output.

I think what we are looking for is at time [0.244].
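To pull out only the lines near a given boot time from a saved log such as dmesg.txt, a small filter on the [seconds] prefix is enough. Here is a Python sketch, assuming the default dmesg timestamp format; the function name and the 50 ms default window are arbitrary choices.

```python
import re

# dmesg prefixes each line with seconds since boot, e.g. "[    0.244000] ...".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def around(log_text, t, window=0.05):
    """Return dmesg lines whose timestamp lies within +/- window seconds of t."""
    hits = []
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if m and abs(float(m.group(1)) - t) <= window:
            hits.append(line)
    return hits
```

Usage: around(open("dmesg.txt").read(), 0.244) returns the lines logged between roughly 0.194 s and 0.294 s after boot.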

Thanks for your help
dmesg.txt.txt (69.5 KB)

Bibek · November 22, 2018, 10:24am

Hi Tommaso

Thanks for the log.
Can you boot without HDMI connected and then connect it after boot?
This issue was fixed in the latest release; which release version are you using?

regards
Bibek
