High CPU usage when idle

I have an AGX Xavier that I use with ROS (w/ ZED2 and E-con Systems cameras). Intermittently, the CPU load pegs at 100% on one of the cores. I can see it is a kernel thread (it shows up as a red bar in ‘htop’). The usage stays high even after I quit everything (and the heatsink gets very warm), and it sometimes persists across a warm reboot.

Any suggestions on how to debug this or figure out what kernel thread is causing the problem? Thanks.

Just one more followup. Whenever I see this happen, the ‘hi’ number on top (hardware interrupts) goes up. I would like to find a way to debug this.
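
One way to put a number on the ‘hi’ value per core is to compare two readings of /proc/stat, where the sixth value on each cpuN line (after user, nice, system, idle, iowait) is time spent servicing hardware interrupts. The following is only a rough sketch; the helper names are made up for illustration and it has not been tested on a Xavier.

```python
def parse_cpu_times(text):
    """Return {"cpu0": [jiffies, ...], ...} from the per-core lines of /proc/stat."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and everything that is not "cpuN".
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            times[parts[0]] = [int(x) for x in parts[1:]]
    return times


def irq_share(before, after):
    """Percent of each core's time spent in hard-IRQ context between two snapshots."""
    irq_index = 5  # field order: user nice system idle iowait irq softirq steal
    shares = {}
    for cpu, now in after.items():
        prev = before.get(cpu)
        if prev is None:
            continue
        # Only the first eight fields are summed; guest time is already part of user.
        total = sum(now[:8]) - sum(prev[:8])
        irq = now[irq_index] - prev[irq_index]
        shares[cpu] = 100.0 * irq / total if total > 0 else 0.0
    return shares


def read_proc_stat(path="/proc/stat"):
    """Read the current contents of /proc/stat as text."""
    with open(path) as f:
        return f.read()
```

To use it, call parse_cpu_times(read_proc_stat()) twice a few seconds apart and pass both results to irq_share(); a core stuck servicing interrupts should show a share close to 100.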

Hi jseng,

1. Could you reproduce this scenario with tegrastats running as well?
You can run it with the command “sudo tegrastats”, which dumps the CPU usage. Please share the output here.

2. Also, sharing the output of htop (probably a screenshot) while the device is idle may help.

3. To check the interrupts:
cat /proc/interrupts
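
Since the counters in /proc/interrupts only ever grow, the interesting part is which line grows fastest. A small sketch of how two readings could be compared is below; the function names are illustrative and not part of any NVIDIA tool.

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # The first ncpu numeric fields are per-CPU counts; the rest describes the IRQ.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def fastest_growing(before, after, top=5):
    """Return up to `top` tuples (delta, irq_label, description), largest delta first."""
    deltas = []
    for irq, (count, description) in after.items():
        previous = before.get(irq, (0, ""))[0]
        deltas.append((count - previous, irq, description))
    deltas.sort(reverse=True)
    return deltas[:top]


def snapshot(path="/proc/interrupts"):
    """Read and parse the current interrupt counters."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Taking snapshot(), waiting a few seconds, taking a second snapshot() and passing both to fastest_growing() should put the misbehaving interrupt source at the top of the list.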

Okay, I will try to get that information. It is difficult because I don’t know how to easily reproduce it. It occurs maybe 10% of the time.

Here are the screenshots (I hope they are readable). I took them when nothing was running (the heatsink is warm, though). As I let the system idle, the value that goes up quickly in /proc/interrupts is this line:

391: 54836312 0 0 0 tegra-gpio 192 Edge bluetooth hostwake

This seems to be an issue that many forum users have hit. For now, I would suggest using the rmmod command to remove the bluedroid_pm kernel module.
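
To confirm whether the module is loaded before and after running rmmod, the first column of /proc/modules can be checked. This is a minimal sketch with made-up helper names; it only reads the module list and does not unload anything.

```python
def loaded_modules(text):
    """Return the set of module names listed in /proc/modules text."""
    # Each line starts with the module name, followed by size, refcount, etc.
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def is_loaded(name, text):
    """True if `name` appears in the module list (dashes are treated as underscores)."""
    return name.replace("-", "_") in loaded_modules(text)


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's current module list as text."""
    with open(path) as f:
        return f.read()
```

Calling is_loaded("bluedroid_pm", read_proc_modules()) should return True while the driver is present and False once it has been removed.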

You didn’t have any bluetooth device connected on xavier, right?

Yes, I just found the thread: I have turned off Bluetooth, why are there interrupts about bluetooth hostwake?

I think removing the module fixed it, but I will test more.

I do not have any bluetooth devices and I have been trying to figure this out. I don’t know why the bluetooth module is loaded at all. I did go through my ‘dmesg’ and this is what I found:

[ 1.173773] Bluetooth: Core ver 2.22
[ 1.173854] NET: Registered protocol family 31
[ 1.173862] Bluetooth: HCI device and connection manager initialized
[ 1.173876] Bluetooth: HCI socket layer initialized
[ 1.173888] Bluetooth: L2CAP socket layer initialized
[ 1.173953] Bluetooth: SCO socket layer initialized
[ 4.388373] Bluetooth: RFCOMM socket layer initialized
[ 4.388387] Bluetooth: RFCOMM ver 1.11
[ 4.388392] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[ 4.388398] Bluetooth: HIDP socket layer initialized

If the module is not necessary, it would be great to figure out why it is getting loaded (or at least do not install it with the default kernel).

Yes, we will check why this driver has unnecessary IRQ output here.

jseng,

May I ask for the SKU # of your Xavier module and devkit? Also, could you check whether this issue happens on every boot?

Which release are you using?

Okay, I will try to find that info. I don’t think it happens every boot. I feel like it used to happen more often and then after some updates, it became less frequent, but it still would occur.

I am using Jetpack 4.3.

BTW, you didn’t modify any pinmux setting, right?

No, I did not modify those settings.

jseng,

We may need your help to debug because somehow we cannot reproduce this issue with our device.

To debug, could you load the driver again and modify the pinmux setting of TEGRA194_MAIN_GPIO(Y, 0) so that the internal pull-up (Int PU) is enabled?

You need to download the pinmux spreadsheet from the Download Center and search for PY.00. Change the init state to PU and generate a new pinmux cfg.

It is a little complicated if you are new to pinmux settings. Please let us know if you have any trouble.

I downloaded the spreadsheet (Jetson Xavier pinmux v1.06) and found it on row 120. I have never modified pinmux settings, so I will need you to lead me through the process.

Let me try to summarize what I understand so far from reading documentation:

  1. Modify the spreadsheet to enable the internal pullup for GPIO3_PY.00, which is currently set to not assigned (is it floating, and is that why it is generating the random interrupts?).
  2. Generate the .dts files using the Excel button
  3. Use the dts2cfg.py script to generate .cfg files (is this for 3 files: pinmux, gpio, and pad? Or just 1 file: pinmux)
  4. I am not sure what next.

I am not going to try anything until I am sure of the process, since I don’t want to brick my system.

Modify the spreadsheet to enable the internal pullup for GPIO3_PY.00, which is currently set to not assigned (is it floating, and is that why it is generating the random interrupts?).

Yes, you are right. We think that might be causing the interrupts.

Use the dts2cfg.py script to generate .cfg files (is this for 3 files: pinmux, gpio, and pad? Or just 1 file: pinmux)

There should be only one pinmux file if you use the command that generates the pinmux only. We only need the pinmux, so there is no need to generate the pad settings.

I am not going to try anything until I am sure of the process, since I don’t want to brick my system.

After creating the pinmux file, go to Linux_for_Tegra/bootloader and search pinmux.

HP-Compaq-6200-Pro-MT-PC:~/nvidia/nvidia_sdk/JetPack_4.3_Linux_P2888-0060/Linux_for_Tegra/bootloader$ find -iname "*pinmux*.cfg"
./tegra19x-mb1-pinmux-p2888-0000-a04-p2822-0000-b01.cfg → This is the file you previously flashed to your board, but it is a copy.
./t186ref/BCT/tegra19x-mb1-pinmux-p2888-0000-p2822-0000.cfg
./t186ref/BCT/tegra19x-mb1-pinmux-p2888-0000-a00-p2822-0000-a00.cfg
./t186ref/BCT/tegra186-mb1-bct-pinmux-quill-p3489-1000-a00.cfg
./t186ref/BCT/tegra19x-mb1-pinmux-p2888-0000-a04-p2822-0000-b01.cfg → this is the real one
./t186ref/BCT/tegra19x-mb1-pinmux-p2888-slvs-0000-a00-p2822-0000-a00.cfg
./t186ref/BCT/tegra186-mb1-bct-pinmux-quill-p3310-1000-a00.cfg
./t186ref/BCT/tegra186-mb1-bct-pinmux-quill-p3310-1000-c00.cfg
./t186ref/BCT/tegra186-mb1-bct-pinmux-quill-p3310-1000-c03.cfg

I would suggest renaming the old file as a backup and putting the new one under Linux_for_Tegra/bootloader/t186ref/BCT. Please remember to give the new file the same name as the old one so that the flash script can find it.
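
The backup-and-replace step can also be scripted so the original file is never overwritten without a copy. Below is only a sketch of that idea; install_pinmux and the “.bak” suffix are arbitrary choices for illustration, not something the flash script knows about.

```python
import shutil
from pathlib import Path


def install_pinmux(new_cfg, bct_dir, target_name):
    """Back up bct_dir/target_name once, then overwrite it with new_cfg.

    Returns the path of the backup file.
    """
    target = Path(bct_dir) / target_name
    backup = target.with_name(target.name + ".bak")
    # Keep the very first backup so repeated runs never lose the stock file.
    if not backup.exists():
        shutil.copy2(target, backup)
    # The new file takes over the old name, which is what the flash script looks for.
    shutil.copy2(new_cfg, target)
    return backup
```

For the file discussed here, bct_dir would be Linux_for_Tegra/bootloader/t186ref/BCT and target_name would be tegra19x-mb1-pinmux-p2888-0000-a04-p2822-0000-b01.cfg. Copying the “.bak” file back over the target restores the old pinmux.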

And don’t worry about bricking the device. If you hit any error, just re-flash with the old pinmux and it will come back to life. In practice, a device rarely dies from changing only the pinmux.

Just to confirm:

  1. I will replace /t186ref/BCT/tegra19x-mb1-pinmux-p2888-0000-a04-p2822-0000-b01.cfg with the new file (I will make a backup)

I have never reflashed from a host. What script do I run and from where? Will it build a .deb and send it over and then run the flashing? I am assuming it won’t erase the root filesystem.

Hi jseng,

Unfortunately, the flash step would erase the root filesystem. Do you already have something important in the rootfs? The ROS and camera drivers?

The flash steps are:

  1. remember to put device into recovery mode
  2. cd Linux_for_Tegra
  3. sudo ./flash.sh jetson-xavier mmcblk0p1
    jetson-xavier is indicated by the “jetson-xavier.conf” file under Linux_for_Tegra.

There is actually a way to back up the current APP partition from your device, but it takes more time and the test environment would not be as clean.
https://elinux.org/Jetson/Clone

I would suggest also trying with a clean image first. Though ROS or the camera driver may not be causing this interrupt, a clean image would still be a better test environment.

Hi Wayne,
I actually have a lot of stuff in the root filesystem, and I don’t think I have the time to rebuild all of it right now, so I am reluctant to reflash. I am trying to think of a way to trigger the problem so that you can see it as well. Sometimes it happens without running ROS, so I believe it is not related to that (although running the system at a higher load seems to trigger it more often).
I was looking at the pinmux spreadsheet again. Where does that pin go? It is highlighted in yellow (GPIO), but is it accessible anywhere? Thanks.

I just had a thought. The pin is listed as SPI3_CLK. Is it possible to drive the pin (or read the pin) given the current pinmux settings? I know that when I boot the system, the number of bluetooth interrupts increases right away (even at idle). If I can drive the pin, it should keep the pin from floating and make the interrupts stop (hopefully).