Can't turn off both Denver cores when using real-time kernel

Hi, I’m trying out the PREEMPT-RT (4.9.140-rt93) kernel that comes with the
L4T Driver Package (BSP) Sources (R32.4.2).
The patch worked fine, and I found little trouble running real-time programs on Jetson AGX Xavier.

However, when I use the nvpmodel -m{option} command to change the power mode, the board freezes immediately, and I have to restart it manually.

It appears that, of the 7 options provided in /etc/nvpmodel.conf, only two work correctly: options 0 and 3, neither of which turns off any of the Denver CPU cores completely. All other options freeze the board.

I am guessing that the issue occurs when I try to shut down an entire Denver module (both of its cores).
For example, I tried turning off cpu0 and cpu1 one at a time, which worked fine.
(okay)
echo 1 > /sys/devices/system/cpu/cpu0/online
echo 0 > /sys/devices/system/cpu/cpu1/online
(okay)
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 1 > /sys/devices/system/cpu/cpu1/online

However, if I try to turn both of them off, the board once again freezes.
(not okay)
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 0 > /sys/devices/system/cpu/cpu1/online

Turning off cores from different Denver modules was okay.
(okay)
echo 0 > /sys/devices/system/cpu/cpu0/online
echo 0 > /sys/devices/system/cpu/cpu2/online
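
To confirm which cores are actually online before and after each of the tests above, the kernel reports a range list such as 0-3,6 in /sys/devices/system/cpu/online. The following is a minimal sketch that expands such a list into individual CPU numbers. The helper name expand_cpu_list is hypothetical and is not part of nvpmodel or L4T.

```shell
#!/bin/sh
# expand_cpu_list: hypothetical helper that turns a kernel CPU range list
# such as "0-3,6" into "0 1 2 3 6". Variables are global for simplicity.
expand_cpu_list() {
    list=$1
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $list; do
        case $part in
            *-*)
                # A range "a-b": emit every CPU from a to b inclusive.
                start=${part%-*}
                end=${part#*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single CPU number.
                out="$out $part"
                ;;
        esac
    done
    IFS=$old_ifs
    # Strip the leading space left by the first append.
    echo "${out# }"
}

# On the board (the sysfs file is only read if it exists):
if [ -r /sys/devices/system/cpu/online ]; then
    echo "online CPUs: $(expand_cpu_list "$(cat /sys/devices/system/cpu/online)")"
fi
```

Running this after each echo makes it easy to record exactly which combination of cores was offline when a freeze occurred.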

For a non-RT (vanilla) AGX Xavier, all nvpmodel options were functioning correctly.

I do not think this behavior is related directly to PREEMPT-RT, since the problem only happens within a Denver module.
Is there a fix for this behavior?

Hi upoque,

Do you mean you can disable one Denver core but not both?

Could you paste the kernel error here?

Hi WayneWWW,
Here’s the kernel error message.

-m4 will disable cores 6 and 7.
$ sudo nvpmodel -m4 --verbose
[sudo] password for nvidia:
NVPM VERB: Config file: /etc/nvpmodel.conf
NVPM VERB: parsing done for /etc/nvpmodel.conf
NVPM VERB: set power mode as MODE_30W_6CORE(4) in /etc/nvpmodel.conf.
NVPM VERB: Set power mode: 4
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_0: PATH: /sys/devices/system/cpu/cpu0/online VAL: 1
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_1: PATH: /sys/devices/system/cpu/cpu1/online VAL: 1
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_2: PATH: /sys/devices/system/cpu/cpu2/online VAL: 1
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_3: PATH: /sys/devices/system/cpu/cpu3/online VAL: 1
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_4: PATH: /sys/devices/system/cpu/cpu4/online VAL: 1
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_5: PATH: /sys/devices/system/cpu/cpu5/online VAL: 1
NVPM VERB: write PARAM CPU_ONLINE: ARG CORE_6: PATH: /sys/devices/system/cpu/cpu6/online VAL: 0

$ dmesg -w

[533730.088783] CPU6: shutdown
[533730.088943] psci: CPU6 killed.
(freeze)
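
The verbose output above shows which sysfs writes nvpmodel performs for a mode. As a rough sketch, the cores a mode tries to take offline can be pulled out of a saved copy of that output. The helper name offline_cores and the log file name are hypothetical; the parsing only assumes the line layout shown in the log above.

```shell
#!/bin/sh
# offline_cores: hypothetical helper. Reads nvpmodel --verbose output on
# stdin and prints the number of every core written with VAL: 0.
offline_cores() {
    awk '/PARAM CPU_ONLINE:/ && $NF == "0" {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^CORE_[0-9]+:$/) {
                core = $i
                sub(/^CORE_/, "", core)   # drop the "CORE_" prefix
                sub(/:$/, "", core)       # drop the trailing colon
                print core
            }
        }
    }'
}

# Example with a saved log (file name is an assumption):
#   offline_cores < nvpmodel_m4.log
```

For the log in this post the output would be only 6, because the board froze before the write for core 7 was reported.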

Hi,

Could you move to rel-32.4.3 GA and test this patch again?

Hi,

I also want to correct one point: Xavier does not have two Denver cores. All of its cores are Carmel cores.

Thus, those cores should not behave differently from one another.

We applied the patch on rel-32.4.3 but could not reproduce your issue.

How did you apply the rt-patch?

Hi Wayne,
I also tried rel-32.4.3 but am still stuck with the same problem.

Here’s how I applied the patch:

  1. Create NVIDIA Jetson Xavier image
$ mkdir -p $HOME/nvidia/docs
$ cd $HOME/nvidia

(Driver Package)
$ wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.3/t186ref_release_aarch64/Tegra186_Linux_R32.4.3_aarch64.tbz2
$ sudo tar -xf Tegra186_Linux_R32.4.3_aarch64.tbz2

(RootFileSystem R32.4.3)
$ wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.3/t186ref_release_aarch64/Tegra_Linux_Sample-Root-Filesystem_R32.4.3_aarch64.tbz2
$ sudo tar -xf Tegra_Linux_Sample-Root-Filesystem_R32.4.3_aarch64.tbz2 -C $HOME/nvidia/Linux_for_Tegra/rootfs

$ cd $HOME/nvidia/Linux_for_Tegra
$ sudo ./apply_binaries.sh
  2. Flash NVIDIA Jetson Xavier image

  3. Prepare RT Kernel

$ mkdir -p $HOME/nvidia_rt
$ cd $HOME/nvidia_rt

(cross-compiler)
$ wget -O l4t-gcc-7-3-1-toolchain-64-bit.tar.xz https://developer.nvidia.com/embedded/dlc/l4t-gcc-7-3-1-toolchain-64-bit
$ tar -xf l4t-gcc-7-3-1-toolchain-64-bit.tar.xz

(env vars for cross-compiler)
$ export BSPTOOLCHAIN=$HOME/nvidia_rt/install/bin
$ export PATH=${BSPTOOLCHAIN}:${PATH}
$ export ARCH=arm64
$ export CROSS_COMPILE=$HOME/nvidia_rt/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-
$ export TEGRA_XAVIER_KERNEL_OUT=$HOME/nvidia_rt/tegra-jetson-xavier-kernel

(public sources 4.3)
$ wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.3/Sources/T186/public_sources.tbz2
$ tar -xf public_sources.tbz2
$ cd $HOME/nvidia_rt/Linux_for_Tegra/source/public
$ tar -xjf kernel_src.tbz2

**(apply rt-patch)**
$ cd $HOME/nvidia_rt/Linux_for_Tegra/source/public/kernel/kernel-4.9
$ for i in rt-patches/*.patch; do echo $i; done
$ for i in rt-patches/*.patch; do patch -p1 < $i; done

(config kernel)
$ make O=$TEGRA_XAVIER_KERNEL_OUT ARCH=$ARCH tegra_defconfig
$ make O=$TEGRA_XAVIER_KERNEL_OUT ARCH=$ARCH menuconfig
changed CONFIG_PREEMPT_RT_FULL=y

(compile / install)
$ mkdir -p $HOME/nvidia_rt/L4T
$ cd $HOME/nvidia_rt/Linux_for_Tegra/source/public/kernel/kernel-4.9
$ make -j12 O=$TEGRA_XAVIER_KERNEL_OUT ARCH=$ARCH zImage
$ make O=$TEGRA_XAVIER_KERNEL_OUT ARCH=$ARCH dtbs
$ make -j12 O=$TEGRA_XAVIER_KERNEL_OUT ARCH=$ARCH modules
$ make O=$TEGRA_XAVIER_KERNEL_OUT ARCH=$ARCH modules_install INSTALL_MOD_PATH=$TEGRA_XAVIER_KERNEL_OUT/modules
$ mkdir -p $HOME/nvidia_rt/L4T/kernel/dtb
$ cp $TEGRA_XAVIER_KERNEL_OUT/arch/arm64/boot/Image $HOME/nvidia_rt/L4T/kernel/Image
$ cp $TEGRA_XAVIER_KERNEL_OUT/arch/arm64/boot/dts/*.dtb $HOME/nvidia_rt/L4T/kernel/dtb
$ cd $TEGRA_XAVIER_KERNEL_OUT/modules
$ tar --owner root --group root -cjf kernel_supplements.tbz2 *
$ cp $TEGRA_XAVIER_KERNEL_OUT/modules/kernel_supplements.tbz2 $HOME/nvidia_rt/L4T/kernel/kernel_supplements.tbz2
$ cd $HOME/nvidia_rt
$ tar -cjf $HOME/nvidia_rt/L4T.tbz2 L4T
$ scp $HOME/nvidia_rt/L4T.tbz2 nvidia@JETSON_IP_ADDRESS:/home/nvidia
  4. Replace kernel (on NVIDIA Jetson Xavier)
$ cd /home/nvidia
$ tar -xjf L4T.tbz2
$ sudo cp L4T/kernel/Image /boot/Image
$ sudo cp L4T/kernel/dtb/* /boot/dtb
$ sudo cp L4T/kernel/dtb/* /boot
$ sudo tar -xvf L4T/kernel/kernel_supplements.tbz2 -C /
$ sudo reboot

This seems to be related to the same cause…

By default the board is set to nvpmodel -m7 (MODE_15W_DESKTOP),
which switches off cores 4 to 7.

So when any of cores 4, 5, 6, or 7 is not shut down correctly, the board won’t boot.

When it boots correctly, dmesg outputs something like:

...
[   20.107172] CPU4: shutdown
[   20.107299] psci: CPU4 killed.
[   20.396419] ras_fhi_disable: FHI 483 disabled
[   20.405339] CPU5: shutdown
[   20.419660] psci: Retrying again to check for CPU kill
[   20.419679] psci: CPU5 killed.
[   20.511509] ras_fhi_disable: FHI 484 disabled
[   20.511801] NOHZ: local_softirq_pending 02
[   20.512056] NOHZ: local_softirq_pending 02
[   20.512117] NOHZ: local_softirq_pending 02
[   20.512180] NOHZ: local_softirq_pending 02
[   20.512231] NOHZ: local_softirq_pending 02
[   20.512647] CPU6: shutdown
[   20.513241] psci: CPU6 killed.
[   20.529924] ras_fhi_disable: FHI 485 disabled
[   20.538825] CPU7: shutdown
[   20.550661] psci: Retrying again to check for CPU kill
[   20.550680] psci: CPU7 killed.
...

However, when it hangs during boot,
it gets stuck somewhere between [ 20.107172] CPU4: shutdown and [ 20.550680] psci: CPU7 killed.,
which is exactly the same problem I get when using nvpmodel -m{option}.

It’s funny that every time I need to restart, I have to rely on chance and press reboot several times to get all 4 cores down.
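
In the good boot log above, every "CPUn: shutdown" line is eventually followed by a matching "psci: CPUn killed." line. As a sketch, a captured log can be scanned for CPUs whose shutdown was never confirmed. The helper name unkilled_cpus is hypothetical, and it assumes the log was captured somewhere that survives the hang (for example a serial console, which is an assumption about the setup).

```shell
#!/bin/sh
# unkilled_cpus: hypothetical helper. Reads kernel log text on stdin and
# prints each CPU that logged "CPUn: shutdown" without a later
# "psci: CPUn killed" line.
unkilled_cpus() {
    awk '
        match($0, /CPU[0-9]+: shutdown/) {
            # "CPU" is 3 chars and ": shutdown" is 10, the rest is digits.
            cpu = substr($0, RSTART + 3, RLENGTH - 13)
            pending[cpu] = 1
        }
        match($0, /psci: CPU[0-9]+ killed/) {
            # "psci: CPU" is 9 chars and " killed" is 7, the rest is digits.
            cpu = substr($0, RSTART + 9, RLENGTH - 16)
            delete pending[cpu]
        }
        END { for (cpu in pending) print cpu }
    ' | sort -n
}

# Example with a saved console log (file name is an assumption):
#   unkilled_cpus < boot_console.log
```

An empty result means every shutdown was confirmed. Any number printed identifies the core where the sequence stalled.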

Hi upoque,

Please try the steps below to apply the rt-patch:

$ export TEGRA_KERNEL_OUT=<outdir>
$ export CROSS_COMPILE=<toolchain_install_path>/bin/aarch64-linux-gnu-
$ export LOCALVERSION=-tegra
$ cd /Linux_for_Tegra/source/public/kernel/kernel-4.9
$ ./scripts/rt-patch.sh apply-patches
$ make ARCH=arm64 O=$TEGRA_KERNEL_OUT tegra_defconfig
$ make ARCH=arm64 O=$TEGRA_KERNEL_OUT -j<n>

After the build completes, replace the kernel Image on your Xavier:

$ cp $TEGRA_KERNEL_OUT/arch/arm64/boot/Image /boot/Image
$ sudo reboot

Using these steps, we can disable cpu0 and cpu1 on Xavier without any issue.


Hello,

Any feedback here?

Carolyuu, Wayne, thank you for your interest in this issue; it is helping me a lot.
I tried what carolyuu suggested, but this time the board would not boot at all.

...
[  OK  ] Started crash report submission daemon.
         Starting Tool to automatically collect and submit kernel crash signatures...
         Starting Ubuntu FAN network setup...
         Starting Permit User Sessions...  
[FAILED] Failed to start nvpmodel service.
See 'systemctl status nvpmodel.service' for details.
[  OK  ] Started containerd container runtime.
...

Maybe I am still applying the rt-patch the wrong way.
Please point out the error if I’m doing something differently.

I flashed the board with the image created from the latest driver package and root file system:

(Driver Package)
$ wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.3/t186ref_release_aarch64/Tegra186_Linux_R32.4.3_aarch64.tbz2
$ sudo tar -xf Tegra186_Linux_R32.4.3_aarch64.tbz2

(RootFileSystem R32.4.3)
$ wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.3/t186ref_release_aarch64/Tegra_Linux_Sample-Root-Filesystem_R32.4.3_aarch64.tbz2
$ sudo tar -xf Tegra_Linux_Sample-Root-Filesystem_R32.4.3_aarch64.tbz2 -C $HOME/nvidia/Linux_for_Tegra/rootfs

$ cd $HOME/nvidia/Linux_for_Tegra
$ sudo ./apply_binaries.sh

After that I followed carolyuu’s guide to replace the kernel image:
For this, I used the latest public source r32.4.3.

(public sources 4.3)
$ wget https://developer.nvidia.com/embedded/L4T/r32_Release_v4.3/Sources/T186/public_sources.tbz2
$ tar -xf public_sources.tbz2

(rt-patch (carolyuu))
$ export TEGRA_KERNEL_OUT=<outdir>
$ export CROSS_COMPILE=<toolchain_install_path>/bin/aarch64-linux-gnu-
$ export LOCALVERSION=-tegra
$ cd /Linux_for_Tegra/source/public/kernel/kernel-4.9
$ ./scripts/rt-patch.sh apply-patches
$ make ARCH=arm64 O=$TEGRA_KERNEL_OUT tegra_defconfig
$ make ARCH=arm64 O=$TEGRA_KERNEL_OUT -j<n>
...
(replace kernel image)
$ cp $TEGRA_KERNEL_OUT/arch/arm64/boot/Image /boot/Image
$ sudo reboot

Could you copy this Image to Linux_for_Tegra/kernel/Image and reflash your board?

I tried as you suggested, but booting stops because the kernel modules fail to load.

...
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
...

Maybe this is because grub is loading the wrong kernel image?
export LOCALVERSION=-tegra

Do I need to change any settings in the rootfs or in the bootloader to correctly load the kernel image?

The only things I did were to run ./apply_binaries.sh in the driver package and ./scripts/rt-patch.sh apply-patches in the public kernel source, then compile and copy the image as described above.

Hi,

Could you also rebuild the kernel modules along with the kernel image and put the modules in Linux_for_Tegra/rootfs/lib/modules?

Thanks! Problem solved after copying the new modules.
nvpmodel now works for all modes! (needs reboot for certain modes)
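
The "Failed to start Load Kernel Modules" message earlier in the thread is consistent with the new Image booting without a matching module tree, since modules are looked up under lib/modules/<kernel release>. A minimal sketch for checking this, either on the running board or in the rootfs used for flashing. The helper name check_modules is hypothetical, and the release string in the usage comment is only an example.

```shell
#!/bin/sh
# check_modules: hypothetical helper.
#   $1 = kernel release string (as printed by `uname -r`)
#   $2 = root of the filesystem to inspect (default: /)
# Prints whether <root>/lib/modules/<release> exists; returns 1 if not.
check_modules() {
    release=$1
    root=${2:-/}
    dir="${root%/}/lib/modules/$release"
    if [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}

# On the running board:
#   check_modules "$(uname -r)"
# Against the flashing tree on the host (release string is an example):
#   check_modules 4.9.140-rt93-tegra "$HOME/nvidia/Linux_for_Tegra/rootfs"
```

If the directory is reported missing, the modules built with the RT kernel have not been installed for that release.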

Although it works well now, it would be very helpful if you could explain why the problem happened in my initial case.
As I see it, applying the rt patch with ./scripts/rt-patch.sh apply-patches, as you suggested, sets the kernel config as follows:

<rt-patch.sh>
...
                        --enable PREEMPT_RT_FULL \
                        --disable DEBUG_PREEMPT \
                        --disable CPU_IDLE_TEGRA18X \
                        --disable CPU_FREQ_TIMES \
                        --disable CPU_FREQ_GOV_SCHEDUTIL \
                        --disable CPU_FREQ_GOV_INTERACTIVE
...

… while my initial configuration only enabled PREEMPT_RT_FULL, without disabling the other options. Other than that, the procedure looks similar to me.

Could this be the cause? Or may there be other reasons why my first attempt did not work?
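
The thread does not establish which of these options, if any, caused the original freeze. One way to narrow it down is to compare the state of the options listed in rt-patch.sh between the two builds. The sketch below reads a kernel .config file; the helper names config_state and report_rt_config are hypothetical.

```shell
#!/bin/sh
# config_state: hypothetical helper. Prints the value ("y" or "m") of
# option $2 (without the CONFIG_ prefix) in kernel config file $1, or "n"
# when the option is unset or absent.
config_state() {
    val=$(sed -n "s/^CONFIG_$2=\(.*\)$/\1/p" "$1")
    echo "${val:-n}"
}

# report_rt_config: prints the state of each option that the rt-patch.sh
# excerpt above enables or disables, for the config file given as $1.
report_rt_config() {
    for opt in PREEMPT_RT_FULL DEBUG_PREEMPT CPU_IDLE_TEGRA18X \
               CPU_FREQ_TIMES CPU_FREQ_GOV_SCHEDUTIL CPU_FREQ_GOV_INTERACTIVE; do
        echo "$opt=$(config_state "$1" "$opt")"
    done
}

# Example: compare the manually configured build with the scripted one.
#   report_rt_config "$TEGRA_XAVIER_KERNEL_OUT/.config"
#   report_rt_config "$TEGRA_KERNEL_OUT/.config"
```

Any option that differs between the two reports is a candidate to toggle individually in a test build.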

Regardless, many thanks for solving the issue!
