I am encountering an issue with cboot and a real-time kernel on L4T 32.5.2 on a Jetson TX2. It appears that cboot intermittently reverts to the other rootfs partition when I use a Linux kernel built with the PREEMPT_RT patches.
Here is the current setup: a Jetson TX2 running a Yocto distribution based on the meta-tegra Dunfell 32.5.2 branch. The distribution handles software updates with Mender.
Here is the test scenario: I ran a cycle of 100 Mender updates on the device, each of which triggers an A/B rootfs partition switch. At some point during one of the reboots, however, the device rolls back to the previous partition. I monitored the serial console to determine whether the error comes from Mender, cboot, or the Linux kernel. The rollback happens before any output appears on the serial port. Since cboot normally logs to the serial port during boot, this suggests that the error occurs before or during cboot execution. The Linux kernel and Mender logs show no errors.
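The pass/fail rule behind this test can be stated compactly: each Mender update should leave the device running from the other rootfs slot, so a cycle counts as a rollback when the slot after the reboot is the same one that was active before the update. The bash sketch below applies that rule to a list of observed slots. It is only an illustration under assumptions: the slot labels `A`/`B` and the function name `find_rollbacks` are hypothetical, how the active slot is recorded on the device is left open, and it is not how the Mender test script itself detects a rollback.

```shell
#!/usr/bin/env bash
# find_rollbacks (hypothetical helper): print the 1-based update cycles that
# ended in a rollback.
# Arguments: the slot booted before the first update, followed by the slot
# booted after each update cycle, e.g. A B A B ... (labels are assumptions).
find_rollbacks() {
  if [ "$#" -eq 0 ]; then
    return 0
  fi
  local prev="$1"
  shift
  local cycle=0 slot
  for slot in "$@"; do
    cycle=$((cycle + 1))
    # A successful A/B update must switch slots; booting the same slot
    # again means the device fell back to the previous partition.
    if [ "$slot" = "$prev" ]; then
      echo "$cycle"
    fi
    prev="$slot"
  done
}

# Hypothetical observation: cycles 1-3 switch, cycle 4 rolls back, cycle 5 switches.
find_rollbacks A B A B B A   # prints 4
```

Comparing each boot against the previous one, rather than against a fixed A/B alternation, keeps the check correct after a rollback: the next update is expected to switch away from whichever slot the device actually booted.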
To reproduce the error, you can use the meta-tegra demo distribution and its Mender test script. You need a Jetson TX2 development kit connected to the local network. Here are the steps to build and flash the OS with an RT kernel:
Clone the demo distro (OE4T/tegra-demo-distro on GitHub) and check out the dunfell-l4t-r32.5.0 branch:
```shell
git clone https://github.com/OE4T/tegra-demo-distro
cd tegra-demo-distro
git checkout dunfell-l4t-r32.5.0
git submodule update --init
```
Set up the build environment and apply the RT patches to the linux-tegra kernel sources:
```shell
source setup-env --machine jetson-tx2-devkit --distro tegrademo-mender build
devtool modify linux-tegra
cd workspace/sources/linux-tegra/
./scripts/rt-patch.sh apply-patches
```
Build the image, for example with `bitbake demo-image-base` run from the build directory.
Put the Jetson in recovery mode, connect the USB cable, use `lsusb` to confirm that the NVIDIA device is present, and flash the image:
```shell
cd tmp/deploy/images/jetson-tx2-devkit/
mkdir flash
tar -C flash -xvf demo-image-base-jetson-tx2-devkit.tegraflash.tar.gz
cd flash
sudo ./doflash.sh
```
Connect the Jetson to the network; it will get an IP address via DHCP. Find it with nmap, or use the serial console, which gives you a shell. Log in as root (no password) and check the Mender and kernel versions (look for the RT tag):
```
root@j140-tx2-d02:~# mender --version
2.6.1 runtime: go1.14.15
root@j140-tx2-d02:~# uname -a
Linux j140-tx2-d02 4.9.201-rt134-l4t-r32.5+g618f59196be6 #1 SMP PREEMPT RT Fri Sep 8 09:18:27 UTC 2023 aarch64 GNU/Linux
```
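When the device is rebooted many times, the RT check can also be scripted. The sketch below is a heuristic, assuming (from the `uname -a` output above) that an RT build carries an `-rt<N>` suffix in the kernel release field and `PREEMPT RT` in the version string. `is_rt_kernel` is a hypothetical helper, and how the `uname -a` line is collected (serial console or a remote shell) is left open.

```shell
#!/usr/bin/env bash
# is_rt_kernel (hypothetical helper): heuristic check of a `uname -a` line.
# Returns 0 if the release field contains "-rt<digits>" AND the version
# string contains "PREEMPT RT"; returns 1 otherwise.
is_rt_kernel() {
  local line="$1"
  local release
  # The third whitespace-separated field of `uname -a` is the kernel release.
  release=$(printf '%s\n' "$line" | awk '{print $3}')
  case "$release" in
    *-rt[0-9]*) ;;
    *) return 1 ;;
  esac
  case "$line" in
    *"PREEMPT RT"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example using the output captured above.
sample='Linux j140-tx2-d02 4.9.201-rt134-l4t-r32.5+g618f59196be6 #1 SMP PREEMPT RT Fri Sep 8 09:18:27 UTC 2023 aarch64 GNU/Linux'
if is_rt_kernel "$sample"; then
  echo "RT kernel detected"
fi
```

On the device itself the same helper could be called as `is_rt_kernel "$(uname -a)"`. Requiring both markers makes the check stricter than looking for either one alone.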
On your laptop, set up a web server for the stress test:
```shell
cd tmp/deploy/images/jetson-tx2-devkit/  # or "cd .." from the flash directory in the previous step
python3 -m http.server 8080
```
In a new terminal, run the stress test to trigger the unwanted rollback:
```shell
cd tegra-demo-distro/layers/meta-mender-tegra/scripts/test
python3 -m pip install -r requirements.txt
./mender_tegra_test.py --test mender_torture --device <Jetson_IP> --mender_install http://<Laptop_IP>:8080/demo-image-base-jetson-tx2-devkit.mender 2>&1 | tee -a logfile.log
```
At some point, the script should fail when the rollback occurs. You can follow the process on the serial console. Could you please investigate this error and help explain this behavior on the TX2 with a real-time kernel?
If you have any further questions or need additional information, please let me know.