Hi.
I need help understanding something about the root file system (rootfs) A/B redundancy feature. After reading the NVIDIA documentation and browsing several guides and posts about rootfs A/B redundancy, I developed the following understanding of it:
In rootfs A/B redundancy, the system contains two root file systems. These two ‘slots’ exist so that if one slot is corrupted or not bootable, the system can fail over to the other slot (thus increasing the reliability of the Jetson in a production environment).
To try this feature, I flashed my Jetson AGX Xavier with rootfs A/B redundancy enabled, using JetPack 4.6. Then, to check whether it would work in a real data corruption scenario, I booted into slot A, corrupted that same slot (slot A), and hard rebooted the system with a power cycle. The reasoning was that in a real application we would be running from one slot (say slot A), and if that slot is corrupted, for example by a sudden power failure, the system should boot into slot B.
However, the above-mentioned test failed. The system stayed stuck forever trying to boot from slot A (even though slot A was clearly corrupted), and did not fail over to slot B.
I repeated the same procedure to test the A/B redundancy feature on JetPack 5.0.2 and on JetPack 5.1 as well. However, every time I ran into the same outcome: the system would attempt to boot from the corrupted slot, stay stuck there forever, and never boot from the other slot.
Finally, on JetPack 5.0.2, I tried booting into slot B, mounting slot A's file system and removing its contents, and then using nvbootctrl to set slot A as the slot to boot from. In this case, the system does indeed fail over to slot B after failing to boot from slot A.
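For reference, the second procedure can be sketched roughly as the Python script below. This is only an approximation of the steps, not the exact commands: the device path `/dev/mmcblk0p1` for the slot A rootfs and the mount point `/mnt/slot_a` are assumptions, the helper names are made up for illustration, and slot A is assumed to be slot number 0 in nvbootctrl. It defaults to a dry run that only prints the commands, since the real run is destructive and needs root.

```python
import shlex
import subprocess

# Assumptions for illustration only; check the actual partition layout first.
SLOT_A_DEVICE = "/dev/mmcblk0p1"   # assumed rootfs partition of slot A
MOUNT_POINT = "/mnt/slot_a"        # hypothetical mount point


def failover_test_commands(device=SLOT_A_DEVICE, mount_point=MOUNT_POINT, target_slot=0):
    """Build the command sequence to run while booted into slot B."""
    return [
        ["nvbootctrl", "dump-slots-info"],                     # inspect slots before the test
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],                        # mount slot A's rootfs
        ["find", mount_point, "-mindepth", "1", "-delete"],    # destructive: wipe its contents
        ["umount", mount_point],
        ["nvbootctrl", "set-active-boot-slot", str(target_slot)],  # ask to boot slot A next
        ["reboot"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for argv in commands:
        line = shlex.join(argv)
        print(line)
        lines.append(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    run(failover_test_commands(), dry_run=True)
```

After the reboot, running `nvbootctrl dump-slots-info` again shows which slot the system ended up on.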
My question is this: Is the first method (working in slot A and corrupting or removing the file system of slot A) the correct way to test rootfs A/B redundancy? Isn’t this the most accurate representation of real-life data corruption? Or was my test method completely wrong because I misunderstood things, and is the second method (booting into slot B, removing slot A, then switching to slot A) the correct way?
Any guidance on this would be appreciated. Thanks.