I use an AGX Orin customer board running L4T 35.1. I expect the board to experience sudden power losses. What steps can I take to avoid file system corruption? I'm especially concerned that the board will no longer boot, and less concerned about losing data that is being written at the moment of a sudden power loss.
What are possible ways to mitigate this? Can I avoid this completely?
Hi,
Can you add a small battery so that the board can shut down gracefully when power is suddenly lost? This would be the better option if sudden power loss is expected in the use case.
The short answer is that, if providing enough power to allow time for a shutdown is not an option, then the only way to completely avoid corruption (and this applies to any hardware or operating system) will (A) destroy the solid state memory's lifetime, and (B) severely degrade performance.
The option to add a larger journal will keep the filesystem from corrupting, but this is not the same as not losing data. A journal is basically a surgeon’s scalpel for discarding incomplete writes (and losing those writes), whereas a filesystem check to fix a corrupt partition could be considered a lumberjack’s axe. Providing power such that shutdown completes is that magic “get out of jail free” card.
Thanks for the detailed answers. I now have a clearer picture.
In your opinion, is having a read-only partition with the rootFS next to a read-write data partition a better or worse approach than playing around with the journaling? My main concern is whether a sudden power loss can affect, and therefore corrupt, the read-only rootFS partition, assuming that I'm writing to the rw data partition when the power loss hits. Both partitions are on the eMMC. There is contradictory information on the internet, with some sources saying that data loss can happen even in that case, since the eMMC is not really aware of the different partitions.
I would say that is a better approach, but maybe not as easy as it sounds, since there are temp files. You’d just need to make sure that only the actual rootfs is read-only, and not “everything”. Note that if temporary files cannot be created, then you cannot even log in (which requires temp files). Playing with journals is just asking “how much more can we lose without corrupting the system”. A read-only rootfs, though, is rather reliable and won’t wear out the eMMC nearly as fast either.
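One way to sanity-check such a layout after boot is to look at which mount points are actually writable. The following is only a minimal sketch: it parses text in the `/proc/mounts` format and lists the read-write entries. The device names and the `/data` mount point in `SAMPLE` are hypothetical and are not taken from an actual Orin image.

```python
from typing import Dict


def writable_mounts(mounts_text: str) -> Dict[str, str]:
    """Return {mount_point: device} for every entry mounted read-write.

    mounts_text uses the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    writable = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if "rw" in options.split(","):
            writable[mount_point] = device
    return writable


# Hypothetical layout: read-only rootfs, temp files in RAM, rw data partition.
SAMPLE = """\
/dev/mmcblk0p1 / ext4 ro,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
/dev/mmcblk0p15 /data ext4 rw,relatime 0 0
"""

print(writable_mounts(SAMPLE))
```

On a real system you would pass in the contents of `/proc/mounts` instead of `SAMPLE`. If `/` shows up in the result, the rootfs is not actually read-only; if only tmpfs entries and the data partition show up, then temp files go to RAM and the only eMMC writes go to the data partition.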
There is another in-between and clever alternative, but I’m not sure if anyone ever got it to work: OverlayFS. See:
What you get with OverlayFS (note the “fuse” in part of its naming) is a read-only filesystem with a layer in RAM on top: if you write to a file, then only RAM is edited, and the RAM layer overlays the actual disk for that file such that it looks like the file was edited. Upon reboot everything edited disappears, since it is RAM. The backing store (the actual disk content) always stays exactly as it was, even if there is power loss. This is used in a lot of kiosks. If you were to couple this with your data being written to a different partition, then I suspect it would be very reliable (but updates might be a pain). Actually getting OverlayFS to work might not be easy.
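To make the layering concrete, here is a rough sketch of the mount commands involved, assuming the writable layer lives on a tmpfs. The paths (`/ro`, `/run/overlay`, `/merged`) are made up for illustration, and the function only builds the argument lists without running anything. Doing this for the real rootfs would have to happen early in boot (for example in an initrd), which this sketch does not cover, and I have not tested it on an Orin.

```python
from typing import List


def overlay_mount_commands(lower: str, ram_dir: str, merged: str) -> List[List[str]]:
    """Build commands that stack a RAM-backed writable layer over `lower`.

    The upper and work directories must be on the same filesystem,
    so both are placed inside the tmpfs mounted at `ram_dir`.
    """
    upper = f"{ram_dir}/upper"
    work = f"{ram_dir}/work"
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return [
        ["mount", "-t", "tmpfs", "tmpfs", ram_dir],   # RAM-backed scratch space
        ["mkdir", "-p", upper, work],                 # writable layer + work dir
        ["mount", "-t", "overlay", "overlay", "-o", options, merged],
    ]


# Hypothetical paths, for illustration only.
for cmd in overlay_mount_commands("/ro", "/run/overlay", "/merged"):
    print(" ".join(cmd))
```

Everything written under the merged mount point lands in the tmpfs upper directory, so the lower (on-disk) filesystem is never written to and the changes vanish at power-off.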
Anyone here know if OverlayFS has been tested on Orin eMMC models?
The bottom line is that any partition which has a filesystem on it will lose something upon power loss unless its writes are uncached and synchronous. There is no exception. Even in some cases where the operating system has switched from read-write to read-only, there can still be loss of data due to the caching which is internal to the drive itself. If you completely remove that cache on solid state memory you’ll destroy its lifetime, but if the only cache is part of the disk itself, then odds are that capacitors give it a moment of power at the moment of loss, so it stands a chance of finishing writing its own cache. If you booted such that nothing was ever written, then there isn’t any cache or buffer anywhere that can cause problems. To have never written, or to destructively remove your solid state memory’s internal cache, is the only way to guarantee everything.
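For the data partition, an application can at least narrow the window by making its own writes synchronous. The following is a minimal sketch of the usual write-temp-file, fsync, rename pattern on Linux; the function name is my own. Note that `fsync` only asks the kernel to push the data to the device, so the drive's internal cache described above can still be a factor, and the extra flushes cost performance and flash wear.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a reader sees the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # empty Python's userspace buffer
            os.fsync(tmp.fileno())   # ask the kernel to flush to the device
        os.replace(tmp_path, path)   # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)      # do not leave a stray temp file behind
        raise
    # Flush the directory entry too, so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call would look like `durable_write("/data/state.bin", payload)`, where `/data` is a hypothetical mount point for the rw partition. If power is lost mid-write, the expectation is that the previous version of the file is still there rather than a half-written one, but the write in progress is still lost.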
Or you could have a battery backup with enough time for a shutdown. Maybe even a supercapacitor. All of that complication, or a few seconds longer of power. Those are the two choices.
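As a rough back-of-the-envelope aid for sizing a supercapacitor, the usable energy between two voltages is E = ½·C·(V_start² − V_min²), and the hold-up time is that energy divided by the load power. The sketch below just does that arithmetic. All of the numbers in it are hypothetical placeholders, not Orin specifications, and a real design would also have to account for converter efficiency and capacitor ESR.

```python
def holdup_seconds(capacitance_f: float, v_start: float, v_min: float,
                   load_w: float, efficiency: float = 1.0) -> float:
    """Estimate how long a capacitor can carry a constant-power load.

    capacitance_f: capacitance in farads
    v_start:       capacitor voltage when input power is lost
    v_min:         lowest voltage the regulator can still work from
    load_w:        board power draw in watts during shutdown
    efficiency:    converter efficiency between 0 and 1 (1.0 = ideal)
    """
    if v_min >= v_start or load_w <= 0:
        raise ValueError("need v_start > v_min and a positive load")
    usable_joules = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_joules * efficiency / load_w


# Hypothetical values: 10 F bank, 12 V falling to 9 V, 30 W load.
print(holdup_seconds(10.0, 12.0, 9.0, 30.0))  # 315 J / 30 W = 10.5 s
```

You would compare the result against the measured time your system takes to shut down cleanly, with some margin.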