Tuning linux on jetson nano for better data reliability in power failure scenario

linuxdev · May 9, 2023, 6:11pm

The issue you are running into is not specific to a Jetson. This is an issue of all computers (including Windows, Mac, desktop PCs, RPis, so on) which have a filesystem which caches for improved performance. If there is caching, then there is no way around the possibility of loss.

If there is no caching it implies the disk is operating “synchronously”. Every bit written is written immediately, and the write command will not continue to the next bit until the previous bit is written. Power loss won’t matter because there is no outstanding cached write (although the program specifying what to write will cut off some of the data, the filesystem itself will not corrupt and it is guaranteed that if the filesystem returned success of write, then the filesystem can guarantee the data written is still valid even after sudden power loss).

Hard drives (and their equivalents) “mostly” all have a cache internal to the drive. Perhaps on a tiny Cortex-M controller no cache will be used, but that’s rare. The operating system itself might use “synchronous” writes within its own software, but if the drive caches, you are still at risk of loss of the cache which was being written. The disk would have to be intentionally told to go to a synchronous mode to avoid using that cache and have guarantees. Performance would absolutely “suck” (one of my favorite technical terms).

If you were to operate solid state memory without cache, and be purely synchronous, then the solid state memory would die much much sooner. Part of the purpose of the cache with solid state memory is to aid in wear leveling. If you have a small microcontroller which only reads, this is not a problem, but if you want to write to the disk, this is a dangerous thing to do for solid state memory.

Enter journaling filesystems. ext4 and NTFS are journalling. At this point it is important to know the difference between loss of data and filesystem corruption. If one simply loses data, then it isn’t a big problem; if the system is writing metadata which changes the filesystem structure itself (e.g., adding or removing a directory), then it is possible that a sudden cut of power might make the rules of seeking within the filesystem break. Reading might result in completely absurd issues, or crashing; writing could result in loss of the entire disk’s content. A journal does not stop data loss, but it does stop corruption.

A journal is a small amount of disk space which is 100% synchronous. Neither the operating system, nor the disk itself, will cache this space. It is a tiny amount of space, and so it doesn’t usually hurt solid state disk life (the space can level by traveling over other disk space via a pointer rather than referring to one specific small set of solid state memory).

When content is cached to write (on a journaling filesystem like ext4), the journal marks that content’s destination as available, but not yet written; as bytes are written, the journal marks them as written. Should power be suddenly lost, the journal can be replayed, and content which is valid will be marked clean; content ready for write, but not written, will be reversed out. That reversed out content is gone.

Regardless of whether it is a desktop PC or Windows or Linux, loss of power would result in loss of any content which is cached but not written. So long as the journal is large enough, corruption will not occur. That is a big “if” regarding journal size. Maybe you have 20 GB of unwritten data; the implication is that your “small” journal won’t be able to mark all of that data, at least not the blocks being written at a given moment (it is a “block device”…writes are in blocks). Too much data relative to journal size will result in corruption needing filesystem repair (which is always hit and miss…repair will always lose “something”, but whether it is able to prevent total disk loss is unpredictable).

You could “tune” the ext4 filesystem to make a larger journal. You’d be at risk of losing more data upon power loss, but the odds of corruption would be reduced. It is a tradeoff between how much you’re willing to lose versus risk of corruption. On top of this, if it is a solid state device, then a synchronous larger journal will start to effect the life of the disk. One reason the journal size is not very big on a Jetson is to preserve the lifetime of the eMMC for those models. I think an SD card could be tuned for a larger journal, but it too would have a shorter lifetime if the journal is too large, and of course losing more data at loss of power is a risk.

It won’t matter if you are using a Jetson, any web search on how to tune an ext4 filesystem for a larger journal (or query of existing journal size) would be valid. The Jetson is not responsible for the issue; consider yanking the power cord from a desktop PC in the middle of a write…the two have exactly the same filesystem, and although journal sizes differ, they both have the same result. There is no “magic” way to avoid issues from sudden power loss. The ideal answer is to start proper shutdown upon detecting power loss, and have a backup which can last long enough for proper shutdown.

Incidentally, there are ways to force the filesystem to sync (flush all cache to disk), followed by change to read-only mode. Once in read-only mode there is no risk. You wouldn’t be able to write, but you could still otherwise operate normally. Read-only also does not hurt the life of solid state memory…no writes will basically provide the longest life you can get.

A useful URL:
https://www.loggly.com/ultimate-guide/managing-journal-size/

See:
journalctl --disk-usage
(compare your Jetson to your PC)

Topic		Replies	Views
Is Jetson software protected for incorrect shut down? Jetson Orin Nano jetson	8	44	November 21, 2024
Stuck at Nvidia logo after power outage Jetson Nano reboot	12	1721	January 10, 2023
Jetson Nano and SD corruption Jetson Nano	20	3666	October 28, 2024
SD card damage protection Jetson Nano security	119	5768	October 15, 2021
Jetson TX2 micro USB port is broken Jetson TX2	21	2463	October 18, 2021
Full backup and restore system on the jetson nano jetpack 4.4 Jetson Nano jetpack	6	2009	October 18, 2021
How to boot ubuntu 18.04 from one USB 2.0 disk Jetson Nano boot	28	1401	May 3, 2023
USB errors using external drive, works fine on host Jetson Nano usb	19	519	May 10, 2023
[HELP!] Configuring SSD & L4T Install & Boot Configuring Jetson TX1	15	4136	January 27, 2017
The integrity of the Jetson file storage system Jetson Orin Nano linux	5	898	July 18, 2023

Tuning linux on jetson nano for better data reliability in power failure scenario

Related topics