Tuning Linux on Jetson Nano for better data reliability in a power-failure scenario

Hi there,

I am using a Jetson Nano in a scenario where we currently experience power loss. It’s not ideal but we can’t fix it quickly, and I think Linux should be doing better than it is. I am seeing 0-sized files upon boot after power loss, for files that are minutes old (I can be more precise with some more experimentation, but the bottom line is I know it’s >> 5s, which I’ll explain the significance of in a moment).

I’ve been explicitly calling sync with some success, but it occurred to me that I should be able to simply ask the kernel to make some guarantee about when it writes data to the disk. I am using the ext4 filesystem, and I found the “commit” setting here: https://www.kernel.org/doc/Documentation/filesystems/ext4.txt – it defaults to 5s. My /etc/fstab mounts the eMMC disk with no mention of the commit option.
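
For reference, this is roughly the line I’d expect to add if I set it explicitly; the device node and mount point below are just an example of my eMMC rootfs, not something copied from my actual fstab:

# /etc/fstab sketch; only the commit= part is the point, the rest mirrors a default mount
/dev/mmcblk0p1   /   ext4   defaults,commit=5   0   1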

I’m hoping for replies along one of two directions (assuming I’ve made my problem clear!):

  • Jetson’s Linux flavor doesn’t respect the ext4 commit option, or has a different default?
  • I am missing something else about the situation that causes files older than 5s to not be written to disk?

Thanks!

Hi,
Having a backup power supply looks to be better, so that you can shut down the device gracefully in a power outage. Not sure if this is possible in your setup.

That is a goal for a future revision of the hardware, but outside the scope of the question. Can you explain what you mean by “looks to be better”? I don’t necessarily need “better”. I am curious why files older than 5s don’t seem to get written to disk.

The issue you are running into is not specific to a Jetson. This is an issue on all computers (Windows, Mac, desktop PCs, RPis, and so on) whose filesystem caches writes for improved performance. If there is caching, then there is no way around the possibility of loss.

If there is no caching, it implies the disk is operating “synchronously”. Every bit written is written immediately, and the write command will not continue to the next bit until the previous bit is written. Power loss won’t matter because there is no outstanding cached write (the program doing the writing may still have some of its data cut off, but the filesystem itself will not corrupt; if the filesystem reported a write as successful, that data is still valid even after sudden power loss).

Hard drives (and their equivalents) “mostly” all have a cache internal to the drive. Perhaps on a tiny Cortex-M controller no cache will be used, but that’s rare. The operating system itself might use “synchronous” writes within its own software, but if the drive caches, you are still at risk of loss of the cache which was being written. The disk would have to be intentionally told to go to a synchronous mode to avoid using that cache and have guarantees. Performance would absolutely “suck” (one of my favorite technical terms).
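
For a SATA-style drive (not eMMC), the drive’s own write cache can usually be switched off. I’m showing this only to illustrate the idea; the device name is an example, and this does not apply to the Jetson’s eMMC:

# Disable the drive's internal write cache (SATA/ATA drives only; /dev/sda is an example)
sudo hdparm -W 0 /dev/sda
# Re-enable it later
sudo hdparm -W 1 /dev/sda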

If you were to operate solid state memory without cache, and be purely synchronous, then the solid state memory would die much, much sooner. Part of the purpose of the cache with solid state memory is to aid in wear leveling. For a small microcontroller which only reads, this is not a problem, but if you want to write to the disk, running without cache is a dangerous thing to do to solid state memory.

Enter journaling filesystems. ext4 and NTFS are journaling. At this point it is important to know the difference between loss of data and filesystem corruption. If one simply loses data, then it isn’t a big problem; but if the system is writing metadata which changes the filesystem structure itself (e.g., adding or removing a directory), then it is possible that a sudden cut of power might break the rules for seeking within the filesystem. Reading might result in completely absurd issues, or crashing; writing could result in loss of the entire disk’s content. A journal does not stop data loss, but it does stop corruption.

A journal is a small amount of disk space which is 100% synchronous. Neither the operating system, nor the disk itself, will cache this space. It is a tiny amount of space, and so it doesn’t usually hurt solid state disk life (the space can level by traveling over other disk space via a pointer rather than referring to one specific small set of solid state memory).

When content is cached to write (on a journaling filesystem like ext4), the journal marks that content’s destination as available, but not yet written; as bytes are written, the journal marks them as written. Should power be suddenly lost, the journal can be replayed, and content which is valid will be marked clean; content ready for write, but not written, will be reversed out. That reversed out content is gone.

Regardless of whether it is a desktop PC or Windows or Linux, loss of power would result in loss of any content which is cached but not written. So long as the journal is large enough, corruption will not occur. That is a big “if” regarding journal size. Maybe you have 20 GB of unwritten data; the implication is that your “small” journal won’t be able to mark all of that data, at least not the blocks being written at a given moment (it is a “block device”…writes are in blocks). Too much data relative to journal size will result in corruption needing filesystem repair (which is always hit and miss…repair will always lose “something”, but whether it is able to prevent total disk loss is unpredictable).

You could “tune” the ext4 filesystem to make a larger journal. You’d be at risk of losing more data upon power loss, but the odds of corruption would be reduced. It is a tradeoff between how much you’re willing to lose versus risk of corruption. On top of this, if it is a solid state device, then a synchronous larger journal will start to affect the life of the disk. One reason the journal size is not very big on a Jetson is to preserve the lifetime of the eMMC for those models. I think an SD card could be tuned for a larger journal, but it too would have a shorter lifetime if the journal is too large, and of course losing more data at loss of power is a risk.

It doesn’t matter that you are using a Jetson; any web search on how to tune an ext4 filesystem for a larger journal (or query the existing journal size) would be valid. The Jetson is not responsible for the issue; consider yanking the power cord from a desktop PC in the middle of a write…the two have exactly the same filesystem, and although journal sizes differ, they both have the same result. There is no “magic” way to avoid issues from sudden power loss. The ideal answer is to start a proper shutdown upon detecting power loss, and have a backup supply which can last long enough for the shutdown to complete.

Incidentally, there are ways to force the filesystem to sync (flush all cache to disk), followed by change to read-only mode. Once in read-only mode there is no risk. You wouldn’t be able to write, but you could still otherwise operate normally. Read-only also does not hurt the life of solid state memory…no writes will basically provide the longest life you can get.
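
A minimal sketch of that idea, assuming the rootfs is what you want to lock down (the remount will fail if something still holds a file open for writing):

# Flush all cached writes, then switch the rootfs to read-only
sync
sync
sudo mount -o remount,ro /
# ...later, to allow writes again
sudo mount -o remount,rw /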

A useful URL:
https://www.loggly.com/ultimate-guide/managing-journal-size/

See:
journalctl --disk-usage
(compare your Jetson to your PC)


Thanks for the detailed explanation. Can you place where in all of that the ext4 commit option lives? Here’s its documentation:

commit=nrsec
        Ext4 can be told to sync all its data and metadata
        every 'nrsec' seconds. The default value is 5 seconds.
        This means that if you lose your power, you will lose
        as much as the latest 5 seconds of work (your
        filesystem will not be damaged though, thanks to the
        journaling).  This default value (or any low value)
        will hurt performance, but it's good for data-safety.
        Setting it to 0 will have the same effect as leaving
        it at the default (5 seconds).
        Setting it to very large values will improve
        performance.

It sounds to me like I shouldn’t lose more than 5s of data – which other layer could be in the way?

Note that 5 seconds of buffering can amount to far more content than the eMMC could write (synchronously) in 5 seconds.

I have not altered my journal size before. It is risky, as you might lose data. All of this falls under the jurisdiction of the ext2/ext3/ext4 tools (they share some of the tools). Some example URLs regarding this (in particular, pay attention to the program “tune2fs” for editing an existing filesystem, but options are also available during creation):

If you need what is on that partition, then consider cloning prior to manipulating this. Incidentally, you could clone, modify the clone under loopback on the host PC, and then flash the clone if it remains the exact same size. Even if you don’t want to flash, if you don’t want to risk working directly on the Jetson before knowing what you are doing will work, then consider cloning anyway just so you can work on the clone instead of the Jetson. Loopback mounted clones (raw clones, not sparse clones) are quite powerful.
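
As a rough sketch of the kind of tune2fs sequence involved (run against an unmounted partition or a loopback-mounted clone; the device name and journal size below are examples, and I have not done this on a Jetson myself):

# Show the current journal-related features
sudo tune2fs -l /dev/sda5 | grep -i journal
# Remove the existing journal, check the filesystem, then recreate the journal with a larger size (in MiB)
sudo tune2fs -O ^has_journal /dev/sda5
sudo e2fsck -f /dev/sda5
sudo tune2fs -j -J size=256 /dev/sda5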

I don’t think I’m after playing with the journal. I haven’t experienced corruption too often. I’m more concerned about finding ways to increase data reliability in this unfortunate situation.

I don’t think eMMC write speed is a problem. I’m writing continuously and everything is fine as long as the power is up. When a power loss happens, I lose about a minute of recent data, rather than 5s.

Additionally, I’ve already implemented a solution with sync and it works correctly, and it lets me be specific about which files are most important and need to survive a power loss as completely as possible.

But I’m confused why ext4 doesn’t seem to be living up to its 5s promise for other files.

There are these choices:

  • Prevent power loss, e.g., with a UPS.
  • Increase the journal size.
  • Run read-only and never write (unless it is a kiosk, this probably isn’t what you want).

Even if you increase the journal size, any data being written at the moment of loss will be truncated (but the filesystem will not corrupt).

There are no other possibilities.

ext4 is living up to its promise. There is no operating system filesystem which can beat loss of power in the middle of a write. You’ll find that desktop PCs simply use a larger journal. A larger journal can prevent corruption across those 5 seconds of writes; no cached/buffered filesystem can do better.

You will find that some hard drives intended for reliability have a super capacitor, and can self-power the cache which is part of the drive long enough to finish writing it. Some RAID controllers likewise have a super capacitor and battery. Stopping the data being written from being lost requires power from somewhere (even if only from a super capacitor); there is no magic (not even "Space Magic"™…a term I see a lot lately with computer games :P).

I’m going to add this script for anyone checking what their current journal size is (I used “/dev/sda5” as an example, edit for your case):

#!/bin/bash

# Edit for your case.
PARTITION=/dev/sda5

# Note: the first sudo below caches your credentials, so the sudo embedded in the
# command substitution further down won't prompt again. You probably want to see
# this info anyway, and it simplifies the sed/cut step. You could also just run
# the whole script with sudo.
sudo tune2fs -l "$PARTITION" | egrep '(Filesystem features|Journal inode|Journal backup)'

# Extract the journal inode number.
INODE=$(sudo tune2fs -l "$PARTITION" | grep 'Journal inode' | sed 's/  */ /g' | cut -d' ' -f3)
echo 'INODE: ' "$INODE"

# Print the size (in bytes) of the journal inode.
sudo debugfs -R "stat <$INODE>" "$PARTITION" 2>&1 | grep 'User:.*Size:' | sed 's/  */ /g' | cut -d' ' -f 8

Note: Telling the system to sync every 5 seconds probably limits exposure, but it takes time for actual write to disk. Calling sync twice without any intervening writes is the only way to guarantee this (the second sync blocks until the first is complete). What you’re seeing is only an attempt to sync every 5 seconds, but it doesn’t block filling in further cache/buffer while this occurs. The write every 5 seconds is not a robust method of stopping data loss; all it can do is reduce loss. Also, this is how you wear out your solid state memory and reduce its life.
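
For example (the file path is just a placeholder; the per-file form assumes the GNU coreutils sync, which accepts file arguments):

# Flush everything, twice, per the note above
sync && sync
# Or flush only the data of specific high-value files (GNU coreutils sync)
sync --data /path/to/important-recording.dat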

Do you consider 0-sized files to be data corruption or data loss? I’m thinking it’s data loss 'cause the filesystem is happy but the data is gone. I’m looking to reduce data loss (from ~60s to ~5s recent data), and not concerned about data corruption or “stopping” data loss.

Regarding degrading the disk: understood. I can take that into account, but I need to understand the rest of the playing field to make good decisions. For the sake of the thread, we can assume the hard disk does not need to last for very long compared to its size and the number of writes it can tolerate per sector.

Thanks for the tip re calling sync twice. I’m calling it repeatedly, so I guess that counts as twice. I’m not worried about disk life. Think of my application like a security camera that keeps the last 24 hrs on a loop. It’s important to capture the 24 hrs, especially leading up to something bad like a power loss. And the same spot on the disk is not being written often.

I’m trying to understand how ext4 is living up to its promise. Let’s assume that a system is in a steady state writing data to disk at some constant rate. It works fine for hours, and does not perceptibly delay to catch up on writes when powering down gracefully. This means there’s no bottleneck smaller than the data write rate. Then “catching up” can only exist downstream of the 5s interval if there’s another buffer that flushes less often than every 5s. Are you saying that the disk only physically writes data every minute or so? In that case I should be able to see a range of data loss between 5s and 60s – I haven’t checked if this is the case yet.

0-sized files would likely occur if a file was being modified and not yet written (in cache or buffer), and power was lost. This is not corruption because the structure of the filesystem is still correct. You are right that this is data loss and not corruption; it means the journal did its job. Something was lost, but it is safe to use the filesystem.

The filesystem is basically a tree of nodes (a graph, mathematically). Some nodes point to more than one other node, or more than one node points back to the original node; for example, a directory can point to more than one file, or to another directory. A regular text file has a series of characters, and the file points to the vector of characters (bytes), and also to the owning directory node.

If you were modifying something inside of a directory, it is possible that the directory structure itself might be temporarily disconnected while new connections are written, rather than simply altering a connection in place (we’re dealing with block devices; you can’t really insert data in the middle and have everything move left/right to get out of the way without deleting everything to the left or right and rewriting it once something is inserted). Should the journal not be large enough to know how to reconnect the directory, then the entire directory could be lost. Worse yet, other temporary changes might cause the directory to exist but to point to the wrong thing(s)™ (I’m sure that’s a technical term somewhere), or for wrong thing(s) to point to the directory. Any attempt to write to that directory after corruption could end up filling unrelated locations with nonsense. That’s corruption…it becomes dangerous to write, and you can’t trust what you read (think of a Sci-Fi movie with multiple universes, where everything merges and it is difficult to tell what is from the current universe).

It is up to you how long you need the hardware to last. If it is an SD card, and you have some standard content you can simply plug a new SD card into, then it might be ok. However, performance will drop radically if you run in synchronous mode. In a way, as the journal approaches being as large as the entire filesystem, the degree of synchronous/asynchronous behavior approaches being purely synchronous. If it is purely synchronous, then you might as well get rid of the journal; there is no longer any need for it. You could simply use a mount option in “/etc/fstab” to tell it to mount synchronously (or pass it as an argument to the kernel, either in extlinux.conf or the device tree’s “chosen->bootargs” node).

Incidentally, you could add an alternate boot entry to “/boot/extlinux/extlinux.conf”, and pick that with serial console during boot. That boot argument could be an exact copy of the original boot argument, but add an argument which forces the rootfs to run synchronously. Or, since it is an SD card, you could just clone that card to an exact duplicate on another SD card, and edit fstab there. Then compare performance. SD cards are not terribly high performance to start with, and I think most people would be driven to tears by low performance if there was any writing needed.
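
A sketch of what such an alternate entry might look like; the LINUX/INITRD/APPEND values below are placeholders, so copy them from your existing “primary” entry and only add the rootflags=sync argument:

LABEL syncroot
      MENU LABEL primary kernel (synchronous rootfs)
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} quiet root=/dev/mmcblk0p1 rw rootwait rootflags=sync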

If you are troubled by loss of data (e.g., truncating file content), then I doubt loss of the entire SD card would be considered acceptable.

Out of curiosity, are these Jetsons on a local network with lots of bandwidth? Or are they going over the Internet and Wi-Fi? If this is going over a local network with lots of reliable bandwidth, then there are other options.

The Jetsons are remote, they talk to our servers via LTE. The 24 hr store is local, and much less is sent home. I do have some summary data that we stream home, but in this case I’m interested in the detailed local data. When something bad happens, it is also worth retrieving the device to get the last 24 hr detailed data off it. But the most interesting is the moments leading up to the power loss.

This does seem like a Linux/ext4 question, but there are a couple of reasons I came here:

  • Jetsons are more likely to be involved in power loss situations
  • the kernel on the Jetson device may actually behave differently than stock Linux
  • I don’t actually know where to ask a Linux/ext4 question

Is there a place I should go to ask about ext4 commit and its behavior? If I do get a clear answer that yes, there shouldn’t be more than 5s recent data loss in a power loss, then what would be the next step to digging in further here?

It is ok to ask here about ext4, but the maintainers would probably be able to answer in more detail:
http://ext4.wiki.kernel.org

You are in a tough situation. You could increase the sync rate, although I’m not sure how to do that (it might just be an echo to a file in “/sys”, or a config file in “/etc”…don’t know). This article talks about fast commits, but based on it, adjusting them seems to require doing so at the time the filesystem is created:
https://lwn.net/Articles/842385/
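
If it helps, the knobs I would start looking at are the kernel’s writeback sysctls under /proc/sys/vm. This is generic Linux behavior that sits in front of the ext4 commit interval; the values below are the usual defaults, but I have not verified them on a Jetson:

# How old dirty (unwritten) data may get before the kernel starts writing it out (3000 = 30 s)
cat /proc/sys/vm/dirty_expire_centisecs
# How often the flusher threads wake up (500 = 5 s)
cat /proc/sys/vm/dirty_writeback_centisecs
# Example: expire dirty data after 5 s instead of 30 s (resets at reboot; persist via /etc/sysctl.conf)
echo 500 | sudo tee /proc/sys/vm/dirty_expire_centisecs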

If you have momentary physical access, then something you could consider is mounting another storage device, e.g., an SD card or USB hard drive. This could be mounted on an alternate mount point, e.g., “/usr/local/mount”, and your data writes could be redirected to this. That device could be mounted in synchronous mode. After that, even if power was lost, you’d be guaranteed that your data is intact. This would not require much change on the actual Jetson.
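
A sketch of such an fstab entry; the device, filesystem type, and mount point are placeholders for whatever storage you add:

# Mount the extra device synchronously; writes return only after they hit the media
/dev/sda1   /usr/local/mount   ext4   defaults,sync   0   2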

Would you be able to add external storage?

Incidentally, it is also possible to add network storage, which could be mounted synchronously. iSCSI is very high performance, but it is not trivial to set up, and it also requires a new kernel configuration (the default kernel does not enable iSCSI).

We have an SD card in addition to the eMMC, but we found that the SD card dismounts every once in a while, I’d guess because of vibrations in the environment. So we prefer to write to eMMC first and ferry to the SD card later.

We only have limited network bandwidth. We write locally and selectively send home. I don’t think a LAN would make sense, not more than the SD card anyway?

Will attempt comms via IRC and post what I find out here.

Does your Jetson model have any means of adding an NVMe? Even in synchronous mode this would have useful performance, and the mount is probably reliable even in vibration.

In the case of the SD card, do you typically need to remove it to copy data? If not, then you could add something temporary to resist the vibration. I’m thinking of a couple of tiny hot glue drops which could be popped off.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.