One must distinguish between synchronous disk writes and buffered/cached writes (which are asynchronous). The journal can be tuned to a larger size, but this has its own tradeoffs. Once an incomplete write exceeds what the journal records (the journal is what allows rolling back an incomplete write), no system can avoid corruption.
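You can feel this distinction directly with dd (a rough illustration, not a benchmark; the file paths under “/tmp” are arbitrary). The “oflag=dsync” flag forces each block to reach the device before the next write begins, which is the synchronous behavior described above:

```shell
# Write 1 MiB through the page cache (asynchronous, fast)...
dd if=/dev/zero of=/tmp/buffered.bin bs=4k count=256 2>/dev/null

# ...versus synchronously: oflag=dsync flushes every 4 KiB block to the
# device before dd continues (dramatically slower on real media).
dd if=/dev/zero of=/tmp/synced.bin bs=4k count=256 oflag=dsync 2>/dev/null

# Both files end up identical in size; only the durability timing differs.
ls -l /tmp/buffered.bin /tmp/synced.bin
```

On real eMMC or a spinning disk the second command is far slower; if “/tmp” happens to be a RAM-backed tmpfs on your system the gap shrinks, which itself illustrates why buffered writes are fast.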
Some microcontrollers, which are astonishingly slow compared to what you are thinking of, run entirely synchronously. If the controller never needs to write, e.g., its programming is in a read-only ROM, this works out well and the system is immune to corruption.
If you cannot live with a small journal and need an actual guarantee, several things happen:
- Wear from constant synchronous writes will destroy solid state memory (eMMC or SD cards, for example) in a very short time, despite “wear leveling”.
- Performance will drop by orders of magnitude. This is not a small or insignificant drop; we’re talking about performance falling back to something from the 1970s for any kind of storage which won’t have wear leveling issues (e.g., an old style disk with a spinning platter has no wear leveling issues, but without cache/buffer those disks are far slower than you would expect…cache/buffer is an enormous speed boost).
- A kiosk style application reads from solid state memory, but only writes to a RAM buffer; that RAM buffer overlays the read-only solid state memory to give the illusion of read-write, and is limited by how much RAM you have. When power is lost, there will never be corruption, but you will lose 100% of anything written during that boot.
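One common way to build that kind of RAM overlay on Linux (a sketch only; the “/mnt/readonly” lower directory and the other mount points are hypothetical names, and these commands require root) is overlayfs with a tmpfs upper layer:

```shell
# Keep the real storage read-only, and overlay a RAM-backed tmpfs on top.
# All writes land in the tmpfs "upper" layer and vanish at power loss,
# while the read-only "lower" layer can never be corrupted.
mkdir -p /mnt/ram /mnt/union
mount -t tmpfs tmpfs /mnt/ram
mkdir -p /mnt/ram/upper /mnt/ram/work
mount -t overlay overlay \
      -o lowerdir=/mnt/readonly,upperdir=/mnt/ram/upper,workdir=/mnt/ram/work \
      /mnt/union
```

Everything visible under “/mnt/union” then behaves as read-write, but only the tmpfs actually changes, and it is limited by available RAM exactly as described above.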
Note that a custom ext4 tuning can increase the journal size. This implies less space left for storage. The journal itself does contribute to wear of solid state systems; anything which writes contributes to this, but writing to cache/buffer millions of times before shutdown might result in only one actual write to the device. The larger that synchronous journal gets, especially relative to the total size of the eMMC or SD card, the faster that memory will fail. The existing defaults don’t have much of an issue with causing failure, but a larger journal might.
Some of the Jetson hardware is designed with an A/B redundancy whereby there is a backup partition. If one partition fails, then it will go to the other partition. Whether or not this allows repair of the original failed partition is a question you have to ask at each failure. Certainly this will involve a human intervening for repairs.
In the case of a system where you don’t think it writes, there are in fact some small writes often needed. Consider that lock files, which don’t contain anything, do in fact modify the content of a directory. Named pipes tend to have a filesystem entry even though any “content” goes through a driver and not the disk. If you are certain that nothing other than the o/s lock files and temp files are being written, then you have a very good chance that even a small journal will prevent corruption. Still, this is not a guarantee unless the filesystem is truly mounted read-only.
If your data storage and all significant writes go to an external SSD, and if that SSD is not the operating system partition, then this will have a lot of advantages. Speed is one of them; less wear of the eMMC is another. However, the SSD itself must have a journal and buffer/cache if it is to operate “normally” like any other partition which can be written to. You will lose data on the SSD from power loss. If that data exceeds the journal, then the SSD will corrupt. If the mount options to a corrupt SSD are not set up correctly to tolerate error, then boot will fail. Still, it is easy to set up such that the SSD won’t fail boot if it corrupts. Then you could fix the SSD corruption at the risk of significant loss of anything on that partition.
An option to improve this situation requires knowing something about mount options. If you have something like an eMMC rootfs partition for everything, and there is content in a directory, then mounting an SSD partition (even a blank one) onto that directory as a mount point will cause the eMMC content to be hidden. The “hiding” goes away upon unmount. This means that if you were to copy “/home” to an SSD partition, and then mount that SSD version onto “/home” as a mount point, the content on the eMMC is protected unless the SSD fails to mount. Any updates to the SSD would not go to the eMMC though. If for some reason you have a basic “/home” on eMMC, and the SSD partition mounted on “/home” fails to mount, then the system reverts to the eMMC version of the content. At that point the eMMC begins receiving writes instead of the SSD.
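A quick demonstration of the hiding behavior (using the same hypothetical “/dev/nvme0n1p1” device name as the later fstab example; requires root):

```shell
ls /home                     # shows the eMMC copy of /home
mount /dev/nvme0n1p1 /home   # eMMC content is now hidden, not erased
ls /home                     # shows the SSD partition's content
umount /home                 # the eMMC copy is visible again
```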
Common disk arrangements on any *NIX, for reliability purposes, might include these items:
- A separate partition for each of “/home”, “/var” (or just “/var/log”), and “/tmp”.
- Use of rsync on occasion to update the eMMC “/home” from the SSD “/home” (optional).
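The optional rsync step might look like this (a sketch; “/mnt/emmc/home” is a hypothetical mount point where the dormant eMMC copy is reachable, and root is needed to preserve ownership):

```shell
# -a preserves permissions/ownership/timestamps; --delete removes files
# from the eMMC copy that no longer exist on the SSD, keeping them in sync.
# Trailing slashes matter: this copies the *contents* of /home, not /home itself.
rsync -a --delete /home/ /mnt/emmc/home/
```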
What the above would do is to make even temporary files and logs write to the SSD. You have to be careful though with “/var” since this has the dpkg/apt package database. Installing new packages won’t happen often, but you’d likely want to update both copies every time you change packages, even if it is just an update of existing packages and nothing new.
The “/var” mounts can be set up in a more finely grained way, and so you could for example not use the SSD partition for all of “/var”, but instead do something like mount the SSD partition on “/var/log”.
Of course if you have three separate SSD partitions for “/var/log”, “/tmp”, and “/home”, then you cannot share space between those three partitions. You have to have a large enough partition for each individual mount point, and if you choose too little or too much, then you’re going to have a lot of work ahead of you to tweak that. LVM (logical volume manager) can help deal with this, but then your boot options will get a lot more complicated.
You won’t find any magic bullet which makes the system perfectly safe without either an extreme performance hit or a reduction in solid state memory life.
Incidentally, you would not use the “nofail” option for the rootfs “/” or the “/boot”. However, if you have correctly set up a backup “/home” and then copied it to the SSD, then the mirror partition could be mounted with the “nofail” option. The average user might not realize that “/home” had failed and reverted to eMMC. Here is an example entry for “/etc/fstab” to mount an SSD partition with the ability to continue boot even if it fails (I’m pretending the SSD partition is “/dev/nvme0n1p1”, but you have to adjust for whatever the actual SSD partition is):
/dev/nvme0n1p1 /home ext4 defaults,nofail 0 2
In the above I use “2” because the rootfs is “/” and it is “1”. The “1” means the first partition to error check, and the “2” means the second partition to error check. Recovering a journal on the rootfs first, followed by the device which mounts on the rootfs, is the logical order.
The “defaults” option is there to create a “normal” mount; it is actually an alias for several options. The option list is comma-delimited, and so following this (without spaces) with “,nofail” appends that option. Normally, if “/home” failed to mount, then boot would halt and offer some sort of rescue environment, or suggest fixing the device before boot can continue. With “nofail” that won’t happen; logs will show the failure, and boot will continue without the mount of “/home” on the SSD. You of course can’t do this with “/” or “/boot”, but you can do this with a lot of partitions (the above suggestions for “/var/log” and “/tmp” are candidates).
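For example, hypothetical “/etc/fstab” entries making “/var/log” and “/tmp” tolerant of a missing SSD might look like this (the partition names are made up; adjust for your system):

```
/dev/nvme0n1p2 /var/log ext4 defaults,nofail 0 2
/dev/nvme0n1p3 /tmp     ext4 defaults,nofail 0 2
```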
Some people choose to use a network device for “/var” or “/var/log”, with “nofail”, and so logs can have a network device attached for debugging, but detached for normal boot.
One final note: a partition UUID can be used in place of a device name, e.g., the “/dev/nvme0n1p1” could instead be named by one exact and specific partition UUID. However, if you were to replace the disk with a new one, this would only mount if the UUID were cloned. Sometimes this is what you want, because it gives you a chance to rsync your “/home” to a new SSD, and then set the UUID in “/etc/fstab” for the new device.
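To find the UUID and use it (the device name is the same example as above, and the fstab line shows a placeholder rather than a real UUID):

```shell
# Print the UUID (and filesystem type) of the example SSD partition;
# typically requires root:
blkid /dev/nvme0n1p1

# Then reference it in /etc/fstab instead of the device name:
# UUID=<uuid-from-blkid> /home ext4 defaults,nofail 0 2
```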