What I mean is: would it boot up successfully after just power cycling the board?
-----> It still cannot be recovered after power-on; it only boots up successfully after re-flashing the image.
Could you reproduce the same issue on the devkit?
----->only tested about 5000 times and haven’t found anything yet.
Have you also verified with the latest R32.7.5 release?
Or try using R35.5.0 for Xavier NX?
----->Okay, I’ll try to verify it, but it will take some time.
Please also check whether you can reproduce the issue on the devkit, to clarify whether it may be specific to the custom carrier board, since we haven’t hit this issue before.
I would like to know if you can reproduce the issue on the devkit.
----> Yes, I can reproduce the issue on the devkit
with the following test:
Step1:
Download the packages from the Jetson Linux R32.7.2 Release Page | NVIDIA Developer, then:
tar xf Jetson_Linux_R32.7.2_aarch64.tbz2
sudo tar xf Tegra_Linux_Sample-Root-Filesystem_R32.7.2_aarch64.tbz2 -C ./Linux_for_Tegra/rootfs/
sudo tar xf secureboot_R32.7.2_aarch64.tbz2
cd Linux_for_Tegra
sudo ./apply_binaries.sh
This is not a “Jetson” issue. Any Linux system using a filesystem which is not synchronous (and any Windows or Mac system which doesn’t use a synchronous filesystem) can suffer filesystem damage when power is cut like that. They must be shut down properly. On your desktop PC, what would happen if you always turned it off by yanking the power cable in the middle of operation? Proper shutdown is not optional.
What you’ve described is either (A) damaging the filesystem beyond what journals can protect against, or (B) losing something the journal had to delete which just happens to be required to boot.
EDIT: fsck, when it works, loses content to do so. fsck is just a manual and more extreme version of a journal. The journal tends to keep basic structure working, but it doesn’t stop data loss.
But isn’t Jetson designed as an embedded core board? In embedded products, there is a high probability that power will be cut off directly. Is there an alternative root filesystem, or some other solution, to avoid this problem?
Jetsons might be classified as an embedded system, but that is a form factor, not a statement of filesystem behavior. Just cutting the power is exactly the same as doing so on your desktop PC while it is running.
What @KevinFFF suggests is how most kiosks are set up and is for this very purpose. OverlayFS does not use the ext4 filesystem, or at least not the way it is normally used. What OverlayFS does is mount the real filesystem read-only, and then, if a change is made, a RAM disk overlay will “edit” the content, and this is what an end user sees: the actual filesystem if no change was made, but the RAM disk if the content has changed.
With OverlayFS any changes to the RAM disk will be lost on power off. With an ext4 (or any journaling filesystem), the journal content will be lost. What is noteworthy about OverlayFS is that the original filesystem is always restored, and so this is great for a kiosk, but not so good if the data shouldn’t be lost.
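As a rough sketch of the layering described above (not NVIDIA’s documented procedure), the following shell helper prints the mount commands for an overlay whose writable layer lives on a tmpfs RAM disk. The paths /mnt/ro-root, /mnt/ram and /mnt/merged, the RUN switch, and the function names are all hypothetical. Putting an overlay on the actual root filesystem is normally done early in boot (for example from the initrd) and is not shown here.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical placeholders.
LOWER="${LOWER:-/mnt/ro-root}"   # the real filesystem, mounted read-only
RAM="${RAM:-/mnt/ram}"           # tmpfs (RAM disk) holding the writable layer
MERGED="${MERGED:-/mnt/merged}"  # combined view the end user sees

# Print the command by default; execute it only when RUN=1 (needs root).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

overlay_setup() {
    run mount -t tmpfs tmpfs "$RAM"
    run mkdir -p "$RAM/upper" "$RAM/work"
    # Changes land in upperdir (RAM) and vanish at power off; lowerdir is never written.
    run mount -t overlay overlay \
        -o "lowerdir=$LOWER,upperdir=$RAM/upper,workdir=$RAM/work" "$MERGED"
}
```

Calling overlay_setup without RUN=1 only prints the three commands, which makes it easy to review them before trying anything on a real board.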
It is a guarantee that the hardware itself is not what causes the problem, and that the Jetson is behaving as intended. If you have no ability to work with a “fixed and read-only” filesystem which sees edits only as temporary, then you really must have a power backup system which allows flushing the cache and buffers to the disk and then powering down normally prior to cutting the backup power.
May I know the failed rate in your case?
------>The latest test had approximately 5603 occurrences
Have you also verified with the latest JP5.1.4(R35.6.0)?
------>not yet, But I also tested R35.5.0 and found that after more than 10000 cycles.
Do you mean that you hit 5603 failures out of 10000 power cut-off tests?
Okay, please verify with the latest JP5.1.4(R35.6.0) and check if there’s any improvement.
And let us know if OverlayFS may meet your requirement in your use case.
I want to add that the power consumption on a Jetson might be low enough that a supercapacitor, if properly set up, might work to give the Jetson time to shut down. What is needed is to flush writes to disk, switch to read-only, and at that point you can simply cut power (which is faster than a full shutdown sequence, but might leave temp files in place).
There is a second part of this to understand: the boot chain which precedes the kernel load is mostly read-only, and probably is not at risk of corruption. I don’t think there is much risk from a power cut during the initrd stage (for those systems which use this), but there might be some risk of needing power cut adjustments. The moment the root filesystem is mounted read-write, though, the system needs time to sync and switch to read-only.
Are you familiar with the Linux “Magic SysRq”? This can be disabled, but to illustrate do the following:
Monitor “dmesg --follow”.
On a local keyboard, hit the keystroke “ALT+SysRq+s” (three keys simultaneously, starting with ALT; the SysRq is the “print screen” button; the s key selects a “sync” function).
Observe in the dmesg log that an “emergency sync” has occurred. This is just one of many commands which can be given for immediate “extreme” priority.
If you want to see shutdown protection, then you would run “ALT+SysRq+s” sync a second time (both times tell the disk to flush, but the second sync won’t start until the first sync completes; this is needed due to the way cache and buffer flush runs asynchronously to force some synchronous behavior).
If you now are ok with shutdown (the system still runs normally now other than flushing the cache/buffer), use “ALT+SysRq+u”. The “u” stands for “umount”, but what it actually does is an emergency remount of the mounted filesystems from read-write to read-only. This read-only filesystem is safe to cut power to. Anything not flushed to disk would be lost, but if the emergency sync has flushed fully, then you’ve saved everything. Keep in mind that a program performing heavy writes might still add something to the cache/buffer after you’ve sync’d and before you go to read-only.
It is more or less ok to simply cut power at this point.
A script can perform the above.
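If you want to force an actual restart now, instantly, you can use “ALT+SysRq+b”. This reboots immediately without syncing or unmounting, so only use it after the steps above; “ALT+SysRq+o” powers the system off instead.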
Performed sequentially, with as little time between these commands as possible, this can very quickly get you to a point where you can yank power if you know power is going down: sync, sync again, then remount read-only.
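The key sequence above can be sketched as a script. Writing a command letter to /proc/sysrq-trigger as root is the scriptable equivalent of the keyboard combination. The function names and the SYSRQ_TRIGGER override are hypothetical and exist only so the sequence can be dry-run against a scratch file. Whether the second sync really waits for the first to finish is taken from the description above, not something this sketch verifies.

```shell
#!/bin/sh
# Sketch only: helper names and the SYSRQ_TRIGGER override are hypothetical.
# On a real system this must run as root and writes to /proc/sysrq-trigger.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Send one SysRq command letter (same letter as ALT+SysRq+<key>).
sysrq_send() {
    printf '%s\n' "$1" > "$SYSRQ_TRIGGER" || return 1
    echo "sysrq: sent $1"
}

# Flush, flush again, then remount everything read-only.
emergency_readonly() {
    sysrq_send s &&   # first emergency sync
    sysrq_send s &&   # second emergency sync
    sysrq_send u      # emergency remount read-only
}
```

A power-fail handler would call emergency_readonly and nothing else; nothing is sent until that function is invoked.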
The above is likely something a supercapacitor can give you time for, although it would have to be a large supercapacitor when under heavy load. There are some newer “lithium supercapacitors” out which are hybrids of batteries and capacitors and which might do for this. That’s a different story, but briefly, those capacitors ship charged, and if they drop below about half of their rated voltage, they can be ruined. However, they offer an enormous output for a significant time in a rather small and lightweight package. A couple of regular supercapacitors might work as well (it depends a lot on the computing load the Jetson is under).
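To get a feel for how large such a capacitor would need to be, here is a back-of-the-envelope estimate using the usable energy of a capacitor, E = 1/2 × C × (V_start² − V_min²), divided by the load power. The function name and every number in the example are hypothetical, the optional efficiency factor is a crude stand-in for regulator losses, and real hold-up time also depends on internal resistance and on how the load changes during shutdown.

```shell
#!/bin/sh
# Sketch only: rough hold-up time estimate, ignoring ESR and load variation.
# Usage: holdup_seconds CAP_FARADS V_START V_MIN LOAD_WATTS [EFFICIENCY]
holdup_seconds() {
    awk -v c="$1" -v v1="$2" -v v2="$3" -v p="$4" -v eff="${5:-1}" 'BEGIN {
        energy = 0.5 * c * (v1 * v1 - v2 * v2)   # usable joules between v1 and v2
        printf "%.2f\n", energy * eff / p        # seconds at constant power p
    }'
}

# Hypothetical example: 10 F bank, 5.0 V down to 3.0 V, 10 W load
# holdup_seconds 10 5.0 3.0 10    # prints 8.00
```

The result only needs to exceed the time the sync and read-only remount take under the worst-case write load, which has to be measured on the actual system.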
The hardest part is that you’d likely need a sensor to trigger this almost instantly. There is also a Magic SysRq key combination for killing all of the user space processes (other than init) which could be added in prior to the sync commands. This would make for a very very fast shutdown without corruption. Temp files would still exist, but I think the default Linux boot is rather resilient to stale temp files (but you’d still have to test since there could be a corner case).
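Since Magic SysRq can be restricted or disabled, it may be worth checking which of the functions discussed in this thread the keyboard is allowed to trigger. The sketch below decodes the kernel.sysrq bitmask (1 means everything is enabled, 0 means disabled, otherwise 16 = sync, 32 = remount read-only, 64 = signalling processes, 128 = reboot/poweroff). The function names are hypothetical. As far as I know this mask only limits the keyboard; root writing to /proc/sysrq-trigger is not restricted by it.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.
# sysrq_allows MASK BIT: succeed if the function behind BIT is permitted.
sysrq_allows() {
    [ "$1" -eq 1 ] && return 0      # 1 = all SysRq functions enabled
    [ $(( $1 & $2 )) -ne 0 ]        # otherwise test the individual bit
}

# sysrq_report [MASK]: report on the functions used for a fast safe power-down.
# Without an argument, the current value is read from /proc/sys/kernel/sysrq.
sysrq_report() {
    mask=${1:-$(cat /proc/sys/kernel/sysrq)}
    for entry in 16:sync 32:remount-ro 64:signal-processes 128:reboot-poweroff; do
        bit=${entry%%:*} name=${entry#*:}
        if sysrq_allows "$mask" "$bit"; then
            echo "$name: allowed"
        else
            echo "$name: blocked"
        fi
    done
}
```

If signal-processes shows as blocked, the “kill all user space processes” key mentioned above would not work from the keyboard, although the sync and remount keys might still be available.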