Sudden Power Loss and filesystem integrity

Hello,

We are designing an NX-based system and are looking for the quickest shutdown method that preserves filesystem integrity.

The OEM Design Guide specifies the conditions for the safe handling of a Sudden Power Loss.

Could you confirm that file system integrity is assured if the timing is respected (VDD_IN > 3.0V for >10ms after deasserting POWER_EN)?
Or is this sequence only safe for the hardware module, while still risking file system corruption?

Thanks!

hello raffael.hochreutener,

this SHUTDOWN_REQ signal can be driven active (low) if the system must be shut down,
for example, system shutdown due to a critical thermal issue.
since it’s a signal coming from system level, it should also ensure the file system integrity.
BTW, please confirm that the power control logic on the carrier board drives POWER_EN inactive (low) when SHUTDOWN_REQ is asserted.
thanks

Hello Jerry,

To confirm:
The carrier board needs to supervise the supply (or implement a power button) and pull POWER_EN low early enough that VDD_IN stays above 3.0V for more than 10ms afterwards. SHUTDOWN_REQ* would not be asserted, or can be ignored, in this case.

The NX does not supervise its own power input and does not assert SHUTDOWN_REQ* when the supply drops. This signal is only used for the Hardware Thermal Shutdown.
(Although the Jetson Xavier NX module has one INA3221 power monitor which could potentially provide the information.)

Is this correct?

Thanks for the clarification!
Raffael

hello Raffael,

may I know what your use-case is, or could you please share more details about the “quickest shutdown method”?
thanks

Hello Jerry,

We are designing a system which uses a user-removable battery.
We therefore need to find a good way to shut down the system safely but also very quickly, so that the user does not need to wait too long before removing the battery.

Due to size constraints, we cannot afford to add a big backup battery or super-cap.

  1. Potentially we could provide enough power to bridge the 10ms specified in the design guideline. (Sudden Power Loss)
  2. Alternatively, we could use a button on the device which the user has to press to shut down the system and NX before the battery can be removed. In this case it would be ok if the system takes a few seconds to shut down.
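
As a rough sanity check on option 1, the hold-up capacitance can be estimated from the energy drawn while the rail sags. The figures below are assumptions for illustration only, not values from the design guide: a constant load of P = 10 W, a rail starting at V_1 = 5 V, and the 3.0 V / 10 ms limit quoted above. Converter losses and capacitor ESR are ignored.

$$
E = P\,\Delta t = \tfrac{1}{2} C \left(V_1^2 - V_2^2\right)
\quad\Rightarrow\quad
C = \frac{2 P\,\Delta t}{V_1^2 - V_2^2}
  = \frac{2 \cdot 10\,\mathrm{W} \cdot 0.01\,\mathrm{s}}{(5\,\mathrm{V})^2 - (3\,\mathrm{V})^2}
  = 12.5\,\mathrm{mF}
$$

Under these assumptions the bridge would need on the order of ten millifarads, which is why the available board space matters.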

But in any case, we don’t want to risk system damage or file-system corruption.

I hope this clarifies the need a bit.

hello Raffael,

file system integrity is assured when the shutdown is initiated by software,
so you should implement a way to initiate the shutdown (for example, a button that asserts SHUTDOWN_REQ*) before the battery is removed,
thanks

This is a bit “out of the box”, and normally I would not recommend this. The URL at the end of this post explains Magic SysRq, but is probably too general to learn much from, so I’ll explain a bit more.

This special key binding system is usually for developers who might cause some part of the system to fail, and need some sort of controlled way to safely shut down or perform various debug functions when only part of the system works. This cannot save content of open applications, and perhaps may even leave data truncated in files being written to, but it can freeze the filesystem and change it to read-only. Being read-only implies there won’t be corruption needing fsck, thus it could save the system from needing to be brought in and worked on locally even if an individual file is truncated.

Before I mention any more specifics, you’ll see the apparent flaw that a keyboard needs to be connected. However, it doesn’t need to be. The heart of this is the pseudo file “/proc/sysrq-trigger”. Many developers perform kernel debugging using “kgdb” and “kgdboc” over a serial console. Echoing a command letter to the sysrq-trigger file performs the same function as the corresponding magic key combination on a keyboard, so you can script this and perhaps assign a button or some other trigger to echo the right thing to that file.

Before starting, note that there are many key combinations available, and that they can be enabled or disabled via a mask. The file “/etc/sysctl.conf” can mostly be thought of as a list of values to echo to certain “/proc” entries automatically at boot. The directory “/proc/sys” contains many sysctl files, and each entry in “/etc/sysctl.conf” names a path within the “/proc/sys” tree, abbreviated by dropping that prefix and using dots as separators. This entry in “/etc/sysctl.conf” would echo “1” to “/proc/sys/kernel/sysrq”:
kernel.sysrq=1

If you want to see the current mask:
cat /proc/sys/kernel/sysrq
(Jetsons would normally have this enabled, many desktop distributions would only enable parts)
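
To make the mask easier to read, here is a minimal sketch of a helper that checks whether a given mask value permits a particular function. The bit values (16 for sync, 32 for remount read-only, 128 for reboot/poweroff) are taken from the kernel sysrq documentation; the function name sysrq_allows is just an illustrative choice. As far as I understand the kernel documentation, the mask only restricts keyboard invocation, and root writing to /proc/sysrq-trigger is allowed regardless, so treat this as a diagnostic aid rather than a gate.

```shell
#!/bin/sh
# Sketch: decide whether a sysrq mask permits a given function.
# Bit values as listed in the kernel sysrq documentation:
#   16 = sync, 32 = remount read-only, 64 = signal processes, 128 = reboot/poweroff
# sysrq_allows is a hypothetical helper name, not a standard tool.
sysrq_allows() {
    mask="$1"
    bit="$2"
    # A mask of exactly 1 means "all functions enabled".
    [ "$mask" -eq 1 ] && return 0
    # Otherwise test the individual bit; exit status 0 means allowed.
    [ $(( mask & bit )) -ne 0 ]
}

# Example on a real system:
#   mask="$(cat /proc/sys/kernel/sysrq)"
#   sysrq_allows "$mask" 16 && echo "sync allowed from keyboard"
```

A mask of 176 (128 + 32 + 16), for instance, would allow sync, remount read-only and reboot from the keyboard, but not the process-signalling functions.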

Not every system supports every binding, and a vendor will often disable this entirely to prevent end users from accessing such functions. Calling “sync” too often on eMMC is in fact harmful, since it causes wear. However, a system in an emergency, or one which you already know is about to shut down, won’t be harmed by a sync. A while back @snarky mentioned something important: calling sync starts a sync but does not guarantee that it finishes; if you call sync twice in a row, the second sync won’t begin until the first has completed.

If sysrq is enabled, then these keystrokes would safely shut down a system (although they might also truncate open files, leave temporary lock files behind, skip proper cleanup, and so on; the ext4 filesystem itself would remain uncorrupted):

# Two sync in a row:
ALT+sysrq+s
ALT+sysrq+s
# Remount all file systems read-only:
ALT+sysrq+u
# Reboot immediately (use ALT+sysrq+o to power off instead):
ALT+sysrq+b

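You could try this by echoing the same letters to “/proc/sysrq-trigger” as root over a serial console, or even over ethernet. Note that a plain “sudo echo s > /proc/sysrq-trigger” fails, because the redirection is performed by the unprivileged shell; run the echo from a root shell instead.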
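
The key sequence above can be replayed from a script by writing the command letters to the trigger file. This is only a sketch under some assumptions: SYSRQ_TRIGGER and SYSRQ_DELAY are variable names I made up so the path and pause can be overridden for a dry run, the one-second pause is a guess at giving the kernel time to finish each queued step, and whether “o” actually powers the board off depends on the platform.

```shell
#!/bin/sh
# Sketch: sync, sync, remount read-only, then power off via the sysrq trigger.
# SYSRQ_TRIGGER and SYSRQ_DELAY are illustrative variables, not standard ones.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
SYSRQ_DELAY="${SYSRQ_DELAY:-1}"

sysrq_send() {
    # Write a single command letter to the trigger file (needs root on a real system).
    printf '%s' "$1" > "$SYSRQ_TRIGGER"
}

emergency_poweroff() {
    # Two syncs, then remount everything read-only.
    for key in s s u; do
        sysrq_send "$key" || return 1
        # The kernel may handle these requests asynchronously, so pause briefly.
        sleep "$SYSRQ_DELAY"
    done
    # Final step: 'o' powers off by default; pass 'b' as the argument to reboot.
    sysrq_send "${1:-o}"
}

# On the target, as root:
#   emergency_poweroff
```

For a harmless dry run, point SYSRQ_TRIGGER at a scratch file and set SYSRQ_DELAY=0 before calling emergency_poweroff; the file then holds the last letter that would have been sent.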

There are also other variations, such as sending SIGKILL to all processes except init (ALT+sysrq+i); this could be added just after the two sync calls, followed by two more sync calls. You might experiment with this. For the purposes of testing you can probably use sync quite a bit without worrying about wear on the eMMC.

Some more information on using Magic SysRq:
https://www.kernel.org/doc/html/v4.11/admin-guide/sysrq.html

Please keep in mind that freezing a running system and then rebooting without applications cleaning up may have side effects which cannot be predicted. However, this beats side effects in combination with corrupted filesystems, and so is still superior to just yanking out the power cable.
