Jetson Nano Board EXT4-fs error

Dear all.

While using a Jetson Nano board that had been working well, I discovered that files in a specific folder had suddenly been deleted.

The Nano board uses an SD card as a storage space, and there is no eMMC.

The system log shows the following.

Apr  8 14:42:13 aaeon-desktop kernel: [  316.802490] EXT4-fs (mmcblk0p1): error count since last fsck: 1995
Apr  8 14:42:13 aaeon-desktop kernel: [  316.809091] EXT4-fs (mmcblk0p1): initial error at time 1617665918: ext4_iget:4571: inode 921098
Apr  8 14:42:13 aaeon-desktop kernel: [  316.818033] EXT4-fs (mmcblk0p1): last error at time 1617860261: ext4_mb_generate_buddy:759
Apr  8 14:42:21 aaeon-desktop kernel: [  324.270077] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:991: inode #923439: block 3678456: comm sudo: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=4294965247, rec_len=57343, name_len=111, size=4096
Apr  8 14:42:36 aaeon-desktop kernel: [  339.272702] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:991: inode #923439: block 3678456: comm sudo: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=4294965247, rec_len=57343, name_len=111, size=4096
Apr  8 14:42:36 aaeon-desktop systemd[1]: /etc/systemd/system/cc_beyless.service:1: Missing '='.
Apr  8 14:43:18 aaeon-desktop kernel: [  381.771403] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:991: inode #923439: block 3678456: comm sudo: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=4294965247, rec_len=57343, name_len=111, size=4096

What is the meaning of the above system log and why does it occur?

For reference, the SD card in use was flashed with balenaEtcher from an image file that was created with the dd command from another board's SD card.

If I run the sudo fdisk -l command, a “GPT PMBR size mismatch” error appears, as shown in the picture below. Could this be related?


The above implies the system was not properly shut down. As with any modern computer, any condition which does not allow time to flush the cache causes a problem.

For a journaling filesystem, the journal can prevent corruption only up to the amount of unwritten cached data which the journal tracks. It simply reverses the journal and “unwrites” the changes it thinks are incomplete. Parts (or all) of the files which were being written towards the end will go away. This prevents corruption of the filesystem node structure.

If the amount of unwritten data is larger than the journal, then corruption occurs. The ext4 filesystem then has to correct this by removing things which go beyond the original cached data, and this can destroy parts of the filesystem which were not even “cached” at the moment of failure. It looks like your system had this much unwritten data. It exceeded even the journal’s ability to compensate, and the repair had to (somewhat) randomly remove data which was more or less already correctly written.
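As a side note, the numeric fields in the posted log can be decoded with a few lines of Python. This is only a sketch for reading the values; the helper names `epoch_to_utc` and `describe_dirent` are made up for illustration, and nothing here inspects the actual SD card.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> str:
    """Convert the 'error at time N' value from an EXT4-fs log line to ISO 8601 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def describe_dirent(inode: int, rec_len: int) -> dict:
    """Summarize the directory-entry fields quoted in the 'bad entry' message."""
    return {
        "inode_hex": hex(inode),
        "rec_len_hex": hex(rec_len),
        # The kernel message complains that rec_len % 4 != 0.
        "rec_len_mod_4": rec_len % 4,
    }


# Values copied from the log posted in the question.
print(epoch_to_utc(1617665918))   # "initial error at time"
print(epoch_to_utc(1617860261))   # "last error at time"
print(describe_dirent(4294965247, 57343))
```

The two timestamps decode to 2021-04-05 and 2021-04-08 (UTC), so the first recorded error is a few days older than the log excerpt. The inode and rec_len values come out as 0xfffff7ff and 0xdfff, which are all-ones patterns with a single bit cleared. That suggests the directory block holds garbage rather than a real entry, although this is a guess from the numbers and not a diagnosis.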

“fdisk -l” will never harm the disk. By the way, the better command to work with is “gdisk -l” (it might require “sudo” in some cases). “fdisk” comes from older BIOS-style partitions, and “gdisk” evolved for GPT partition schemes. For listing I doubt it matters, but when writing to a solid state device or to an older-technology drive that uses GPT, it could actually matter.

The “dd” command in itself can be used for cloning, but beware that it might actually cause a problem. To guarantee there is no problem, the partition or content being read must not change at all during the read. “dd” has no ability to take a snapshot of a moving target. If, for example, you are backing up a root filesystem while the filesystem is active, then you could create the same kind of corruption that an incorrect power down creates (but it would be in the saved “dd” content only; the original filesystem would not be harmed). An example of guaranteed correct behavior is to “dd” copy an SD card partition when it is simply in a card reader and is not mounted or being written, whereas doing so with a running system’s rootfs is always a bad idea.
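One way to check for the “moving target” problem is to hash the source before and after the copy and compare both with the hash of the image. The following is a minimal sketch under that assumption; it is a plain Python copy loop, not a wrapper around “dd”, and the function names are hypothetical.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024  # read in 4 MiB pieces so large images fit in memory


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file or block device."""
    digest = hashlib.sha256()
    with open(path, "rb") as source:
        for chunk in iter(lambda: source.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_path: str, image_path: str) -> bool:
    """Copy source to image; return True only if the source did not change
    during the copy and the image is byte-identical to it."""
    before = sha256_of(source_path)
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK), b""):
            dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # push the image out of the OS cache
    after = sha256_of(source_path)
    return before == after == sha256_of(image_path)
```

On a PC this would be called with the card reader's device node as `source_path` (run as root, with the card unmounted). A `False` result means either the source changed while it was being read or the image does not match it. A `True` result only shows the copy is faithful; it says nothing about whether the source filesystem was already corrupt.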

Was the “dd” copy from a running system, and thus a moving target, or was it from an SD card which was not being actively modified? Is the system in question always properly shut down?


Thanks for the detailed reply.

First, the dd command was performed by removing the SD card from the original nano board and connecting it to a separate PC through a USB reader.

The dd command was executed as follows.

$ sudo dd if=/dev/mmcblk0 of=clone_of_SD.img bs=512

Maybe there is a problem with the options specified for the dd command?

Second, the Nano board with the above error was mounted on a drone.
Power is turned off by disconnecting the power line, not by issuing a shutdown command to the Nano board's operating system. (This is an unchangeable part of the drone design.)
Is there any other way to avoid the above error?
I thought that dividing the SD card into an OS partition and a user partition, and doing the frequent writes only in the user partition, could be a way to do this. Will this work?

Third, the picture below shows an issue that occurred with another Nano board.
It was in normal use, and then one day it suddenly stopped booting. (This Nano board was also installed on a drone.)
No log could be captured because it cannot boot.
If the SD card file system is damaged, can the Nano board fail to boot as above?
(It is stuck on the screen above and does not move on to the next step.)

There is no problem with the dd command above; this is the correct way to do it. However, if the SD card was already corrupt when it was removed from the Jetson, then dd would copy the corrupt state. If the SD card was in good shape, then the dd copy would also be in good shape. Was the SD card from a Jetson which was correctly shut down? According to your description of how power is removed, the shutdown method is why it failed.

If this is hard wired, and there is no method to call for shutdown, then you are guaranteed to have errors. The extent of the error could range from trivial to catastrophic; it just depends on how much data was written and not yet committed. There really is no choice but to change the design.

There are basically two caches: one used in the operating system for performance improvement, and one built into the SD card. Without these caches the SD card would wear out extremely fast, and performance would drop through the floor by orders of magnitude. The content which is for booting is read but not written, so that content won’t care, but the operating system partition has no chance to function correctly, data or not.
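Of those two caches, an application can only influence the operating system one. As a hedged sketch, the usual pattern is to write to a temporary file, call fsync, and then rename it over the target, so that an interrupted write leaves either the old file or the new one. The function name `write_durably` and the “.tmp” suffix are assumptions for illustration. This does not make pulling the power safe, since the SD card’s internal cache and everything else the system writes are outside its reach; it only narrows the window for one file.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data`, asking the OS to flush its cache first."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temporary name next to the target

    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        tmp.flush()             # flush Python's own buffer to the OS
        os.fsync(tmp.fileno())  # ask the OS to flush its cache to the device

    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems

    # Flush the directory entry too, so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as `write_durably("/home/user/flight.log", b"...")` (path made up) would replace the file in one step. Every fsync forces a real write to the card, so calling this very frequently costs both performance and SD card lifetime.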

The SD card models do have QSPI memory on the module itself, and that content is what you have to flash to change the boot software. Only the operating system lives on the SD card (though in older releases some content other than o/s lived on partitions of the SD card). I suppose the stuck image could be from QSPI corruption, and flashing the Jetson itself (not the SD card) would fix this, but then you’d also probably have to build a new SD card with the correct software release to match the QSPI release you just flashed.

You’re going to be very disappointed and stuck with a lot of failures if the scheme is to yank the power for shutdown. This is a full computer, not an embedded controller, and it is no different than yanking the power cord to shut down your desktop PC. Would you ever cut power on your desktop PC for standard shutdown?


Thanks for your reply.

While checking the above problem, I found out about the Linux journaling system.

The Jetson Nano board I’m using has an EXT4 filesystem, which includes journaling.

It looks like the filesystem should have been restored by the journaling system. Is there any reason it failed?

Are there any situations that cannot be recovered by the journaling system?

That is, I wonder whether there are any minimum requirements for recovery by the journaling system.

Keep in mind that the journal removes content for which the buffer flush did not complete. A journal does not fix corrupt data, it only removes it. The journal is also small, and if the amount of data which was cached exceeds the size of that journal, then corruption occurs (and such a filesystem can no longer be used). This indicates that data was probably actively being written when the system was incorrectly shut down, and this is correct behavior, since any data which is unwritten and not in the journal can be anywhere, and corruption and loss are not predictable. You really must shut down any computer like this correctly and not via loss of power. I have to emphasize that this is no different than taking your desktop PC, while it is under heavy load, and yanking the power cord from the wall. It is a “normal” guaranteed failure.

Imagine the journal is 1 MB in size. A record (a list of filesystem inodes) of data written but not yet flushed to disk takes a certain amount of space. The journal is not a lot of space, and in fact most situations under heavy write load result in a corrupt and non-usable filesystem. In cases where the filesystem is not corrupt, all of that recent data is still lost.

The only possible results of incorrect shutdown are to lose the filesystem, or to lose a lesser amount which the journal is large enough to remember (and thus a partial loss of data, but not corruption). This is not a microcontroller, this is a full computer.
