EXT4-fs error from htree_dirblock_to_tree when accessing some files

The error occurs when I try to access certain files, for instance with ls as in the example below.

Jul 18 10:27:00 ubuntu kernel:  [88610.360542] EXT4-fs error (device mmcblk0p1): htree_dirblock_to_tree:958: inode #29558: block 49: comm ls: Directory block failed checksum

I have tried to diagnose the device, without success:

sudo smartctl -H -i /dev/mmcblk0
smartctl 6.5 2016-01-24 r4214 [aarch64-linux-4.4.38-tegra] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/mmcblk0: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary

And the help reads as follows:

-d TYPE, --device=TYPE
         Specify device type to one of: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test

From what I have read, the output of this command would help in knowing how to fix it… but I would also appreciate help on the following steps.

Any advice welcome :)

Emmanuel

  • cortexia.ch

I could be wrong, but I believe the eMMC is not a regular drive and thus is not S.M.A.R.T. capable.
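If it helps, here is a minimal sketch of where eMMC wear information sometimes lives instead of S.M.A.R.T. data. The helper name emmc_health is made up for illustration, and the life_time and pre_eol_info sysfs attributes are an assumption: they are only present when the kernel's MMC driver exports them, and an older kernel such as the 4.4 one in the log above may not have them.

```shell
#!/bin/sh
# Sketch only. emmc_health is a hypothetical helper, not an existing tool.
# Assumption: the MMC driver exports life_time and pre_eol_info under the
# device's sysfs directory. Older kernels may not provide these files.
emmc_health() {
    sysdir="${1:-/sys/block/mmcblk0/device}"
    for attr in life_time pre_eol_info; do
        if [ -r "$sysdir/$attr" ]; then
            # Print the raw value exactly as the kernel reports it.
            printf '%s: %s\n' "$attr" "$(cat "$sysdir/$attr")"
        else
            printf '%s: not exposed by this kernel\n' "$attr"
        fi
    done
}
```

Calling emmc_health with no argument looks at mmcblk0; a different sysfs directory can be passed as the first argument.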

Generally speaking there might be filesystem corruption from improper shutdown. Most of the time the journal can replay and get rid of issues, but corruption might be more than replay can handle.

Hmmm, ok.

I have seen that fsck is the way to go when there is corruption, but it implies booting from a CD or a USB drive. Do you know any other way to fix the issue? One that I could do remotely through SSH?

Possibly, if the filesystem were remounted read-only, you could then safely fsck, but that would have limitations. You could try to ssh in, and then use a magic sysrq trigger to go read-only:

echo u | sudo tee /proc/sysrq-trigger

From there you could run fsck.ext4, but you might need to use “force” flags. This would only be safe if read-only was successfully reached. Even if “safe”, keep in mind that corruption beyond what the journal replay can handle will still leave damaged filesystem content, and some parts may be missing after you are done. If the sysrq trigger is refused, then it might just be a setting, so post here whatever results you have if it fails.
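As a rough sketch of the "only if read-only was reached" check, something like the following could be used before running fsck. The helper mount_is_readonly is hypothetical, and the device name is taken from the log in the first post. Note that on some systems the root filesystem is listed in /proc/mounts under a different name (for example /dev/root), so the device argument may need adjusting.

```shell
#!/bin/sh
# Sketch only: mount_is_readonly is a hypothetical helper, not an existing tool.
# It scans a mounts table (default /proc/mounts) for the given device and
# succeeds only if every entry for that device has "ro" in its option list.
mount_is_readonly() {
    dev="$1"
    table="${2:-/proc/mounts}"
    awk -v dev="$dev" '
        $1 == dev {
            found = 1
            n = split($4, opts, ",")
            ro = 0
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
            if (!ro) bad = 1
        }
        END { exit (found && !bad) ? 0 : 1 }
    ' "$table"
}

# Intended use over SSH (commented out so nothing runs by accident):
#   cat /proc/sys/kernel/sysrq              # 0 means sysrq is disabled
#   echo u | sudo tee /proc/sysrq-trigger   # emergency remount read-only
#   mount_is_readonly /dev/mmcblk0p1 && sudo fsck.ext4 -f /dev/mmcblk0p1
```

The function fails when the device is not found at all, so a mistyped device name does not lead to fsck running on a writable filesystem.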

Normally a bootable SD card would be recommended.

A far more difficult way to handle this, but which is rather safe, is to clone the filesystem, loopback mount it on another host, repair there, and then re-flash the repaired clone. Lots of time and disk space are required.
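The repair-on-another-host part might look roughly like the sketch below. It assumes a raw ext4 image of the partition already exists on the host; how the clone is taken and re-flashed is not shown. The names repair_clone and run, and the ".work" suffix, are made up for illustration. With DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Sketch only. repair_clone and run are hypothetical helpers.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_clone() {
    image="$1"        # raw ext4 image obtained by cloning the partition
    mountpoint="$2"   # empty directory on the repair host
    # Work on a copy so the untouched clone stays available as a fallback.
    run cp --sparse=always "$image" "$image.work" || return 1
    # e2fsck can check an image file directly; -f forces a full check.
    run e2fsck -f "$image.work"
    rc=$?
    # e2fsck exit codes below 4 mean "clean" or "errors were corrected".
    [ "$rc" -lt 4 ] || return "$rc"
    # Loopback mount the repaired copy to inspect it or copy files out.
    run sudo mount -o loop "$image.work" "$mountpoint"
}
```

Running it first as DRY_RUN=1 repair_clone clone.img /mnt/clone shows the three commands without touching anything; the image name and mount point here are placeholders.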

Thanks for those hints!

The second idea, cloning, sounds great!

Indeed, I already have a clone image that I used to set up the Jetson in the first place. Therefore, I could just save whatever content can be saved from the corrupted system, and flash it again with the original image. Would that clean things up?

Of course, I will have to get the unit back to the office, but if that works, it eliminates the risk of more corruption and restores a proper system. Have I understood correctly?

A “normal” flash builds a new filesystem down to the last bit, so this would erase the original content and replace it with a new filesystem which isn’t corrupt. Saving individual files and then flashing is a good plan.

FYI, in the flash directory is a “rootfs/” subdirectory. Other than the “rootfs/boot/” content (or boot configuration content) the image will be an exact match to that directory. Whatever you copy into that will be reproduced. As an example, you could copy the “~/.ssh/” content from the existing system into the matching home directory under “rootfs/” and the ssh keys would be a match right from the start (there is also an “/etc/ssh/” directory with host keys). Any home directory content could be pre-loaded the same way, and so on.
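A small sketch of pre-loading saved content into “rootfs/” before flashing is shown below. The helper preload_home is hypothetical, and the user name "ubuntu" and the paths in the usage comment are assumptions that need to match the actual setup.

```shell
#!/bin/sh
# Sketch only. preload_home is a hypothetical helper; the user name and the
# paths in the usage line are assumptions.
preload_home() {
    src="$1"      # directory saved from the old system, e.g. a copied .ssh
    rootfs="$2"   # the "rootfs/" directory inside the flash directory
    user="$3"     # account whose home directory receives the content
    dest="$rootfs/home/$user"
    mkdir -p "$dest" &&
    # -a preserves permissions and timestamps, which matters for SSH keys.
    cp -a "$src" "$dest/"
}

# Example (likely needs sudo, since rootfs/ content is typically root-owned):
#   preload_home ./saved/.ssh ./rootfs ubuntu
```

File ownership is only preserved when the copy runs as root, so it is worth checking that the copied files end up owned by the intended user before flashing.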

A clone just happens to be what the rootfs build procedure creates.

Thanks @linuxdev, I feel you saved me from a lot of trouble there :)