How to properly shut down a TX2 kit after an SD card crash?

I think it was caused by a crash of my SD card: I suddenly lost access to the device while editing files in vi. My first reaction was to shut down the system, but then I noticed the shutdown process hung at:

mmcblk1: error -110 sending stop command, original cmd response 0x900, card status 0xe00
...
Failed to send WATCHDOG=1 notification message: connection refused
Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected

It repeats the last line and hangs.

Looks like my SD card is gone. All of my development code is stored on, and run from, that Sony 32 GB card... so bad.

In a crash situation where I’ve lost the command line I usually try the magic SysRq key combinations (“SYSRQ” is the same key as the “print screen” button…but with “ALT” instead of “SHIFT”):

# Directly attached keyboard...
# Call sync of the file system:
ALT-SYSRQ-s
# Remount all partitions read-only:
ALT-SYSRQ-u
# Force reboot:
ALT-SYSRQ-b

You can still lose things this way, and sysrq may not always work, but it tends to be very resilient even in a crash situation. If you want to see how it works without actually doing much just run “dmesg --follow”, and then hit the “sync” key combination “ALT-SYSRQ-s”…you’ll see a note about emergency sync. Note that on a solid state drive you don’t want to call sync a lot, but calling it prior to shutdown or doing something risky is normal.
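Whether the keyboard combinations above do anything depends on the value in /proc/sys/kernel/sysrq. Here is a minimal sketch of how that value can be interpreted, assuming the bit meanings from the kernel's sysrq documentation (16 = sync, 32 = remount read-only, 128 = reboot). The helper name sysrq_allows is made up for illustration, and 176 is only an example value.

```shell
#!/bin/sh
# Sketch: report whether a kernel.sysrq value permits a given SysRq function.
# sysrq_allows is a hypothetical helper name, not a standard tool.
# Bits assumed from the kernel sysrq documentation:
#   16 = sync (s), 32 = remount read-only (u), 128 = reboot/poweroff (b)
sysrq_allows() {
    value=$1   # contents of /proc/sys/kernel/sysrq
    bit=$2     # bit of the function being asked about
    if [ "$value" -eq 1 ]; then
        echo yes                      # 1 enables every function
    elif [ $(( value & bit )) -ne 0 ]; then
        echo yes                      # the specific bit is set in the mask
    else
        echo no
    fi
}

# Example with 176 (= 16 + 32 + 128):
sysrq_allows 176 16    # sync
sysrq_allows 176 64    # signalling processes (bit not set in 176)
```

On a live system you could call it as sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 16. If a root shell is still reachable (e.g., over serial console), writing the letter to /proc/sysrq-trigger, as in "echo s > /proc/sysrq-trigger", requests the same function without the keyboard.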

Thanks linuxdev. I’ll make a note of this key sequence for the next crash.

Now, I’d like to recover the data from the SD card if possible.
The partition mmcblk1p1 can be seen, but cannot be mounted. ‘lsblk -l’ shows this at the end:

mmcblk1      179:32   0  29.5G  0 disk 
mmcblk1p1    179:33   0  29.5G  0 part

But I can’t mount the card. If I run

$ sudo mkdir /mnt/sdcard
$ sudo mount /dev/mmcblk1p1 /mnt/sdcard

the system just hangs at the second command.

By the way, ‘dmesg’ also shows an error at the very end:

[   47.970229] mmcblk1: error -110 sending stop command, original cmd response 0x900, card status 0xe00
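For reading that message: the number after "error" is a negative Linux errno value. A small sketch of a lookup for the two values that appear in this thread (-110 here, and err=-5 in a later dmesg excerpt); errno_name is a hypothetical helper and the table is deliberately incomplete.

```shell
#!/bin/sh
# Sketch: map negative errno values from kernel logs to their Linux names.
# errno_name is a hypothetical helper; only two values are listed.
errno_name() {
    case "${1#-}" in                  # strip the leading minus sign
        5)   echo "EIO (I/O error)" ;;
        110) echo "ETIMEDOUT (connection timed out)" ;;
        *)   echo "unknown" ;;
    esac
}

errno_name -110
```

So -110 suggests the card stopped answering the stop command within the timeout, which would fit a card that is failing rather than a file system that is merely corrupt.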

Any hope?

What if you mount the SD card on a different device, e.g., on the host PC?

I would do as @Andrey1984 says, try first on a different connector (e.g., a host PC).

After that, assuming it is file system type ext4 and it is mmcblk1p1 on whichever system you use (perhaps monitor “dmesg --follow” for this and the next commands shown):

sudo fsck.ext4 /dev/mmcblk1p1

If it is another file system type you can adapt, e.g., there is:

sudo fsck.vfat /dev/mmcblk1p1

If those cannot succeed, then see what gdisk thinks of the device as a whole (not the partition):

sudo gdisk -l /dev/mmcblk1

@Andrey and linuxdev, thanks for the tips.
The SD card is in ext4 format. I have a host PC that does pretty much nothing but flash the DTB for the TX2. It doesn’t have an SD reader, so I plugged the SD card into a USB3 card reader.
Here is the output of some commands on the host PC.
‘sudo lsblk -l’

sdb    8:16   1  29.5G  0 disk 
sdb1   8:17   1  29.5G  0 part 
sr0   11:0    1  1024M  0 rom  
sda    8:0    0 232.9G  0 disk 
sda2   8:2    0     1K  0 part 
sda5   8:5    0   7.9G  0 part [SWAP]
sda1   8:1    0   225G  0 part /

‘sudo e2fsck /dev/sdb1’ (took a few minutes to respond)

e2fsck 1.42.13 (17-May-2015)
/dev/sdb1: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>? yes
e2fsck: unable to set superblock flags on /dev/sdb1
/dev/sdb1: ********** WARNING: Filesystem still has errors **********

‘sudo fsck.ext4 /dev/sdb1’ (same result as ‘e2fsck’)

e2fsck 1.42.13 (17-May-2015)
/dev/sdb1: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>? yes
fsck.ext4: unable to set superblock flags on /dev/sdb1
/dev/sdb1: ********** WARNING: Filesystem still has errors **********

The ‘gdisk’ command, however, gives an interesting report. I don’t know how to decipher it though.
‘sudo gdisk -l /dev/sdb’

GPT fdisk (gdisk) version 1.0.1
Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory. 
***************************************************************
Warning! Secondary partition table overlaps the last partition by
33 blocks!
You will need to delete this partition or resize it in another utility.
Disk /dev/sdb: 61896704 sectors, 29.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): C9E1BD58-1391-4ED3-AFC6-787226044014
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 61896670
Partitions will be aligned on 2048-sector boundaries
Total free space is 8158 sectors (4.0 MiB)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            8192        61896703   29.5 GiB    0700  Microsoft basic data
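As far as I can tell, the "overlaps the last partition by 33 blocks" warning is just arithmetic from the in-memory MBR-to-GPT conversion: the card has an MBR table, the partition runs to the very last sector, and a GPT would need the final sectors for its backup copy. The sketch below reproduces the number from the listing, assuming the standard 128-byte GPT entry size.

```shell
#!/bin/sh
# Numbers copied from the gdisk listing above.
total_sectors=61896704
last_usable=61896670      # last sector a GPT partition may occupy
part_end=61896703         # end sector of partition 1

# A backup GPT takes the final sectors of the disk: the entry array
# (128 entries x 128 bytes each, in 512-byte sectors) plus 1 header sector.
backup_gpt_sectors=$(( 128 * 128 / 512 + 1 ))
last_sector=$(( total_sectors - 1 ))
overlap=$(( part_end - last_usable ))

echo "partition ends at sector $part_end, disk ends at sector $last_sector"
echo "backup GPT needs $backup_gpt_sectors sectors"
echo "partition overlaps that area by $overlap sectors"
```

Since the disk is "MBR only", this warning on its own does not indicate damage; it only matters if the card were actually converted to GPT.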

Looks like I have to give up, doesn’t it?

P.S.
Here is the ‘dmesg --follow’ output (pretty much repeating the same message)

[ 1495.382862] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
[ 1495.382866] sd 6:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] 
[ 1495.382868] sd 6:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
[ 1495.382871] sd 6:0:0:0: [sdb] tag#0 CDB: Write(10) 2a 00 01 c4 20 00 00 00 08 00
[ 1495.382873] blk_update_request: I/O error, dev sdb, sector 29630464
[ 1495.382879] Buffer I/O error on dev sdb1, logical block 3702784, lost async page write
[ 1495.424934] VFS: Dirty inode writeback failed for block device sdb1 (err=-5).
[ 1546.822406]  sdb: sdb1
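The numbers in those lines are consistent with each other, which can be checked by decoding the Write(10) CDB. This is a sketch only: decode_write10 is a hypothetical helper, the partition start (8192) is taken from the gdisk listing, and the 4096-byte file system block size is an assumption.

```shell
#!/bin/sh
# Sketch: decode a 10-byte SCSI Write(10) CDB given as ten hex bytes.
# Layout: opcode, flags, 4-byte LBA (big-endian), group,
#         2-byte transfer length (big-endian), control.
decode_write10() {
    lba=$(( 0x$3$4$5$6 ))     # bytes 2..5 of the CDB
    count=$(( 0x$8$9 ))       # bytes 7..8 of the CDB
    echo "$lba $count"
}

part_start=8192               # start sector of partition 1 (gdisk listing)
sectors_per_block=8           # assumes 4096-byte blocks, 512-byte sectors

result=$(decode_write10 2a 00 01 c4 20 00 00 00 08 00)
lba=${result% *}
count=${result#* }
fs_block=$(( (lba - part_start) / sectors_per_block ))

echo "LBA $lba, $count sectors, filesystem block $fs_block"
```

This prints LBA 29630464 and block 3702784, matching the "sector 29630464" and "logical block 3702784" lines: a single 4 KiB write to the card is what fails.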

I am not optimistic, seeing error messages that mention a hardware error, but before giving up you may try testdisk (it can be installed with apt). The interface is quite old, but it is able to find many things again (partitions, file systems, some file formats).

Interesting app, have never used testdisk before but just installed it. Other than testdisk I can’t think of any way to non-destructively fix the content…and even if you destroy content with something like dd the hardware failure looks like something you won’t be able to get around.
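One more non-destructive thing that might be worth a try, since both the e2fsck failure ("unable to set superblock flags") and the dmesg errors involve writes: copy whatever is still readable to an image file and work only on the copy. The sketch below just prints a plan rather than running it. recovery_plan is a hypothetical helper, the file names and /mnt/sdcard are assumptions, and it relies on GNU ddrescue (Ubuntu package "gddrescue") being installed. No guarantee the card will still allow reads.

```shell
#!/bin/sh
# Sketch: print (not run) a read-only recovery plan for a failing card.
# recovery_plan is a hypothetical helper; names below are assumptions.
recovery_plan() {
    part=$1     # partition to rescue, e.g. /dev/sdb1
    img=$2      # image file on a healthy disk with enough free space
    cat <<EOF
sudo ddrescue -d -r3 $part $img $img.map
sudo e2fsck -fn $img
sudo mount -o ro,loop,noload $img /mnt/sdcard
EOF
}

recovery_plan /dev/sdb1 sdb1.img
```

The first line copies readable sectors (direct access, up to three retry passes over bad areas), the second checks the copy without modifying it, and the third mounts the copy read-only without replaying the ext4 journal.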

One side question though…was this SD card ever partitioned with fdisk? gdisk is for GPT partitioning and is what U-Boot works with (and is preferred), whereas fdisk has old style BIOS partitioning. Mixing the two might cause some issues, but it really looks like the SD card failed.

If you are unable to recover with testdisk, then you might consider wiping it out and re-partitioning in a last effort to at least have the SD card work. Assuming the card is on the host and is “/dev/sdb” (be very very careful to not get the wrong device), then you could zero this out (assuming the hardware has not failed) via:

sudo dd if=/dev/zero of=/dev/sdb bs=512
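Because picking the wrong device here is unrecoverable, a guard along these lines may help. It is only a sketch: is_removable and wipe_cmd are hypothetical helpers, the check relies on the sysfs "removable" flag (the same information as the RM column of lsblk), and the bs=4M and status=progress options are my own choice for speed and feedback, not part of the command above.

```shell
#!/bin/sh
# Sketch: only print a wipe command for disks that sysfs marks removable.
# The optional second argument overrides the sysfs root (useful for testing).
is_removable() {
    dev=$1
    sysfs=${2:-/sys}
    flag_file="$sysfs/block/$dev/removable"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

wipe_cmd() {
    if is_removable "$1" "$2"; then
        # Printed for review instead of being executed directly.
        echo "sudo dd if=/dev/zero of=/dev/$1 bs=4M status=progress"
    else
        echo "refusing: /dev/$1 is not marked removable" >&2
        return 1
    fi
}
```

Running "wipe_cmd sdb" would print the dd line only if /sys/block/sdb/removable contains 1; an internal disk such as sda should be refused. Still double-check the device name with lsblk before pasting the command.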

Once this is done you could attempt to partition it again with gdisk or gparted. I am guessing it won’t get that far though.

@honey_patoucehl and linuxdev, I couldn’t get ‘testdisk’ to install. I got the error message “E: Unable to locate package testdisk”. I tried stepping back to run ‘sudo apt update’, which then failed while updating xenial-security.

I guess I’d better just let the SD card go (I’ll try to reuse it). Thanks for your efforts in helping out. I need to be more careful about SD card usage down the road.

@linuxdev, I didn’t partition the card. You perhaps still remember that you helped me out with an SD card related question a while ago. It’s the same SD card. Back then I didn’t know I should format the card as ext4. Plugging in the vfat-formatted SD card caused an error at power up (/usr/lib/colord/colord-sane crash). After you pointed out the format problem, I ran a “sudo mkfs.ext4 /dev/mmcblk1p1” command. The entire card was then formatted and mounted.
https://devtalk.nvidia.com/default/topic/1024595/error-prompt-tx2-system-program-problem-detected

For the last few months, I was using the SD card as storage for video and audio programs and data. The usage was fairly heavy when the media programs were running. Perhaps an SD card is not the best choice for daily operation? Or was mine just an isolated unlucky instance?

You probably need to enable the ‘universe’ repository in apt for testdisk.
You can do that by editing the file /etc/apt/sources.list (use sudo), or through the ‘Settings’ of Software Updater.

SD cards do not have the endurance of regular hard drives, and eMMC and many NVMe type drives have more redundancy built in against wear (versus SD cards), so heavy use of an SD card is suspect in it failing earlier than other devices might fail. Not all brands of SD cards (and not all models of the same brand) have the same life. I might expect the SD card to last longer, but I also wouldn’t say there is anything unusual about an SD card failing under heavy use. I think there are actually SD cards available for more extreme use, though I couldn’t tell you where to find them…the “ruggedized” versions always cost a fortune compared to others. Probably you want to back up anything important on an SD card.

@linuxdev, agreed. I actually thought about the limitations of SD cards, and planned to make a backup at some stage. The catastrophe arrived much sooner than I was expecting.
It’s a good lesson for me.

@Honey-Patouceul, do you know which lines I need to uncomment in /etc/apt/sources.list? I remember I did something to the file a few months ago, but forgot the details. If I run ‘sudo add-apt-repository universe’, I get the message:
‘universe’ distribution component is already enabled for all sources.

Below are the ‘xenial-security’ related contents in /etc/apt/sources.list:

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ xenial universe
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial universe
deb http://us.archive.ubuntu.com/ubuntu/ xenial-updates universe
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://us.archive.ubuntu.com/ubuntu/ xenial multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial multiverse
deb http://us.archive.ubuntu.com/ubuntu/ xenial-updates multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-updates multiverse

...

deb http://us.archive.ubuntu.com/ubuntu/ xenial-backports main restricted universe multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
deb http://archive.canonical.com/ubuntu xenial partner
# deb-src http://archive.canonical.com/ubuntu xenial partner

# deb-src http://security.ubuntu.com/ubuntu xenial-security main restricted
# deb-src http://security.ubuntu.com/ubuntu xenial-security universe
# deb-src http://security.ubuntu.com/ubuntu xenial-security multiverse

Deeply appreciated.
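One way to see which entries are actually active (uncommented) for a given component or suite, rather than reading the file by eye, is a small filter like the one below. active_entries is a hypothetical helper name; it only looks at a single file, so anything under /etc/apt/sources.list.d/ would need to be checked separately.

```shell
#!/bin/sh
# Sketch: print the uncommented binary "deb" lines of an apt sources file
# that mention a given word (component or suite name).
# active_entries is a hypothetical helper name.
active_entries() {
    file=$1
    word=$2
    # Keep lines that start with "deb " (deb-src and "# ..." lines drop out),
    # then keep only those mentioning the requested word.
    grep -E '^[[:space:]]*deb[[:space:]]' "$file" | grep -- "$word"
}
```

For example, "active_entries /etc/apt/sources.list universe" lists the active universe lines, and "active_entries /etc/apt/sources.list xenial-security" shows whether any binary security entry is enabled at all.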

On R28.1R28.2 pre-release:

# apt-cache policy testdisk
testdisk:
  Installed: (none)
  Candidate: 7.0-1
  Version table:
     7.0-1 500
        500 http://ports.ubuntu.com/ubuntu-ports xenial/universe arm64 Packages

I don’t have the same repositories.
Where you have http://us.archive.ubuntu.com/ubuntu , I have instead http://ports.ubuntu.com/ubuntu-ports.
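The difference in repositories most likely comes down to CPU architecture: the main Ubuntu archive carries the x86 builds, while arm64 packages (as on the Jetson) are served from ports.ubuntu.com. A sketch of that mapping, assuming the Ubuntu 16.04 layout; ubuntu_mirror_for is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: pick the Ubuntu package host that matches a dpkg architecture.
# Assumes the xenial-era layout: x86 on the main archive, the rest on ports.
ubuntu_mirror_for() {
    case "$1" in
        amd64|i386) echo "http://archive.ubuntu.com/ubuntu" ;;
        *)          echo "http://ports.ubuntu.com/ubuntu-ports" ;;
    esac
}

ubuntu_mirror_for arm64
ubuntu_mirror_for amd64
```

On a real system the argument can come from "dpkg --print-architecture". If a sources.list on an arm64 machine points at us.archive.ubuntu.com, apt would not find arm64 package lists there, which could explain both the update failure and the missing testdisk package.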