Jetson-TK1 21.3, 21.4, SATA does not work

Hello,
The following concerns release 21.4 but applies to 21.3 as well. I have SATA devices that all work fine with release 19.3: I can see the devices and mount file systems on them. After flashing the Jetson to 21.4, the SATA devices do not show up, and the serial output when hot-plugging a SATA SSD is as follows:

Last login: Sat Jan  1 04:33:38 UTC 2000 on ttyS0
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.10.40-gdacac96 armv7l)

 * Documentation:  https://help.ubuntu.com/

0 packages can be updated.
0 updates are security updates.

ubuntu@tegra-ubuntu:~$ [  386.886486] ata1: exception Emask 0x10 SAct 0x0 SErr 0x5800000 action 0xe frozen
[  386.897566] ata1: irq_stat 0x00000040, connection status changed
[  386.904873] ata1: SError: { LinkSeq TrStaTrns DevExch }

Following is the kernel output when booting with the SATA SSD already plugged in and powered:

[    8.812839] as3722-rtc as3722-rtc.1: setting system clock to 2000-01-01 04:43:17 UTC (946701797)
[    8.825814] ALSA device list:
[    8.832068]   #0: HDA NVIDIA Tegra at 0x70038000 irq 113
[    8.840755]   #1: tegra-rt5639
[    9.687958] ata1: link is slow to respond, please be patient (ready=0)
[   14.236957] ata1: softreset failed (device not ready)
[   19.753950] ata1: link is slow to respond, please be patient (ready=0)
[   24.251973] ata1: softreset failed (device not ready)
[   24.719968] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   29.729976] ata1.00: qc timeout (cmd 0xec)
[   29.738056] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   30.207966] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   40.217978] ata1.00: qc timeout (cmd 0xec)
[   40.225996] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   40.236051] ata1: limiting SATA link speed to 1.5 Gbps
[   40.704984] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   70.714968] ata1.00: qc timeout (cmd 0xec)
[   70.723088] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   71.192985] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   71.204442] EXT4-fs (mmcblk0p1): couldn't mount as ext3 due to feature incompatibilities
[   71.217353] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4180000 action 0xe frozen
[   71.229129] EXT4-fs (mmcblk0p1): couldn't mount as ext2 due to feature incompatibilities
[   71.229136] ata1: irq_stat 0x00000040, connection status changed
[   71.229144] ata1: SError: { 10B8B Dispar DevExch }
[   71.239981] ata1: hard resetting link
[   71.278386] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
[   71.291244] VFS: Mounted root (ext4 filesystem) on device 179:1.
[   71.305476] devtmpfs: mounted
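
For anyone reading the boot log above: the hex value printed after "SStatus" packs three fields from the SATA SStatus register (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). Below is a minimal shell sketch that decodes it. The function name decode_sstatus is my own invention, and the field meanings are as I understand them from the SATA register definitions, so treat it as a reading aid rather than a reference.

```shell
#!/bin/sh
# Sketch: decode the SStatus value that libata prints in hex without a
# "0x" prefix, e.g. "SStatus 123". decode_sstatus is a hypothetical helper.
decode_sstatus() {
    val=$(( 0x$1 ))
    det=$(( val & 0xf ))          # device detection
    spd=$(( (val >> 4) & 0xf ))   # negotiated link speed
    ipm=$(( (val >> 8) & 0xf ))   # interface power management state

    case $det in
        0) det_s="no device detected" ;;
        1) det_s="device present, no PHY communication" ;;
        3) det_s="device present, PHY communication established" ;;
        4) det_s="PHY offline" ;;
        *) det_s="reserved DET value $det" ;;
    esac
    case $spd in
        0) spd_s="no negotiated speed" ;;
        1) spd_s="1.5 Gbps" ;;
        2) spd_s="3.0 Gbps" ;;
        3) spd_s="6.0 Gbps" ;;
        *) spd_s="reserved SPD value $spd" ;;
    esac
    case $ipm in
        0) ipm_s="no device or no communication" ;;
        1) ipm_s="interface active" ;;
        2) ipm_s="partial power state" ;;
        6) ipm_s="slumber power state" ;;
        *) ipm_s="reserved IPM value $ipm" ;;
    esac
    printf '%s; %s; %s\n' "$det_s" "$spd_s" "$ipm_s"
}
```

Running "decode_sstatus 123" and "decode_sstatus 113" on the two values from the log reports an established, active link at 3.0 Gbps and then 1.5 Gbps. In other words the PHY negotiation appears to succeed each time, and it is the following IDENTIFY command (cmd 0xec) that times out.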

I have tried the following SATA SSD devices:

  • Innodisk 3MG2-P, 1TB
  • Samsung 840EVO, 1TB
  • Crucial MX100, 512GB

In addition I have tried rotating SATA media:

  • Seagate Barracuda ES, 750 GB

All of these worked fine on the 19.3 release, so the cables are all good. Clearly something broke when going to 21.X. I’ve seen some posts that relate to this, but I haven’t seen any answers. Does anyone have any insight into this? My development is being seriously hampered by this, as I wish to boot off of eMMC and mount an RFS from SATA.

Thanks for reading this post.

Cheers,

Bob

I know on R21.4 that ordinary SATA2 drives work (I’m using one right now). In the past there were a lot of issues for specific SSD drives, which required fixes. I don’t remember which models of SSD were “fixed”, but I don’t think the 1 TB size existed yet…possibly old fixes were lost, or perhaps the larger drives simply haven’t been available to test until now.

Thanks for the reply! By ordinary, do you mean non-SSD? Can you tell me the model that you’re using that works?

Also, in which forum have you seen the “fixes” posted? I haven’t seen any in this one. Were the fixes for driver issues?

I don’t know the specific HD model number, but I’m using a non-SSD SATA2 Seagate 300 GB 7200 rpm hard drive.

The “fixes” I recall were reports from the early R19.x days (rofl, not that old, but about a year ago) of several specific SSD drives, where a few worked, and many did not. There were no end-user fixes, just reports which resulted in fixes on the next release cycle for L4T.

Apart from the SanDisk and Micron Crucial SSDs shown below, we also verified Kingston and Transcend SSDs.
All of them worked fine on R21.4, as well as R21.3.

First, there is a catch: please update the firmware on the Micron Crucial drives, as we did.
The old firmware had “frozen” issues when used with the Jetson-TK1.
Please update your other SSDs to their respective latest firmware as well.

Second, check the md5sum of your boot kernel, and update the kernel if it does not match the value shown below.
If an old rootfs is residing in the /boot directory of the boot device, the old kernel on that boot media is silently picked up instead.
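
As a rough sketch of that check, the small shell function below compares a file's md5sum against an expected value. The name check_md5 is hypothetical, and the checksum in the usage comment is simply the one from the md5sum listing in this post, which I am assuming corresponds to the kernel you intend to run.

```shell
#!/bin/sh
# Sketch: compare a file's md5sum against an expected value.
# check_md5 is a hypothetical helper, not part of L4T.
check_md5() {
    file=$1
    expected=$2
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # md5sum prints "<hash>  <path>"; keep only the hash.
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches"
    else
        echo "MISMATCH: $file is $actual, expected $expected"
        return 1
    fi
}

# Usage on the Jetson (checksum taken from the listing in this post):
#   check_md5 /boot/zImage d0b092c1b9615a6ba3b14e00dba5c4b7
```

If this reports a mismatch, it is worth checking every /boot directory the board could be booting from, not only the one on eMMC.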

I used the 3 SSDs below with a SATA port multiplier connected to the SATA port on the Jetson-TK1.
I also used the same SSDs connected directly to the SATA port of the Jetson-TK1, and never saw any issues.

root@tegra-ubuntu:~# mkdir /media/{a..c}
root@tegra-ubuntu:~# ls /media/
a/      b/      c/      ubuntu/

root@tegra-ubuntu:~# lsscsi
[0:0:0:0]    disk    ATA      SanDisk SDSSDRC0 3.1.  /dev/sda
[0:1:0:0]    disk    ATA      Crucial_CT240M50 MU05  /dev/sdc
[0:2:0:0]    disk    ATA      Crucial_CT256MX1 MU02  /dev/sdb

root@tegra-ubuntu:~# lsblk -io KNAME,TYPE,SIZE,MODEL
KNAME       TYPE   SIZE MODEL
sda         disk  29.8G SanDisk SDSSDRC0
sda1        part  29.8G
sdb         disk 238.5G Crucial_CT256MX1
sdb1        part 238.5G
sdc         disk 223.6G Crucial_CT240M50
sdc1        part 223.6G
mmcblk0rpmb disk     4M
mmcblk0     disk  14.7G
mmcblk0p1   part    14G
mmcblk0p2   part     4M
mmcblk0p3   part    64M
mmcblk0p4   part     4M
mmcblk0p5   part     4M
mmcblk0p6   part     4M
mmcblk0p7   part     4M
mmcblk0p8   part     2M
mmcblk0p9   part   558M

root@tegra-ubuntu:~# mount /dev/sda1 /media/a
[  263.526272] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
root@tegra-ubuntu:~# mount /dev/sdb1 /media/b
[  267.880109] EXT4-fs (sdb1): mounting ext2 file system using the ext4 subsystem
[  267.903179] EXT4-fs (sdb1): warning: mounting unchecked fs, running e2fsck is recommended
[  267.914747] EXT4-fs (sdb1): mounted filesystem without journal. Opts: (null)
root@tegra-ubuntu:~# mount /dev/sdc1 /media/c
[  272.860939] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
root@tegra-ubuntu:~# cp /boot/zImage /media/a/
root@tegra-ubuntu:~# cp /boot/zImage /media/b/
root@tegra-ubuntu:~# cp /boot/zImage /media/c/
root@tegra-ubuntu:~# touch /media/?/zImage

root@tegra-ubuntu:~# md5sum /media/?/zImage /boot/zImage
d0b092c1b9615a6ba3b14e00dba5c4b7  /media/a/zImage
d0b092c1b9615a6ba3b14e00dba5c4b7  /media/b/zImage
d0b092c1b9615a6ba3b14e00dba5c4b7  /media/c/zImage
d0b092c1b9615a6ba3b14e00dba5c4b7  /boot/zImage

@spencer, the first log before splitting to the second log looked normal.

For reference, the first log change due to the port multiplier is just past this line in the second listing:

ip6_tables: (C) 2000-2006 Netfilter Core Team

In terms of required drivers, were there any kernel compiles or module additions required for the JMicron JMB321 evaluation card? To determine whether modules changed at all (assuming you didn’t change them, but perhaps modules pre-existed without loading), can you compare the output of “lsmod” both with and without the controller and see if there is any difference? It could be important to know if a non-default driver loaded, as well as whether it loaded in the form of a module.
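
One way to do that comparison, as a minimal sketch: save the lsmod output from each boot to a file and compare just the module names. The function name lsmod_diff and the file paths in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Sketch: show which modules appear in only one of two saved lsmod outputs.
# lsmod_diff is a hypothetical helper; "-" means only in the first file,
# "+" means only in the second.
lsmod_diff() {
    before=$(mktemp) && after=$(mktemp) || return 1
    # Skip the "Module Size Used by" header and keep the first column.
    awk 'NR > 1 { print $1 }' "$1" | sort > "$before"
    awk 'NR > 1 { print $1 }' "$2" | sort > "$after"
    comm -23 "$before" "$after" | sed 's/^/- /'
    comm -13 "$before" "$after" | sed 's/^/+ /'
    rm -f "$before" "$after"
}

# Usage (paths are examples only):
#   lsmod > /tmp/lsmod.without   # booted without the port multiplier card
#   lsmod > /tmp/lsmod.with      # booted with it attached
#   lsmod_diff /tmp/lsmod.without /tmp/lsmod.with
```

No output means the same set of modules was loaded in both cases, which would suggest the controller is handled by a driver built into the kernel.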

The first change I see, which may or may not be significant, is that the kernel command line uses sda1 as the root partition. It’s quite possible this has nothing to do with the issue, but in case there are unforeseen side effects unrelated to the port multiplier, it might be reasonable to repeat the 36-hour simultaneous read/write test with the rootfs still on eMMC rather than on the drives being tested.

The first issue I see (which made the prior rootfs worth noting) is this:

EXT4-fs (sda1): couldn't mount as ext3 due to feature incompatibilities
EXT4-fs (sda1): couldn't mount as ext2 due to feature incompatibilities
EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)

Were those partitions formatted on a different machine or host? The Jetson might differ in some features, for example SELinux is not enabled on the Jetson. It might be important to do the formatting from the Jetson itself to guarantee that all features are compatible. In terms of moving those disks around to different machines, was everything done on the Jetson, or was another host involved? I’m trying to eliminate differences which are unrelated to the controller driver.
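
If it helps, the feature flags of an ext2/3/4 partition can be listed with “tune2fs -l” and compared between a partition formatted on the Jetson and one formatted elsewhere. The sketch below only extracts the “Filesystem features:” line from that output; the function name fs_features is my own, and the device names in the usage comment are examples.

```shell
#!/bin/sh
# Sketch: read `tune2fs -l` output on stdin and print the filesystem
# feature flags one per line, sorted, so two partitions can be diffed.
# fs_features is a hypothetical helper.
fs_features() {
    sed -n 's/^Filesystem features:[[:space:]]*//p' |
        tr -s ' ' '\n' |
        LC_ALL=C sort
}

# Usage on the Jetson (device names are examples only):
#   tune2fs -l /dev/sda1      | fs_features > /tmp/sda1.features
#   tune2fs -l /dev/mmcblk0p1 | fs_features > /tmp/emmc.features
#   diff /tmp/emmc.features /tmp/sda1.features
```

Any flag that shows up only on the externally formatted partition would be a candidate for the kind of difference I am trying to rule out.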

I also see this:

XFS (sdb2): Mounting Filesystem
XFS (sdb2): Ending clean mount
XFS (sdc2): Mounting Filesystem
XFS (sdc2): Ending clean mount
XFS (sdd2): Mounting Filesystem
XFS (sdd2): Ending clean mount

…for testing purposes and narrowing testing down to minimal differences, would it be possible to restrict all partitions to type ext4 formatted directly from Jetson? If nothing “bad” happens, you could then partition non-root partitions as XFS for further testing. In addition to XFS file system differences, the security labels could add to complicating differences unrelated to the port multiplier.

One more difference I see is it looks like you have multiple network interfaces via:

igb 0000:01:00.1 eth2: igb: eth2 NIC Link is Down

If at all possible perhaps test without that, although I doubt this would matter other than perhaps module loads differing.

Following those differences is the first actual SATA failure:

ata1.00: failed to read SCR 1 (Emask=0x40)
ata1.01: failed to read SCR 1 (Emask=0x40)
ata1.02: failed to read SCR 1 (Emask=0x40)
ata1.03: failed to read SCR 1 (Emask=0x40)
ata1.04: failed to read SCR 1 (Emask=0x40)
ata1.15: exception Emask 0x10 SAct 0x0 SErr 0x1c00000 action 0x6 frozen
ata1.15: irq_stat 0x08000000, interface fatal error
ata1.15: SError: { Handshk LinkSeq TrStaTrns }
ata1.00: exception Emask 0x100 SAct 0x6 SErr 0x0 action 0x6 frozen
ata1.00: failed command: WRITE FPDMA QUEUED
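
For reference, the SErr hex values in these logs map bit by bit onto the names the kernel prints in braces. The shell sketch below reproduces that mapping; decode_serr is a hypothetical name, and the bit names are the ones libata prints as far as I know, so treat the table as an assumption if your kernel differs.

```shell
#!/bin/sh
# Sketch: decode a SATA SError value (e.g. "SErr 0x1c00000") into the short
# names libata prints. decode_serr is a hypothetical helper; each table
# entry is "<bit number>:<name>".
decode_serr() {
    val=$(( $1 ))
    out=""
    for entry in \
        0:RecovData 1:RecovComm 8:UnrecovData 9:Persist 10:Proto 11:HostInt \
        16:PHYRdyChg 17:PHYInt 18:CommWake 19:10B8B 20:Dispar 21:BadCRC \
        22:Handshk 23:LinkSeq 24:TrStaTrns 25:UnrecFIS 26:DevExch
    do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( val & (1 << bit) )) -ne 0 ]; then
            out="${out}${out:+ }${name}"
        fi
    done
    printf '%s\n' "$out"
}
```

For example, “decode_serr 0x1c00000” prints “Handshk LinkSeq TrStaTrns”, matching the ata1.15 line above. This is mostly useful when only the hex value survives in a truncated log.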

I’m sorry it probably takes significant time to re-run testing with suggested changes, but it would be very helpful to remove as many complications as possible.