Jetson TK1 and SATA Drive Issue

Hi,

I have multiple TK1 boards but one does the following if an external SATA SSD is connected:
When the device is powered up it starts normally, powering an external board through the USB (the lights on the board go on). During boot I cannot connect to the device through SSH (which I assume is normal).

It takes a few minutes to boot, and then I can connect via SSH. Once connected, I cannot detect the SSD (fdisk -l), and the USB is also not working.

When the SATA drive is not connected, the device boots in less than a minute, I can connect via SSH and the USB is working.

Starting the device with the SSD connected I get the following output over serial:

[    7.585661] input: gpio-keys.4 as /devices/platform/gpio-keys.4/input/input1
[    7.595103] as3722-rtc as3722-rtc.1: setting system clock to 2000-01-01 00:24:34 UTC (946686274)
[    7.607745] ALSA device list:
[    7.612384]   #0: HDA NVIDIA Tegra at 0x70038000 irq 113
[    7.619412]   #1: tegra-rt5639
[    9.748736] ata1: link is slow to respond, please be patient (ready=0)
[   14.447965] ata1: COMRESET failed (errno=-16)
[   14.758975] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   19.767024] ata1.00: qc timeout (cmd 0xec)
[   19.772865] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   25.133963] ata1: link is slow to respond, please be patient (ready=0)
[   29.782965] ata1: COMRESET failed (errno=-16)
[   35.142964] ata1: link is slow to respond, please be patient (ready=0)
[   39.791965] ata1: COMRESET failed (errno=-16)
[   45.151963] ata1: link is slow to respond, please be patient (ready=0)
[   64.764876] tegra-xhci tegra-xhci: failed to init firmware from filesystem: tegra_xusb_firmware
[   74.841965] ata1: COMRESET failed (errno=-16)
[   74.848161] ata1: limiting SATA link speed to 1.5 Gbps
[   79.902964] ata1: COMRESET failed (errno=-16)
[   79.909135] ata1: reset failed, giving up
[   79.926028] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4180000 action 0xe frozen
[   79.937025] ata1: irq_stat 0x00000040, connection status changed
[   79.944950] ata1: SError: { 10B8B Dispar DevExch }
[   79.951622] EXT4-fs (mmcblk0p1): couldn't mount as ext3 due to feature incompatibilities
[   79.961987] ata1: hard resetting link
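
For anyone reading the log above: the SErr value 0x4180000 and the "SError: { 10B8B Dispar DevExch }" line say the same thing. A minimal sketch of the decoding, assuming the standard SATA SError diagnostic bit positions (bits 16 to 26) and the short names the kernel prints; the table and the function name decode_serr are only illustrative:

```python
# Diagnostic bits (16-26) of the SATA SError register, with the short
# names that show up in the kernel's "SError: { ... }" line.
# Assumption: standard SATA bit layout; the low error bits are omitted.
SERR_DIAG_BITS = [
    (16, "PHYRdyChg"),  # PHY ready state changed
    (17, "PHYInt"),     # PHY internal error
    (18, "CommWake"),   # COMWAKE detected
    (19, "10B8B"),      # 10b-to-8b decode error
    (20, "Dispar"),     # disparity error
    (21, "BadCRC"),     # CRC error
    (22, "Handshk"),    # handshake error
    (23, "LinkSeq"),    # link sequence error
    (24, "TrStaTrns"),  # transport state transition error
    (25, "UnrecFIS"),   # unrecognized FIS type
    (26, "DevExch"),    # device presence changed
]


def decode_serr(value):
    """Return the names of the diagnostic bits set in an SErr value."""
    return [name for bit, name in SERR_DIAG_BITS if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_serr(0x4180000))  # ['10B8B', 'Dispar', 'DevExch']
```

All three bits are link-level conditions (decode error, disparity error, device presence change), which would be consistent with a signal or PHY problem on the SATA link, though on its own it does not prove one.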

I also get the following at regular intervals after boot:

[  150.104287] ata1: COMRESET failed (errno=-16)
[  160.113875] ata1: COMRESET failed (errno=-16)
[  195.164837] ata1: COMRESET failed (errno=-16)
[  210.939025] ata1: COMRESET failed (errno=-16)
[  220.990099] ata1: COMRESET failed (errno=-16)
[  241.345187] ata1: COMRESET failed (errno=-16)
[  251.394977] ata1: COMRESET failed (errno=-16)
[  291.750830] ata1: COMRESET failed (errno=-16)
[  301.800610] ata1: COMRESET failed (errno=-16)
[  302.121395] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4180000 action 0xe frozen
[  302.128985] ata1: irq_stat 0x00000040, connection status changed
[  302.135051] ata1: SError: { 10B8B Dispar DevExch }
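
To compare boards, it can help to count the COMRESET failures and look at the spacing between them rather than eyeballing the log. A rough sketch, assuming the log has been saved as text in the same format as above; the function names and the SAMPLE string are only for illustration:

```python
import re

# Matches kernel log lines such as:
# [  150.104287] ata1: COMRESET failed (errno=-16)
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+(?:\.\d+)?): (.*)$")


def comreset_failures(log_text, port="ata1"):
    """Return timestamps (seconds) of 'COMRESET failed' lines for one port."""
    stamps = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == port and m.group(3).startswith("COMRESET failed"):
            stamps.append(float(m.group(1)))
    return stamps


def gaps(stamps):
    """Return the time between consecutive failures, rounded to 0.1 s."""
    return [round(b - a, 1) for a, b in zip(stamps, stamps[1:])]


# First three lines of the post-boot log, copied from the post above.
SAMPLE = """\
[  150.104287] ata1: COMRESET failed (errno=-16)
[  160.113875] ata1: COMRESET failed (errno=-16)
[  195.164837] ata1: COMRESET failed (errno=-16)
"""

if __name__ == "__main__":
    stamps = comreset_failures(SAMPLE)
    print(len(stamps), "failures, gaps in seconds:", gaps(stamps))
```

Running the same script on the saved dmesg output of a good board and a bad board gives a quick side-by-side of how often the link reset is retried.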

When the SSD is not connected, I do not get those errors.
The SSD is not broken since it works on the other TK1 boards.
I also confirmed the USB issue with a keyboard connected.

root@tegra-ubuntu:/home/ubuntu# lsusb
unable to initialize libusb: -99

My L4T version is: R21 (release), REVISION: 1.0

Any help will be appreciated.

Can you try with the 21.3 release? I think it has some fixes for the SATA drivers.

It does sound odd though, if only one of the boards shows these symptoms.

I have installed the latest 21.3 but the problem is still there.

I have confirmed that it is not the SSD. I moved the SSD to another TK1, and that TK1 boots with no problems and I can access the drive. The original TK1 still does not boot.

Any other ideas?

I have two Jetsons where the SATA connection doesn’t work. Like you, the drives and cables work fine with other Jetson boards. One of the failed boards was working fine for weeks, then failed for no apparent reason while inside a case (no mechanical/electrical abuse possible, other than power-cycling). It’s got to be a hardware failure, at least in my case.

tonyvr

My conclusion is also a hardware failure.

That makes this our second TK1 that has broken (not the same failure, though)…

Hi guys,

There is a possibility that it is a kernel or configuration issue rather than a hardware issue. Can you both try applying the patch at “http://pastebin.com/dE68gJXF” to the latest L4T 21.3 and then build & flash the kernel, to see if it solves your problem? Also, what model SSDs are you using? NVIDIA might be able to get the same SSD to debug it further, if it looks like a kernel issue.

Cheers,
Shervin.

Can you please point me to some instructions on how this should be done?

Kingston 480GB (SV300S37A/480G)

Regards

If you go to that URL in a browser, you’ll see a section “RAW Paste Data”. You can mouse copy and paste that (be careful to get the whole thing, even the parts that require scrolling), and use it to create a file such as “testing.diff”.

Once unpacked, the R21.3 package contains the kernel source in the file kernel_src.tbz2. When that file is unpacked, the source ends up in a subdirectory “kernel”, so be careful not to unpack it where a directory of that name already exists. cd into that directory. Assuming testing.diff is in the directory just above the unpacked “kernel” directory:

patch -p 1 -c < ../testing.diff

The output should read:

patching file drivers/ata/ahci-tegra.c

From there it’s just the usual kernel compile and install.

Hello,
Did this patch fix the problem?

I had an older version with kernel 3.10.24[letters], and the SSD seemed to work fine through several reboots. I updated to r21.3 by flashing it through the micro USB port.

Since then, I’ve had inconsistent ata connections to any SSD connected to the SATA connector on this one board. Other boards with r21.3 work just fine.

What I specifically mean is on boot up, the OP’s error occurs. Sometimes it doesn’t happen and the drive shows up at /dev/sda.

I’ve moved the cables and ssds to another Tegra on r21.3 and they all work just fine.

On this troubled board, 5V is coming through the power cables.
We’re using the same Samsung SSDs on other boards and they have been working fine for months on older versions of the kernel, including a week on r21.3 on some other boards.

This gets repeated on the serial console/dmesg over and over:

[  714.294691] ata1: softreset failed (1st FIS failed)
[  724.341707] ata1: softreset failed (1st FIS failed)
[  759.388461] ata1: softreset failed (1st FIS failed)
[  764.604962] ata1: softreset failed (device not ready)
[  764.643803] ata1: reset failed, giving up
[  765.153687] ata1: exception Emask 0x10 SAct 0x0 SErr 0x4180000 action 0xe frozen
[  765.226641] ata1: irq_stat 0x00000040, connection status changed
[  765.266710] ata1: SError: { 10B8B Dispar DevExch }

edit:
I think the board is fubared. I tried another board in the same box setup (making embedded systems) and this board does not have any issues with the ATA when I flash r21.3 over usb, even after 6 reboots.

Hi TrapAtLri, you should be able to do an RMA with the instructions at this thread if you haven’t yet: https://devtalk.nvidia.com/default/topic/821235/defective-jetson/?offset=1

I have still not applied the patch to the kernel but still have the same problem with the latest L4T (21.4).

We bought 3 new TK1 boards, which brings us to 7 TK1 boards in total:
1x Always gives the error as above when connecting the SATA drive (old)
2x Intermittently give the error as above. When this happens, I power down, disconnect the SATA cable and power cables, and reconnect them. This has about an 80% success rate (1 old, 1 new)
3x Never shows this issue (not yet) (2 old, 1 new)
1x Unsure, initial test shows that the issue does not occur (new)

On the 2 with the intermittent error, if I just power cycle, the observed success rate is 0%. I cannot confirm that reseating the cables is what resolves the issue, but it helps.

The setup for all 7 is exactly the same: I do an nvflash --rawdeviceread on the one I consider the master and then an nvflash --rawdevicewrite on the other 6. Because of this, I am hesitant to think that it is a kernel issue.

Any suggestions are welcome.

Regarding the RMA, say it is the result of defective hardware, what is the time after purchase that one can attempt an RMA?

It might be useful to see if power load is an issue. If you were to power the drive from an alternate source, then you could test if failure remains the same. You could even use the power from an alternate Jetson as a test…plug power to SATA in one Jetson which isn’t being used, and data cable into the Jetson being tested.

The reason I wonder about this is because of noise observed in audio when powering a drive through the molex connector.

I am seeing exactly the same issue with both spinning HDD and a SSD.

I have tried connecting power through a separate power supply, but it doesn’t fix the problem.

I cannot find the patch on pastebin anymore. Did anyone have any luck applying it?

I am using the most recent r21.4.

I have not heard of this issue with a regular hard drive…only with SSD.

Hi TheBadger101,
Please check whether the SSD has the latest firmware from the manufacturer.
SSDs running obsolete firmware are known to misbehave with errors like the ones in your post.
We used Transcend, Kingston, Micron, Sandisk SSDs of various capacities with updated firmware and they work absolutely well.

But when used with old firmware, in the past, they have shown similar issues mentioned in your post.
Hence I would request you to check and update your SSD firmware from manufacturer.
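
As a starting point for that check, here is a small sketch that reads the model and firmware revision strings as the Linux kernel reports them. It assumes the drive is plugged into a board where it enumerates (for example as sda) and that the usual sysfs attributes /sys/block/<device>/device/model and .../rev are present; the rev string may be a shortened form of the full firmware version, and the function name disk_identity is only illustrative:

```python
from pathlib import Path


def disk_identity(device="sda", sysfs_root="/sys/block"):
    """Read the model and firmware revision strings exposed via sysfs.

    Returns None if the device directory is absent, as it would be on a
    board where the drive never enumerates.
    """
    dev_dir = Path(sysfs_root) / device / "device"
    if not dev_dir.is_dir():
        return None

    def read(name):
        # Each attribute is a one-line text file; a missing one yields None.
        path = dev_dir / name
        return path.read_text().strip() if path.is_file() else None

    return {"model": read("model"), "rev": read("rev")}


if __name__ == "__main__":
    print(disk_identity("sda"))
```

The reported revision can then be compared by hand against the latest firmware version listed by the SSD manufacturer.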

Hi
We have verified Micron, Kingston, Transcend, and Sandisk SSDs, and none of them has shown errors to date when used with Jetson-TK1.

Please update the SSD to the latest firmware from the manufacturer.

In the past, we hit similar issues when the SSDs were running obsolete firmware.

Cheers