Tegra not recognizing Hard Drive

OrrinJelo · August 10, 2016, 5:47pm

We are using the NVIDIA Jetson TK1 Tegra board. It is running Ubuntu 14.04.1 LTS with a 3.10.40-gdacac96 armv7l kernel. We are using CUDA 6.5 as provided with the drivers in the Tegra R21.4.0 package. We are using a Samsung 850 EVO SSD connected via SATA.

A number of our systems that use this Tegra board intermittently fail to detect the attached SSD, while other systems have yet to show the problem. The behavior has been unpredictable and erratic.

We have found other posts here from people that seem to have the same issue as us.

https://devtalk.nvidia.com/default/topic/830349/jetson-tk1-and-sata-drive-issue/

What we have tried from this post:

Installed the latest release, R21.5: still observed the issue.
Swapped the drive for another brand: still observed the issue.
Updated to a later Linux kernel: still observed the issue.
Checked for newer SSD firmware: it is already at the latest version.

Out of 10 Tegra boards in our possession, 3 exhibit this behavior. That seems like a rather high failure rate.

A clip from dmesg:

[    9.310555] ata1.00: qc timeout (cmd 0xec)
[    9.316488] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   19.323506] ata1: softreset failed (1st FIS failed)
[   24.841505] ata1: link is slow to respond, please be patient (ready=0)
[   29.340503] ata1: softreset failed (device not ready)
[   29.809541] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   29.820687] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x100)
[   29.831734] ata1: limiting SATA link speed to 1.5 Gbps
[   44.819500] ata1: softreset failed (1st FIS failed)
[   45.288515] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[   55.299529] ata1.00: qc timeout (cmd 0xec)
[   55.308555] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   55.779535] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
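
As a reading aid for the dmesg clip above, here is a minimal Python sketch that decodes the SStatus/SControl values and the err_mask bits. It assumes the values are printed in hex (as libata does), uses the standard SATA SCR bit layout (DET, SPD, IPM), and takes the err_mask names from the AC_ERR_* flags in include/linux/libata.h. The function names are made up for illustration and are not part of any tool.

```python
import re

# err_mask bits, following the AC_ERR_* flags in include/linux/libata.h.
AC_ERR = {
    0x001: "dev", 0x002: "hsm", 0x004: "timeout", 0x008: "media",
    0x010: "ata_bus", 0x020: "host_bus", 0x040: "system",
    0x080: "invalid", 0x100: "other", 0x200: "nodev_hint", 0x400: "ncq",
}

# Matches the hex register values in a "SATA link up" line.
LINK_RE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def decode_scr(value):
    """Split an SStatus/SControl value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # bits 3:0, device detection
        "spd": (value >> 4) & 0xF,  # bits 7:4, link speed (1 = Gen1, 2 = Gen2)
        "ipm": (value >> 8) & 0xF,  # bits 11:8, interface power management
    }


def decode_err_mask(mask):
    """Return the names of the AC_ERR_* bits set in an err_mask value."""
    return [name for bit, name in AC_ERR.items() if mask & bit]


def decode_link_line(line):
    """Decode the registers in one dmesg line, or return None if absent."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    return decode_scr(int(match.group(1), 16)), decode_scr(int(match.group(2), 16))
```

Applied to the clip, SStatus 123 decodes to DET=3, SPD=2, IPM=1, meaning a device is detected and the link is up at 3.0 Gbps, and err_mask=0x4 is a timeout. SControl 310 has SPD=1, which matches the "limiting SATA link speed to 1.5 Gbps" line. So the link appears to come up while the IDENTIFY command (0xec) gets no answer in time.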

linuxdev · August 10, 2016, 6:36pm

Would it be possible to swap a drive and the cable used with that drive from a unit exhibiting the behavior with a drive and cable from a unit which does not exhibit that behavior, and see if the problem follows?

I don’t know if this actually applies to SATA, but you might try an experiment: edit kernel file “arch/arm/mach-tegra/tegra12_clock.h”. Change line 29 from “#define USE_PLLE_SS 1” to:

#ifdef USE_PLLE_SS
#undef USE_PLLE_SS
#endif /* USE_PLLE_SS */

OrrinJelo · August 10, 2016, 7:07pm

I have not done this the way you are describing, but I have connected a completely new SSD (from different manufacturer and ones from the same) and a completely new cable to the same unit, and still the problem persists. It may be a challenge to get the same setup as you describe, but it’s something I’ll try.

linuxdev:

I don’t know if this actually applies to SATA, but you might try an experiment: edit kernel file “arch/arm/mach-tegra/tegra12_clock.h”. Change line 29 from “#define USE_PLLE_SS 1” to:
#ifdef USE_PLLE_SS
#undef USE_PLLE_SS
#endif /* USE_PLLE_SS */

I am unable to find the tegra12_clock.h in the path “/usr/src/linux-headers-3.10.40-gdacac96/arch/arm/mach-tegra/” for R21.4 and kernel 3.10.40 as you suggest. I will work on getting everything to the latest version so that we can be on the same page.

Edit: The filename is clock.h, I think.

Edit 2: I’m told the thread mentioned above suggested that R21.4 solves the issue, but it did not for us. As far as we have tested, R21.5 does not solve it either.

linuxdev · August 10, 2016, 9:58pm

You will need the full kernel source, but yes, the correct file is clock.h. Apparently my brain had a bit of data corruption. The actual user of the define is in tegra12_clocks.c, and I had both files open while I was looking at it.

There were a number of SSD issues fixed, but there may be more issues beyond what was already fixed. I just have a suspicion that in cases where a drive or PCIe device sometimes works but sometimes does not…and yet works on other machines…it might be a signal quality issue, where spread spectrum pushes a link from working to marginal and only sometimes working. On a desktop PC this would be enabled or disabled in BIOS, and is normally not enabled. Desktop PC overclockers would never enable it because it limits top end performance. The reason for enabling spread spectrum on a computer would be to avoid generating as much noise (e.g., to avoid noise on audio equipment).

OrrinJelo · August 12, 2016, 8:11pm

Just an update: we are working today to try the kernel modification you suggested.

We’ve gone through both hardware and software checks (essentially, the hardware guys are blaming the software, and the software guys are blaming the hardware). It looks like a power failure to the software guys, but hardware checks show nothing out of the ordinary.

A caveat I forgot to mention: these are modified boards. The audio has been removed, wires replace the ethernet adapter and switch/status pinout, etc. We have done this before; however, we have not done it with solid state drives attached.

As for the kernel modification and building, we are following this tutorial from two years ago with some changes, of course: https://devtalk.nvidia.com/default/topic/762653/-howto-build-own-kernel-for-jetson-tk1/

Once we feel like we have the process down, I’ll post any changes that are different from that process (the kernel version and some menuconfig options, for example).

linuxdev · August 12, 2016, 8:17pm

This is just a general comment, but changing any of the traces could have an effect on signal quality even if the changes are basically correct. In that case the underlying issue would basically be the same: spread spectrum has a harder time when signal quality is already marginal. It would be interesting to see a very high quality view of the signals from a high end oscilloscope both with and without spread spectrum, and then with and without spread spectrum on a board with the modifications you mention. If this is the case, then both your hardware guys and software guys are correct at the same time (you must have quantum engineers! :P)

OrrinJelo · August 12, 2016, 9:48pm

After making the change, we also made the further changes:

tegra12_clocks.c: (4221) #if => #ifdef

So I’ve finally got the changes into the kernel, and I’m still getting the SATA failure. No luck there.

Edit: We’ve also tested a combination of different drives, cables, power sources and Tegras: drive with bad Tegra power and SATA cable to good Tegra (works), drive with good Tegra power and SATA cable to bad Tegra (fails), drive with good Tegra power and SATA cable to good Tegra (works). Replacing the cables seems to do no good.

Something that I should add is it does seem to fail less often with the kernel code change, but it might just be my mind. Yeah, it’s possible we’re all going out of our minds here.
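
The swap tests in the edit above can be written down as a small table to check which factor lines up with the failures. This is only a sketch of that reasoning in Python: the tuple layout and names are invented for illustration, and it assumes "bad Tegra power and SATA cable" means the power and cabling taken from a failing unit.

```python
# Each observation: (unit the power + SATA cabling came from,
#                    unit the drive was attached to,
#                    whether the drive was detected).
# "good"/"bad" label a Tegra that does not / does show the problem.
OBSERVATIONS = [
    ("bad", "good", True),    # bad unit's power and cable on a good Tegra: works
    ("good", "bad", False),   # good unit's power and cable on a bad Tegra: fails
    ("good", "good", True),   # all-good control case: works
]

# Column index of each factor inside an observation tuple.
FACTORS = {"power_and_cable": 0, "board": 1}


def explains_failures(observations, column):
    """True if 'bad' in this column coincides with failure in every test."""
    return all((obs[column] == "bad") == (not obs[2]) for obs in observations)


def suspects(observations):
    """Names of the factors that are consistent with every observation."""
    return [name for name, col in FACTORS.items()
            if explains_failures(observations, col)]


print(suspects(OBSERVATIONS))
```

With the three reported combinations, only the board column is consistent with every result, which matches the impression that swapping cables and power does not help.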

BlueCito · August 18, 2016, 12:28am

I am working with OrrinJelo on this issue, and we still do not have a resolution.

Power supply investigation results: we used an oscilloscope to compare Jetson boards that have no SATA SSD recognition issues against ones that do, and found no detectable differences. Rise times on supplies, noise/ripple on supplies, voltage levels, sequencing, etc. are indistinguishable.

We have followed this thread and recompiled the kernel as suggested, and saw no improvement.

30% of the Tegra boards exhibit this problem.

Any suggestions?

OrrinJelo · August 18, 2016, 12:31am

SSD failure is still intermittent on the same Tegra boards.

On the Tegras on which we installed the kernel fix (clock.h changes), we might be seeing a kernel panic once in a while. I’m thinking of rolling back to the old kernel tomorrow.

So it looks like we’re back to square one, knowing that it’s not a cable issue, power issue, or drive issue. Any other thoughts on where to go from here?

linuxdev · August 18, 2016, 12:43am

It’s interesting because the issue seems to track with specific Jetsons, rather than tracking with the specific drive (and not all Jetsons have the issue, although a high percentage do). On the Jetsons which fail, do regular SATA drives have any issue, or is it just SSD drives?

Preetham260 · August 19, 2016, 6:22am

Hi OrrinJelo,

Can you try the patch below and tell us if it fixes the issue?

diff --git a/drivers/ata/ahci-tegra.c b/drivers/ata/ahci-tegra.c
index a80a8b1..4725798 100644
--- a/drivers/ata/ahci-tegra.c
+++ b/drivers/ata/ahci-tegra.c
@@ -1081,12 +1081,10 @@ static int tegra_ahci_controller_init(struct tegra_ahci_host_priv *tegra_hpriv,
 	val &= ~NVA2SATA_OOB_ON_POR_MASK;
 	misc_writel(val, SATA_AUX_MISC_CNTL_1_REG);
 
-	if (tegra_hpriv->sata_connector != MINI_SATA) {
-		/* Disable DEVSLP Feature */
-		val = misc_readl(SATA_AUX_MISC_CNTL_1_REG);
-		val &= ~SDS_SUPPORT;
-		misc_writel(val, SATA_AUX_MISC_CNTL_1_REG);
-	}
+	/* Disable DEVSLP Feature */
+	val = misc_readl(SATA_AUX_MISC_CNTL_1_REG);
+	val &= ~SDS_SUPPORT;
+	misc_writel(val, SATA_AUX_MISC_CNTL_1_REG);
 
 	val = sata_readl(SATA_CONFIGURATION_0_OFFSET);
 	val |= EN_FPCI;

Edit: Please try the above change on R21.5

Thanks & Regards
Preetham

OrrinJelo · August 20, 2016, 12:51am

So I think we have finally concluded that neither the power to the drive, the SSD itself, the SATA cable, nor our modifications are causing this issue. We tried a mechanical hard drive and it failed on the same units in question.

We ordered several more TK1s and found one (unmodified) that acted similarly. We did the same tests with the SSD and mechanical SATA drives, swapping out cables, power cables, etc.

I am currently working on getting the above patch installed. We’ll see if that does it.

OrrinJelo · August 26, 2016, 12:38am

I thought I had updated this thread with the results of the patch; turns out I hadn’t.

The SSD issues persist with that kernel patch. No go there as well.

We do have some Tegras fresh out of the box, at least one of which seems to exhibit the same SSD/SATA issue of not recognizing the drive now and then. Nothing has been observed on the other units.

Preetham260 · August 26, 2016, 1:51am

So the 3 boards that were failing earlier are no longer failing with the new patch? Is it only failing on the new units that you received? Or is it that the 3 earlier boards no longer fail even without the above patch?

On the failing unit, when it boots successfully, can you please attach the SATA register dumps? You can obtain them as below:

cat /sys/kernel/debug/tegra_ahci

Also, please attach the complete UART log for the failure case with and without the above patch.

Is it possible to collect and share a LeCroy trace, which would help us analyze exactly why it is failing?

thanks.
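
For collecting the logs requested above across many boots, a rough Python sketch like the following could tally the ATA error markers per captured log. It only looks for the message texts that appear in the dmesg clip earlier in this thread; the function names and the idea of keeping one log text per boot are assumptions for illustration, not an NVIDIA tool.

```python
import re

# Message patterns taken from the dmesg clip in the first post.
MARKERS = {
    "qc_timeout": re.compile(r"ata\d+\.\d+: qc timeout"),
    "identify_failed": re.compile(r"ata\d+\.\d+: failed to IDENTIFY"),
    "softreset_failed": re.compile(r"ata\d+: softreset failed"),
}


def count_markers(log_text):
    """Count how many lines of one boot log match each error marker."""
    counts = {name: 0 for name in MARKERS}
    for line in log_text.splitlines():
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def boots_with_identify_failure(logs):
    """Given {label: log text}, return labels with at least one failed IDENTIFY."""
    return [label for label, text in logs.items()
            if count_markers(text)["identify_failed"] > 0]
```

Note that this counts markers only. A boot can log a failed IDENTIFY and still recover on a later retry, so the counts are a starting point for comparing boards rather than a verdict on whether the drive was finally detected.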

OrrinJelo · August 26, 2016, 2:29am

The 3 boards that were failing earlier are still failing with the patch. Of the new units, we have detected one that has the same issue.

I’ll try to get those things done and posted tomorrow.

mywu1987 · September 22, 2019, 4:45pm

Hi OrrinJelo,
Was this problem solved later? I ask because I also encountered it recently.

OrrinJelo · September 23, 2019, 3:25pm

As far as I know, we were not able to solve the problem for the units that failed. We kept tabs on the units that succeeded and failed, and used only those that succeeded.

Edit: FYI, we have since moved on to using the TK1 and TX2i modules, incorporated onto boards designed in-house. These work fine. The issue above was only seen with the TK1 dev boards, it seems.

mywu1987 · September 24, 2019, 1:25am

Hi OrrinJelo,
Thanks for your reply!
Have you used the TK1 chipset on your own boards?
We have also designed the TK1 chipset into our product, but unfortunately about 5% of the units in mass production intermittently cannot detect the HDD.

OrrinJelo · September 24, 2019, 5:28pm

Unfortunately I can’t verify your issue. We have since moved on to use eMMC instead of HDD in our products. I am not directly involved in the manufacturing process or board bring up, so I’m not sure what the failure rate is on that end.
