M.2 NVMe Detection issues

Dear Nvidia Team

We are seeing problems with NVMe SSDs on our custom carrier boards. The M.2 Key M interface is identical to the reference design. On some systems we get one of the following two errors:

[ 1.983689] nvme nvme0: pci function 0000:01:00.0
[ 1.983890] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 62.167901] nvme nvme0: I/O 15 QID 0 timeout, disable controller
[ 62.816003] nvme nvme0: Identify Controller failed (-4)
[ 62.816037] nvme nvme0: Removing after probe failure status: -5

[ 199.557910] nvme nvme0: pci function 0000:01:00.0
[ 327.640232] nvme nvme0: Device not ready; aborting initialisation
[ 327.640272] nvme nvme0: Removing after probe failure status: -19
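
To track which of the two failure modes occurs on a given boot, the log could be classified with a small helper. This is only a sketch: the function name and the labels are hypothetical, and the match strings are taken from the log excerpts above.

```shell
#!/bin/sh
# Sketch: label a kernel log by the NVMe probe failure it contains.
# classify_nvme_log and its labels are illustrative names, not part of any tool.
classify_nvme_log() {
    # Read the whole log from stdin.
    log=$(cat)
    # The first matching pattern wins if several signatures are present.
    case "$log" in
        *"Identify Controller failed"*)   echo "identify-timeout" ;;
        *"Device not ready; aborting"*)   echo "not-ready" ;;
        *"Removing after probe failure"*) echo "other-probe-failure" ;;
        *)                                echo "no-failure-seen" ;;
    esac
}
```

Usage, for example from a boot-time script: `dmesg | classify_nvme_log`.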

We already tried adding the kernel parameters "pcie_aspm=off" and "pci=nomsi", without success: the SSD is still sometimes not detected after a reboot or a power-off. Changing the boot device order in the bootloader did not fully help either. Sometimes the following workaround brings the NVMe back:

echo "1" > /sys/bus/pci/devices/0000:01:00.0/remove
sleep 1
echo "1" > /sys/bus/pci/rescan
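
Since the workaround only helps sometimes, it could be wrapped in a retry loop. The following is a minimal sketch, assuming the PCI address 0000:01:00.0 from the logs above. The function names, the SYSFS_ROOT override, the delays, and the retry count are illustrative assumptions. It also assumes that a namespace block device such as nvme0n1 under /sys/block appears only after a successful probe.

```shell
#!/bin/sh
# Sketch: repeat the remove/rescan workaround until an NVMe namespace shows up.
# All names and defaults below are assumptions for illustration.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
NVME_BDF="${NVME_BDF:-0000:01:00.0}"

nvme_present() {
    # Succeeds if any NVMe namespace block device (e.g. nvme0n1) exists.
    for d in "$SYSFS_ROOT"/block/nvme*n*; do
        [ -e "$d" ] && return 0
    done
    return 1
}

rescan_nvme() {
    # Remove the PCI function if it is still listed, then rescan the bus.
    dev="$SYSFS_ROOT/bus/pci/devices/$NVME_BDF"
    if [ -e "$dev/remove" ]; then
        echo 1 > "$dev/remove"
    fi
    sleep "${RESCAN_DELAY:-1}"
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}

recover_nvme() {
    # Try up to $1 times (default 3); return 0 once a namespace is present.
    tries="${1:-3}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        nvme_present && return 0
        rescan_nvme
        sleep "${SETTLE_DELAY:-5}"   # give the driver time to probe
        i=$((i + 1))
    done
    nvme_present
}
```

Run as root, for example: `recover_nvme 3 || echo "NVMe still missing" >&2`. Note that the first failure mode above takes about 60 seconds to time out, so a larger SETTLE_DELAY may be needed in practice.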

On one system, we moved the entire rootfs to the NVMe drive, and after that the drive was always detected.
So far we have not seen this behavior on the devkit, and we do not see it on every custom carrier board either. The SSD is the following model from Apacer:
https://industrial.apacer.com/upfiles/ADUpload/allshare/PV310-M280_EDM_20210108.pdf

Do you have any other ideas about what we could try? As there are similar topics in the forum, it seems to be a problem with certain NVMe SSDs. We would also like to mention that we see similar errors with the Xavier NX and a custom carrier.
Thank you for your help.

Best regards

Do you mean that some boards work normally and only some boards have the issue?
Has your HW team checked/probed the pins?

Please try to add some delay in the pcie driver when it is probing and see if it helps.

msleep(1000);

We had a patch for TX2-NX, but I am not sure whether it works in your case:

diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
index 60958d5..385ae3f1 100644
--- a/drivers/pci/host/pci-tegra.c
+++ b/drivers/pci/host/pci-tegra.c
@@ -5,7 +5,7 @@
  * Author: Mike Rapoport <mike@compulab.co.il>
  *
  * Based on NVIDIA PCIe driver
- * Copyright (c) 2008-2018, NVIDIA Corporation. All rights reserved.
+ * Copyright (c) 2008-2022, NVIDIA Corporation. All rights reserved.
  *
  * Bits taken from arch/arm/mach-dove/pcie.c
  *
@@ -635,6 +635,7 @@
 		      (PCI_FUNC(devfn) << 8) | (where & 0xff);
 		addr = (val & (SZ_4K - 1)) + addr;
 		val = val & ~(SZ_4K - 1);
+		afi_writel(pcie, SZ_4K >> 12, AFI_AXI_BAR0_SZ);
 		afi_writel(pcie, pcie->cs->start - val, AFI_AXI_BAR0_START);
 		afi_writel(pcie, (val + SZ_4K) >> 12, AFI_AXI_BAR0_SZ);
 	}
