Hi, we have a Micron 1TB NVMe (Vendor ID 1344, Device ID 6001) connected to the PCIe C5 controller on a Xavier NX via a Microsemi PFX 48X switch.
According to the boot log, the boot gets stuck while enumerating the 1TB NVMe behind the switch. The NVMe works with JetPack 4.6 (Tegra 4.9.253) but not with JetPack 5.x. The same PCIe switch firmware and configuration were used with every BSP version we loaded.
It works on the Xavier NX dev kit without the PCIe switch, when the NVMe is connected to the M.2 connector on the PCIe C5 controller. I have attached three logs in case you find them useful.
See the illustrations below.
These setups work:
JetPack 4.6
Xavier NX → PCIe switch → Micron SSD 1TB NVMe [Works]
JetPack 5.0.2 (and later)
Xavier NX → Micron SSD 1TB NVMe [Works]
This setup does not work, but it is the one we need:
JetPack 5.0.2 (and later)
Xavier NX → PCIe switch → Micron SSD 1TB NVMe [Does not work]
We posted on the NVIDIA forum, and the thread has been active for a few months, but we have not yet reached a resolution:
Boot stuck at pcie enumeration with NVMe - Jetson & Embedded Systems / Jetson Xavier NX - NVIDIA Developer Forums
It does not appear to be specific to the Micron NVMe: we have tried different NVMe drives, and 80% of them do not work on 5.x when behind a Microchip PCIe switch.
Many thanks in advance for your attention.
dmesg_xavier_JP5.0.2.txt (58.1 KB)
pcie_boot_log.txt (5.2 KB)
Ispci -vv [JP 4.6].txt (7.3 KB)
Hi WayneWWW,
The device boots without the NVMe connected to the switch. If I connect the NVMe after the device has booted and run the command you mentioned, the device shows the logs below and reboots after some time.
[ 93.164627] pci_bus 0004:00: scanning bus
[ 93.173331] pcieport 0004:00:00.0: scanning [bus 01-ff] behind bridge, pass 0
[ 93.173354] pci_bus 0004:01: scanning bus
[ 93.182762] pci_bus 0004:01: bus scan returning with max=01
[ 93.182787] pcieport 0004:00:00.0: scanning [bus 01-ff] behind bridge, pass 1
[ 93.182805] pci_bus 0004:00: bus scan returning with max=ff
[ 93.182829] pci_bus 0005:00: scanning bus
[ 93.190819] pcieport 0005:00:00.0: scanning [bus 01-ff] behind bridge, pass 0
[ 93.190839] pci_bus 0005:01: scanning bus
[ 93.199495] pcieport 0005:01:00.0: scanning [bus 02-05] behind bridge, pass 0
[ 93.199554] pci_bus 0005:02: scanning bus
[ 93.208258] pcieport 0005:02:00.0: scanning [bus 03-03] behind bridge, pass 0
[ 93.208315] pci_bus 0005:03: scanning bus
[ 93.208901] pci 0005:03:00.0: [1344:6001] type 00 class 0x010802
[ 93.209270] pci 0005:03:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[ 93.21…] pcieport 0005:00:00.0: AER: Uncorrected (Non-Fatal) error received: 0005:00:00.0
[ 93.229818] pcieport 0005:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 93.292106] pcieport 0005:00:00.0: device [10de:1ad0] error status/mask=00004000/00400000
[ 93.366930] pcieport 0005:00:00.0: [14] CmpltTO (First)
[ 93.441831] pci 0005:01:00.1: AER: can't recover (no error_detected callback)
[ 93.441876] pcieport 0005:00:00.0: AER: device recovery failed
[ 93.441887] pcieport 0005:00:00.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0005:00:00.0
[ 93.653128] pcieport 0005:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 93.709160] pcieport 0005:00:00.0: device [10de:1ad0] error status/mask=00004000/00400000
[ 93.…] pcieport 0005:00:00.0: [14] CmpltTO (First)
Hi @WayneWWW ,
The lspci dumps are below:
lspci_JP4_6.txt (38.7 KB)
lspci_JP5_1_1.txt (34.1 KB)
Hi,
We noticed that ACS is enabled on JetPack 5 but not on JetPack 4.
Please modify this patch to match your switch's ID so that the switch is added to the quirk list.
The example below is for a different switch, not yours.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 26ed12386871..f097b7c9458b 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5153,6 +5153,11 @@ static int pci_quirk_disable_intel_spt_pch_acs_redir(struct pci_dev *dev)
 	return 0;
 }
 
+static int pci_quirk_fake_pericom_acs(struct pci_dev *dev)
+{
+	return 0;
+}
+
 static const struct pci_dev_acs_ops {
 	u16 vendor;
 	u16 device;
@@ -5166,6 +5171,9 @@ static const struct pci_dev_acs_ops {
 	    .enable_acs = pci_quirk_enable_intel_spt_pch_acs,
 	    .disable_acs_redir = pci_quirk_disable_intel_spt_pch_acs_redir,
 	},
+	{ PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_SWITCH_PORT,
+	    .enable_acs = pci_quirk_fake_pericom_acs,
+	},
 };
 
 int pci_dev_specific_enable_acs(struct pci_dev *dev)
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4ff680e22cc3..8f40c41c1ae7 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -1842,6 +1842,7 @@
 #define PCI_DEVICE_ID_PERICOM_PI7C9X7952	0x7952
 #define PCI_DEVICE_ID_PERICOM_PI7C9X7954	0x7954
 #define PCI_DEVICE_ID_PERICOM_PI7C9X7958	0x7958
+#define PCI_DEVICE_ID_PERICOM_SWITCH_PORT	0x2608
 
 #define PCI_SUBVENDOR_ID_CHASE_PCIFAST		0x12E0
 #define PCI_SUBDEVICE_ID_CHASE_PCIFAST4		0x0031
Hi @WayneWWW, I have tried the above changes, but the boot still gets stuck at the same spot.
Hi,
Please be aware that you cannot just copy and paste my patch. You have to modify the vendor and device IDs to match your switch.
My patch is for a Pericom switch, not yours.
Hi @WayneWWW, I have added the patch below.
static int pci_quirk_fake_microsemi_pfx_acs(struct pci_dev *dev)
{
	return 0;
}

static const struct pci_dev_acs_ops {
	u16 vendor;
	u16 device;
	int (*enable_acs)(struct pci_dev *dev);
	int (*disable_acs_redir)(struct pci_dev *dev);
} pci_dev_acs_ops[] = {
	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
	    .enable_acs = pci_quirk_enable_intel_pch_acs,
	},
	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
	    .enable_acs = pci_quirk_enable_intel_spt_pch_acs,
	    .disable_acs_redir = pci_quirk_disable_intel_spt_pch_acs_redir,
	},
	{ PCI_VENDOR_ID_MICROSEMI, PCI_DEVICE_ID_MICROSEMI_SWITCH_PORT,
	    .enable_acs = pci_quirk_fake_microsemi_pfx_acs,
	},
};

#define PCI_VENDOR_ID_MICROSEMI			0x11f8
#define PCI_DEVICE_ID_MICROSEMI_SWITCH_PORT	0x8533
Please let me know if you think something is incorrect.
Hi,
Please add a print to your pci_quirk_fake_microsemi_pfx_acs and check whether it gets printed.
Hi @WayneWWW, the print does not appear in the logs.
Hi,
I am not sure where to start here.
Are you sure the kernel you booted was actually rebuilt with your patch? I mean, are you sure you know how to update the kernel driver?
Also, your lspci output says your device is PMC-Sierra Inc. Device 8573, but your patch uses 8533.
Hi @WayneWWW ,
I have updated the patch with 8573 and the print now appears, but the issue persists. The PCIe-related log is below; it includes the “pci_quirk_fake_microsemi_pfx_acs” prints.
[ 4.916141] tegra194-pcie 141a0000.pcie: Link up
[ 4.925990] tegra194-pcie 141a0000.pcie: PCI host bridge to bus 0005:00
[ 4.926169] pci_bus 0005:00: root bus resource [bus 00-ff]
[ 4.926305] pci_bus 0005:00: root bus resource [io 0x100000-0x1fffff] (bus address [0x3a100000-0x3a1fffff])
[ 4.926518] pci_bus 0005:00: root bus resource [mem 0x1c00000000-0x1f3fffffff pref]
[ 4.926681] pci_bus 0005:00: root bus resource [mem 0x1f40000000-0x1fffffffff] (bus address [0x40000000-0xffffffff])
[ 4.926960] pci 0005:00:00.0: [10de:1ad0] type 01 class 0x060400
[ 4.927246] pci 0005:00:00.0: PME# supported from D0 D3hot D3cold
[ 4.933931] pci 0005:01:00.0: [11f8:8573] type 01 class 0x060400
[ 4.935133] pci 0005:01:00.0: enabling Extended Tags
[ 4.937392] pci 0005:01:00.0: PME# supported from D0 D3hot D3cold
[ 4.938242] pci 0005:01:00.0: pci_quirk_fake_microsemi_pfx_acs
[ 4.939448] pci 0005:01:00.1: [11f8:8573] type 00 class 0x068000
[ 4.940067] pci 0005:01:00.1: reg 0x10: [mem 0x00000000-0x003fffff]
[ 4.940960] pci 0005:01:00.1: reg 0x18: [mem 0x00000000-0x0fffffff 64bit pref]
[ 4.941495] pci 0005:01:00.1: reg 0x20: [mem 0x00000000-0x01ffffff]
[ 4.941978] pci 0005:01:00.1: reg 0x24: [mem 0x00000000-0x007fffff]
[ 4.942364] pci 0005:01:00.1: enabling Extended Tags
[ 4.948538] pci 0005:01:00.1: PME# supported from D0 D3hot D3cold
[ 4.953956] pci 0005:01:00.1: pci_quirk_fake_microsemi_pfx_acs
[ 4.965701] pci 0005:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 4.968902] pci 0005:02:00.0: [11f8:8573] type 01 class 0x060400
[ 4.974227] pci 0005:02:00.0: enabling Extended Tags
[ 4.980081] pci 0005:02:00.0: PME# supported from D0 D3hot D3cold
[ 4.985361] pci 0005:02:00.0: pci_quirk_fake_microsemi_pfx_acs
[ 4.991524] pci 0005:02:01.0: [11f8:8573] type 01 class 0x060400
[ 4.997610] pci 0005:02:01.0: enabling Extended Tags
[ 5.003597] pci 0005:02:01.0: PME# supported from D0 D3hot D3cold
[ 5.008628] pci 0005:02:01.0: pci_quirk_fake_microsemi_pfx_acs
[ 5.015062] pci 0005:02:02.0: [11f8:8573] type 01 class 0x060400
[ 5.020800] pci 0005:02:02.0: enabling Extended Tags
[ 5.026742] pci 0005:02:02.0: PME# supported from D0 D3hot D3cold
[ 5.031367] pci 0005:02:02.0: pci_quirk_fake_microsemi_pfx_acs
[ 5.043205] pci 0005:02:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5.044956] pci 0005:02:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5.053169] pci 0005:02:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5.062300] pci 0005:03:00.0: [1344:6001] type 00 class 0x010802
[ 5.067109] pci 0005:03:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
The boot gets stuck at this point.
Hi,
I am not sure whether I have asked you this before.
Does this issue happen with other kinds of NVMe?
Does it happen with a Micron NVMe of a different size, for example 256 GB?
Hi @WayneWWW ,
Yes, I have tried several other NVMe brands, such as Kioxia, Apacer and Kingston. Of these, only Apacer works. Do you need the lspci -vv output with the Apacer drive after a successful boot?
We have not tried a smaller Micron drive.
The patch is below:
static int pci_quirk_fake_microsemi_pfx_acs(struct pci_dev *dev)
{
	dev->dev_flags |= PCI_DEV_FLAGS_ACS_ENABLED_QUIRK;
	pci_info(dev, "pci_quirk_fake_microsemi_pfx_acs");
	return 0;
}

static const struct pci_dev_acs_ops {
	u16 vendor;
	u16 device;
	int (*enable_acs)(struct pci_dev *dev);
	int (*disable_acs_redir)(struct pci_dev *dev);
} pci_dev_acs_ops[] = {
	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
	    .enable_acs = pci_quirk_enable_intel_pch_acs,
	},
	{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
	    .enable_acs = pci_quirk_enable_intel_spt_pch_acs,
	    .disable_acs_redir = pci_quirk_disable_intel_spt_pch_acs_redir,
	},
	{ PCI_VENDOR_ID_MICROSEMI, PCI_DEVICE_ID_MICROSEMI_SWITCH_PORT,
	    .enable_acs = pci_quirk_fake_microsemi_pfx_acs,
	},
};
pci_ids.h
#define PCI_VENDOR_ID_PMC_Sierra 0x11f8
#define PCI_VENDOR_ID_MICROSEMI 0x11f8
#define PCI_DEVICE_ID_MICROSEMI_SWITCH_PORT 0x8573
I have talked with the vendor, and the firmware on the NVMe is the latest version.
Yes, please also share the Apacer lspci -vvv result.
Hi,
I just noticed that you added this to your patch:
dev->dev_flags |= PCI_DEV_FLAGS_ACS_ENABLED_QUIRK;
Could you remove it, and the print as well, and run again? We already know the function is executed, so the print is no longer needed.