Hi guys, I've run into a problem: we cannot enumerate a 10G network card on 14160000.pcie. We use a custom board. The failure rate is low (about 1 in 50) on some carrier boards but can be much higher on others (about 3 in 5). Here are the full logs, covering four cases: cold/hot boot combined with success/failure of PCIe enumeration:
I tried to extend the PCIe link wait time with the following patch:
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index 5563979310ba..fe8d790db305 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -557,6 +557,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci)
 			return 0;
 		}
 		usleep_range(LINK_WAIT_USLEEP_MIN, LINK_WAIT_USLEEP_MAX);
+		dev_info(pci->dev, "ben: wait loop (%d/%d)\n", retries+1, LINK_WAIT_MAX_RETRIES);
 	}
 	dev_info(pci->dev, "Phy link never came up\n");
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 76c57d4fa714..fd4c78dab204 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -25,7 +25,7 @@
 #define DW_PCIE_VER_562A 0x3536322A
 /* Parameters for the waiting for link up routine */
-#define LINK_WAIT_MAX_RETRIES 10
+#define LINK_WAIT_MAX_RETRIES 100
 #define LINK_WAIT_USLEEP_MIN 90000
 #define LINK_WAIT_USLEEP_MAX 100000
Sometimes enumeration succeeds on the first try, as before. Other times it takes about 30 tries or about 60 tries, and sometimes it still fails even after 100 tries. Do you have any advice or suspicions?
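To put the retry counts in terms of time, here is a small sketch that multiplies the retry count by the per-retry sleep from the patch above. The helper names link_wait_total_ms and print_link_wait_window are made up for illustration and are not part of the kernel; the calculation also ignores the time spent in the link-up check itself.

```c
#include <assert.h>
#include <stdio.h>

/* Sleep bounds per retry, copied from pcie-designware.h in the patch above. */
#define LINK_WAIT_USLEEP_MIN 90000   /* microseconds */
#define LINK_WAIT_USLEEP_MAX 100000  /* microseconds */

/* Hypothetical helper: total sleep time in milliseconds for a retry count. */
long link_wait_total_ms(int retries, long usleep_per_retry)
{
    assert(retries >= 0 && usleep_per_retry >= 0);
    return (long)retries * usleep_per_retry / 1000;
}

/* Hypothetical helper: print the shortest and longest polling window. */
void print_link_wait_window(int retries)
{
    printf("%d retries: %ld ms to %ld ms\n", retries,
           link_wait_total_ms(retries, LINK_WAIT_USLEEP_MIN),
           link_wait_total_ms(retries, LINK_WAIT_USLEEP_MAX));
}
```

Calling print_link_wait_window(10) reports 900 ms to 1000 ms for the stock setting, and print_link_wait_window(100) reports 9000 ms to 10000 ms for the patched one. By the same arithmetic, a link that comes up after about 30 tries needed roughly 3 seconds, and one that needs about 60 tries needed roughly 6 seconds.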
Sorry for the confusion. Our original goal was to reload the driver to see whether that helps. Because the external disk is also on the PCIe bus, we have to use a USB stick to store the rootfs.
The first one: only the 10G network card causes the problem.
The reason I mentioned the external disk is that if I unload pcie_tegra194, the external disk goes offline. The whole system is stored on it, so I cannot reload the PCIe driver without the help of a USB disk. By "external disk" I mean the NVMe disk on the PCIe bus.
Sorry again for the confusion. Please ignore the USB stick and the disk; the point is that reloading the PCIe driver sometimes lets the 10G network card enumerate.
Besides, I compared the kernel output of two boots. Before the point where the PCIe link never comes up, there is the following difference (left is the OK boot, right is the NG boot):
Hi WayneWWW. After rechecking the release notes of the AQC113C, we may have found the cause of the problem. According to the release notes, we need to switch to a new link training method:
original method:
No Link
Train to Gen 1
Immediately speed change and train to Gen 3 (usually fails at this location)
Immediately speed change and train to Gen 4
new method:
No Link
Set host Train to Gen 1
Confirm Gen 1 linked up
Proceed to link to Gen 3
Proceed to speed change and link to Gen 4
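The difference between the two methods is that the new one confirms the link at each speed before requesting the next. The sketch below models only that control flow. Everything in it is hypothetical: struct link_ops, staged_link_train and the fake_* helpers are illustration names, not functions from the Tegra or DesignWare drivers, and the fake device only simulates an endpoint with a maximum supported generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform hooks; NOT part of any real PCIe driver. */
struct link_ops {
    void (*set_target_gen)(void *ctx, int gen); /* request a speed change */
    bool (*link_up_at)(void *ctx, int gen);     /* true once linked at gen */
};

/*
 * Step through Gen 1 -> Gen 3 -> Gen 4, confirming each step before the next.
 * Returns the highest confirmed generation, or 0 if Gen 1 never linked up.
 * A real implementation would also restore the last good target speed on
 * failure and sleep between polls; both are omitted here.
 */
int staged_link_train(const struct link_ops *ops, void *ctx, int max_polls)
{
    static const int steps[] = { 1, 3, 4 };
    int confirmed = 0;

    assert(ops && ops->set_target_gen && ops->link_up_at);
    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        bool up = false;

        ops->set_target_gen(ctx, steps[i]);
        for (int poll = 0; poll < max_polls && !up; poll++)
            up = ops->link_up_at(ctx, steps[i]);
        if (!up)
            break; /* do not attempt a higher speed on an unconfirmed link */
        confirmed = steps[i];
    }
    return confirmed;
}

/* Simulated endpoint used only to exercise the sketch. */
struct fake_dev { int target_gen; int max_gen; };

void fake_set_target_gen(void *ctx, int gen)
{
    ((struct fake_dev *)ctx)->target_gen = gen;
}

bool fake_link_up_at(void *ctx, int gen)
{
    struct fake_dev *d = ctx;
    return d->target_gen == gen && gen <= d->max_gen;
}

const struct link_ops fake_ops = { fake_set_target_gen, fake_link_up_at };
```

With a struct fake_dev whose max_gen is 4, staged_link_train(&fake_ops, &dev, 5) returns 4; with max_gen set to 1 it stops after the Gen 1 step and returns 1 instead of pushing on to Gen 3.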
We've checked the device tree documentation and found a property called nvidia,init-link-speed which may do this. But we could not find where it is used. We also tried setting it to <0x1>, but it had no effect.
Questions:
1. Does the Tegra PCIe controller use the new method?
2. If the answer to Q1 is no, where is nvidia,init-link-speed implemented, if it is implemented at all?
3. Could you help us solve this? If doing it all in the kernel is complicated, could we set the speed to Gen 1 during boot and then raise it to Gen 4 from user space through sysfs?
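For reference on the user-space idea: a speed change on a PCIe link is normally requested through the root port's PCI Express capability, by writing the Target Link Speed field in Link Control 2 and then setting Retrain Link in Link Control. The sketch below only computes those register values. The offsets and masks match the definitions in Linux's pci_regs.h, but the helper function names are made up for illustration, and whether the Tegra root port honors this sequence at runtime, or whether it avoids the AQC113C issue, is an assumption we have not verified.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets within the PCI Express capability, as in Linux pci_regs.h. */
#define PCI_EXP_LNKCTL      0x10    /* Link Control */
#define PCI_EXP_LNKCTL_RL   0x0020  /* Retrain Link */
#define PCI_EXP_LNKSTA      0x12    /* Link Status */
#define PCI_EXP_LNKSTA_CLS  0x000f  /* Current Link Speed */
#define PCI_EXP_LNKCTL2     0x30    /* Link Control 2 */
#define PCI_EXP_LNKCTL2_TLS 0x000f  /* Target Link Speed */

/* Hypothetical helper: new Link Control 2 value targeting Gen 1..4. */
uint16_t lnkctl2_with_target_gen(uint16_t lnkctl2, unsigned gen)
{
    assert(gen >= 1 && gen <= 4);
    return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) | gen);
}

/* Hypothetical helper: Link Control value with Retrain Link set. */
uint16_t lnkctl_with_retrain(uint16_t lnkctl)
{
    return (uint16_t)(lnkctl | PCI_EXP_LNKCTL_RL);
}

/* Hypothetical helper: current link generation from Link Status. */
unsigned lnksta_current_gen(uint16_t lnksta)
{
    return lnksta & PCI_EXP_LNKSTA_CLS;
}
```

The computed values would still have to be written to the root port's config space, for example with a tool such as setpci, and the result checked by reading Link Status back.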
With nvidia,init-speed = 0x1; the dtb failed to build, so I changed it to nvidia,init-speed = <0x1>;. But it does not help with enumerating the PCIe device.