Xavier NX fails to boot to the desktop

Hi,

Our custom carrier board with the NX SOM often shows the following PCIe error messages during boot. The system gets stuck at these messages and never reaches the desktop.

[    3.717805] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[    3.718075] pcieport 0004:00:00.0:   device [10de:1ad1] error status/mask=00004000/00400000
[    3.718266] pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
[    3.718968] tegra-i2c 3190000.i2c: no acknowledge from address 0x50
[    3.723270] tegra-i2c 3190000.i2c: no acknowledge from address 0x50
[    3.724046] tegradc 15200000.nvdisplay: hdmi: edid read failed
[    3.793826] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[    3.794230] pcieport 0004:00:00.0:   device [10de:1ad1] error status/mask=00004000/00400000
[    3.794492] pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
[    3.869799] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[    3.870234] pcieport 0004:00:00.0:   device [10de:1ad1] error status/mask=00004000/00400000
[    3.870503] pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
[    3.953845] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[    3.954248] pcieport 0004:00:00.0:   device [10de:1ad1] error status/mask=00004000/00400000
[    3.954511] pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
[    4.037831] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[    4.038269] pcieport 0004:00:00.0:   device [10de:1ad1] error status/mask=00004000/00400000
[    4.038539] pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
[    4.117863] pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[    4.379118] pcieport 0004:00:00.0:   device [10de:1ad1] error status/mask=00004000/00400000
[    4.423130] pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
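
For reference, the "error status/mask" pair in these lines can be decoded by hand. Below is a minimal Python sketch, assuming the standard PCIe AER Uncorrectable Error Status bit layout; the helper names (`decode_bits`, `decode_aer_line`) are hypothetical and exist only for illustration.

```python
import re

# Bit positions of the PCIe AER Uncorrectable Error Status/Mask registers
# (assumed from the PCIe specification; reserved bits are omitted).
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

# Matches the "error status/mask=XXXXXXXX/XXXXXXXX" part of a pcieport log line.
STATUS_MASK_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the names of all known bits set in a register value."""
    return [name for bit, name in sorted(UNCORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_aer_line(line):
    """Return (status_names, mask_names) for a log line, or None if no match."""
    match = STATUS_MASK_RE.search(line)
    if match is None:
        return None
    status, mask = (int(group, 16) for group in match.groups())
    return decode_bits(status), decode_bits(mask)


if __name__ == "__main__":
    sample = ("[    3.718075] pcieport 0004:00:00.0:   device [10de:1ad1] "
              "error status/mask=00004000/00400000")
    print(decode_aer_line(sample))
```

For the lines above, status 00004000 has only bit 14 set (Completion Timeout), and mask 00400000 has only bit 22 set (Uncorrectable Internal Error), so the reported error is not masked.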

Here is the full boot log:
boot_stuck_pciex1_0323.log (61.1 KB)

The “pcieport 0004:00:00.0: PCIe Bus Error” messages in the log point to the PCIE1 bus (Ctrl #4, PCIe x1), where we use a PCIe switch. We are not sure whether the switch is causing this boot failure.

Could you help advise how to debug this kind of PCIe issue?

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks

Are these errors observed only when the endpoints are being exercised, or also when they are idle?
Could you share the output of ‘sudo lspci -vvvv’?
I’m not sure about the status of the ASPM states. Could you please try adding ‘pcie_aspm=off’ to the kernel command line and see if it makes any difference?
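
One possible way to try the ‘pcie_aspm=off’ suggestion, assuming the board boots through /boot/extlinux/extlinux.conf (common on Jetson L4T releases, but not confirmed in this thread), is to append the parameter to the APPEND line of that file. The Python sketch below uses a hypothetical helper, `add_kernel_arg`, that only edits text in memory; back up the file before writing anything to it.

```python
def add_kernel_arg(conf_text, arg):
    """Append `arg` to every APPEND line of an extlinux.conf-style text.

    Lines that already contain the argument are left unchanged, so the
    function can be applied more than once safely.
    """
    updated = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("APPEND ") and arg not in stripped.split():
            line = line.rstrip() + " " + arg
        updated.append(line)
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    # Illustrative file content only; the real file on the board may differ.
    sample = (
        "LABEL primary\n"
        "      LINUX /boot/Image\n"
        "      APPEND ${cbootargs} quiet\n"
    )
    print(add_kernel_arg(sample, "pcie_aspm=off"))
```

After a reboot, ‘cat /proc/cmdline’ shows whether the parameter actually reached the kernel.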