Error in PCIe driver when using NVMe 'drive' on TX2

Dmesg shows an error during kernel initialization; the relevant lines are below. Btw, this same nvme ‘drive’ works just fine when installed in the host, but the TX2 refuses to cooperate. Notice the speed change at 14.926.

[   14.919363] iommu: Adding device 0000:00:01.0 to group 54
[   14.919425] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   14.919528] pci 0000:01:00.0: [1b85:6018] type 00 class 0x010802
[   14.919574] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[   14.919775] iommu: Adding device 0000:01:00.0 to group 55
[   14.924807] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[   14.924889] pci 0000:00:01.0: BAR 8: assigned [mem 0x50100000-0x501fffff]
[   14.924892] pci 0000:01:00.0: BAR 0: assigned [mem 0x50100000-0x50103fff 64bit]
[   14.924907] pci 0000:00:01.0: PCI bridge to [bus 01]
[   14.924912] pci 0000:00:01.0:   bridge window [mem 0x50100000-0x501fffff]
[   14.924967] pcieport 0000:00:01.0: enabling device (0000 -> 0002)
[   14.925039] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[   14.925040] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[   14.925044] pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded
[   14.925104] aer 0000:00:01.0:pcie02: service driver aer loaded
[   14.925915] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[   14.926097] tegra-pcie 10003000.pcie-controller: speed change : Gen-1 -> Gen-2  <<<-------------****
[   15.079451] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: id=0020
[   15.079462] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.079464] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.079467] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.079472] pcieport 0000:00:01.0: broadcast error_detected message
[   15.079474] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.079559] nvme 0000:01:00.0: Failed status: ffffffff, reset controller
[   15.079565] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: id=0020
[   15.079573] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.079576] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.079577] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.079582] pcieport 0000:00:01.0: broadcast error_detected message
[   15.079584] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.079586] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: id=0020
[   15.079594] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.079596] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.079598] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.079602] pcieport 0000:00:01.0: broadcast error_detected message
[   15.079603] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.079639] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: id=0020
[   15.079647] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.079649] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.079651] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.079659] nvme 0000:01:00.0: Cancelling I/O 1 QID 0
[   15.079669] pcieport 0000:00:01.0: broadcast error_detected message
[   15.079670] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.079672] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.079689] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.079691] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.079692] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.079697] pcieport 0000:00:01.0: broadcast error_detected message
[   15.079698] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.079742] nvme 0000:01:00.0: Device failed to resume
[   15.080748] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081623] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.081625] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.081627] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.081631] pcieport 0000:00:01.0: broadcast error_detected message
[   15.081633] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.081635] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081643] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.081645] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.081647] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.081651] pcieport 0000:00:01.0: broadcast error_detected message
[   15.081654] iommu: Removing device 0000:01:00.0 from group 55
[   15.081655] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.081658] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081666] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.081668] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.081669] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.081673] pcieport 0000:00:01.0: broadcast error_detected message
[   15.081675] pcieport 0000:00:01.0: AER: Device recovery failed
[   15.081677] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081685] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[   15.081686] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[   15.081688] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
[   15.081693] pcieport 0000:00:01.0: broadcast error_detected message
[   15.081694] pcieport 0000:00:01.0: broadcast mmio_enabled message
[   15.081696] pcieport 0000:00:01.0: broadcast resume message
[   15.081699] pcieport 0000:00:01.0: AER: Device recovery successful
[   15.081701] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081706] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081707] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081712] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081713] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081718] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081720] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081724] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081726] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081730] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081732] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081736] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081738] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081742] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081744] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081748] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081750] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081754] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081756] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081760] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081762] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081766] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081768] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081772] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081774] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081778] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081780] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081785] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081786] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081791] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081792] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081797] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081798] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081803] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081804] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081809] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081811] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081815] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081817] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081821] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081823] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081827] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081829] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081833] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081835] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081839] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081841] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081845] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081847] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081852] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081853] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081858] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081859] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081864] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081865] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081870] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081871] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081876] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081878] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081882] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081884] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081888] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081890] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081894] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081896] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081900] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081902] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081906] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081908] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081912] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081914] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081919] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081920] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081925] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081926] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081931] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081933] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081937] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081939] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081943] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081945] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081949] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081951] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081955] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081957] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081961] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081963] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081968] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081969] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081974] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081975] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081980] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081982] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081986] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081988] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081992] pcieport 0000:00:01.0: can't find device of ID0020
[   15.081994] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.081998] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082000] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082004] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082006] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082010] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082012] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082017] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082018] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082023] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082024] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082029] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082031] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082035] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082037] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082041] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082043] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082047] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082049] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082053] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082055] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082059] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082061] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082066] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082067] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082072] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082073] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082078] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082079] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082084] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082086] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082090] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082092] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082096] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082098] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082102] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082104] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082108] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082110] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082114] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082116] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082121] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082122] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082127] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082128] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082133] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082135] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082139] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082141] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082145] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082147] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082151] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082154] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082158] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082160] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082164] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082166] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082170] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082172] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082176] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082178] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082182] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082184] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082188] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082190] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082194] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082196] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082201] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082202] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082207] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082208] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082213] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082214] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082219] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082221] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082225] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082227] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082231] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082233] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082237] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082239] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082243] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082245] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082249] pcieport 0000:00:01.0: can't find device of ID0020
[   15.082251] pcieport 0000:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0020
[   15.082256] pcieport 0000:00:01.0: can't find device of ID0020
[   15.604350] cfg80211: World regulatory domain updated:
[   15.604354] cfg80211:  DFS Master region: unset
[   15.604355] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[   15.604358] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)

Is this the same NVMe we looked at earlier where the character device showed up, but not the block device? If so, then previously it had backed off to PCIe v1 speeds, but was working without error. If this is the case, then something has degraded which had worked previously.
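
For anyone reading the log above: the repeated "error status/mask=00004000/00000000" lines can be decoded by hand, since each set bit in the status word is one AER uncorrectable error type. Below is a minimal sketch of that decoding in Python. The bit table is a partial list of the PCIe AER uncorrectable error bits, and the names `AER_UNCORRECTABLE_BITS` and `decode_aer_line` are made up for this illustration; it is not the kernel's own code.

```python
import re

# Partial table of PCIe AER Uncorrectable Error Status bits (bit -> name).
# Only the commonly seen bits are listed; anything else decodes as "unknown".
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}

# Matches the "error status/mask=XXXXXXXX/XXXXXXXX" part of a dmesg line.
STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_aer_line(line):
    """Return [(bit, name), ...] for status bits that are set and not masked."""
    match = STATUS_RE.search(line)
    if match is None:
        return []
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    reported = status & ~mask  # a masked bit would not be reported
    return [
        (bit, AER_UNCORRECTABLE_BITS.get(bit, "unknown"))
        for bit in range(32)
        if reported & (1 << bit)
    ]


line = ("[   15.079464] pcieport 0000:00:01.0:   device [10de:10e5] "
        "error status/mask=00004000/00000000")
print(decode_aer_line(line))
```

For the status word in this thread, 0x00004000 has only bit 14 set, which matches the "[14] Completion Timeout" line the kernel prints right after it.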

It’s the same unit. The nvme still works on the host but never did on the TX2. The above dmesg is from the TX2.

Hi Skypuppy,

We are looking into this issue and will post an update later.

Sorry for the inconvenience.

Please add ‘pcie_aspm=off’ to the kernel command line.

Thank you, vidyas.
I first tried it in extlinux.conf, but upon reboot it doesn’t see the nvme at all; there is no mention of nvme in dmesg. Weird. So I tried to find other places in grub2 to insert it, but I can’t find where to put that parameter. There is no /etc/default/grub file, and neither is there a /boot/grub/menu.lst or a /boot/grub/grub.cfg. So where do I insert the aspm parameter, please?

Thanks.

You can add that to the line starting with APPEND in the /boot/extlinux/extlinux.conf file on the target.
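
As a rough illustration of that edit, here is a small Python sketch that appends a parameter to every APPEND line of an extlinux.conf-style text. The sample config in the code is a simplified, hypothetical stand-in (the real file on a TX2 will differ), and `add_kernel_param` is a name invented for this example. It works on a string only, so nothing on disk is touched.

```python
def add_kernel_param(conf_text, param):
    """Append `param` to each APPEND line unless it is already present."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        # Only touch APPEND lines, and skip them if the parameter is there.
        if stripped.startswith("APPEND") and param not in stripped.split():
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical, simplified config for illustration only.
SAMPLE_CONF = (
    "LABEL primary\n"
    "      LINUX /boot/Image\n"
    "      APPEND root=/dev/mmcblk0p1 rw rootwait\n"
)

print(add_kernel_param(SAMPLE_CONF, "pcie_aspm=off"))
```

To apply this for real you would read /boot/extlinux/extlinux.conf, keep a backup copy, and write the result back as root. After rebooting, /proc/cmdline shows whether the parameter actually reached the kernel.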

Yes, that was the first thing I tried but that appeared to make matters worse as upon reboot nvme did not show up anywhere, in dmesg or anywhere else.

If PCIe is failing (such as in the first post), then even if the ‘pcie_aspm=off’ parameter is good the result won’t show up. Your earliest post elsewhere did not have the errors in the lspci from above (in that case it was backed down to PCIe v1 speeds, but no errors). I’m wondering if you can disconnect power, discharge capacitors by holding power button on for 10 seconds, and then after a minute re-seat the card…perhaps “sudo lspci -vvv” will get past previous errors (I’m being extra cautious in removing/re-seating because of testing). If the device can operate at even PCIe v1 speeds then the ‘pcie_aspm=off’ test can be attempted…if it can’t operate at even PCIe v1 speeds, then testing won’t be possible.

Interesting. With aspm set to off, lspci -vvv gives no output whatsoever. It’s like the entire pcie bus is completely turned off, which indeed it may be. As soon as I removed the kernel command, lspci -vvv worked again.

However, I’m getting odd errors during boot now. I think it’s time to reflash as if it’s a brand new TX2. I will leave the nvme disk in the pcie slot during the reflash and see if the proper drivers are loaded. Much time wasted so far.

linuxdev, with aspm on and even with the nvme disconnected (using the ultra-conservative method) lspci gives no output. When I removed the kernel command, lspci -vvv showed the normal pcie info.

The way the memory controller and PCIe are arranged lspci won’t show the root complex or anything else for PCIe unless something is connected to it. With your card connected it should always show.

The aspm on/off is @vidyas’s test; he’ll have to tell you what he’s looking for. However, I would think that disabling active state power management would keep the drive active at all times, in which case the PCIe device should always show up under lspci. With aspm on, it would be possible for some bug in a low power state to get in the way by turning off the drive when it shouldn’t be off…I suspect this is what the “off” test is for…to force drive power to always be “on”.

I posted a detailed report on the nvme tests right here and it disappeared into the bit bucket. Grrr. Here is the summary:

I reflashed the TX2 from scratch and did a lot of tests.
Basically, with many variations of testing, the only way the nvme works correctly is if the micro USB port on the TX2 is connected to a USB port on the host. No other combination works but that.

How do I fix this, please?

Was your NVMe test report added as an attachment to your post? If so, then the virus scanner may have lost it.

That’s a curious combination. What is your “lsusb -t” when NVMe can be detected, and again when NVMe cannot be detected? PCIe should not depend on this (perhaps conflicting drivers are involved). I am also curious if it matters what is plugged in to the micro USB…in normal operation the micro-USB port is a host, and the supplied micro-B USB cable going to another host should be inert (host-to-host connections via micro-B should be ignored at both sides).

This is truly bizarre. NVME shows up and works as long as the micro USB cable is plugged into the host. Does not work when plugged into just the USB hub.


On TX2, lsusb -t, when micro USB cable is plugged in. Detected:

nvidia@tegra-ubuntu:~$ lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/3p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/4p, 480M
&&&&&&&&&&&&&&&&&&&&&&
nvidia@tegra-ubuntu:~$ ls -al /dev/nvme*
crw------- 1 root root 237, 0 Jun 2 16:06 /dev/nvme0
brw-rw---- 1 root disk 259, 10 Jun 2 16:06 /dev/nvme0n1
brw-rw---- 1 root disk 259, 11 Jun 2 16:06 /dev/nvme0n1p1

on host:

Jack:~/InstallJetPack3.0/64_TX2/NVIDIA_CUDA-8.0_Samples/2_Graphics/Mandelbrot$ lsusb -t
/: Bus 11.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 10.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 09.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/4p, 12M
/: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/2p, 12M
/: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/5p, 12M
|__ Port 5: Dev 2, If 0, Class=Vendor Specific Class, Driver=, 12M
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/5p, 12M
|__ Port 1: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 5: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 5: Dev 3, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/5p, 480M


And with micro USB cable removed from host:

nvidia@tegra-ubuntu:~$ lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/3p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci-tegra/4p, 480M
&&&&&&&&&&&&&&
nvidia@tegra-ubuntu:~$ ls -al /dev/nvme*
crw------- 1 root root 237, 0 Jun 2 16:01 /dev/nvme0
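
The difference between the two `ls -al /dev/nvme*` listings is that the failing case has only the character device (the line starting with "c") and no block devices (lines starting with "b"). A small Python sketch of that check on pasted `ls` output follows; `nvme_nodes` is a hypothetical helper written for this post, not an existing tool.

```python
def nvme_nodes(ls_output):
    """Split `ls -al /dev/nvme*` output into (char_devices, block_devices)."""
    char_devs, block_devs = [], []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not an NVMe device node.
        if not fields or not fields[-1].startswith("/dev/nvme"):
            continue
        if line.startswith("c"):      # controller character device
            char_devs.append(fields[-1])
        elif line.startswith("b"):    # namespace or partition block device
            block_devs.append(fields[-1])
    return char_devs, block_devs


working = """crw------- 1 root root 237, 0 Jun 2 16:06 /dev/nvme0
brw-rw---- 1 root disk 259, 10 Jun 2 16:06 /dev/nvme0n1
brw-rw---- 1 root disk 259, 11 Jun 2 16:06 /dev/nvme0n1p1"""
failing = "crw------- 1 root root 237, 0 Jun 2 16:01 /dev/nvme0"

print(nvme_nodes(working))
print(nvme_nodes(failing))
```

An empty block-device list with a non-empty character-device list is the "controller visible, namespace missing" state seen when the micro USB cable is unplugged.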


Grounding issue?

The “lsusb -t” on the Jetson side didn’t post correctly, I see “&&&&&&&&&&&&&&&&&&&&&&”. This is likely a font issue…e.g., Helvetica does not have the box drawing characters used with the tree view, but Deja Vu Sans does, and CSS stylesheets are setting fonts. Can you paste the Jetson side “lsusb -t” again, but replace any box drawing character with any other character, e.g., with an ‘x’?

@Skypuppy may be on to something…there is no way the USB between PC and Jetson should have any effect at all on software since they are both in host mode and Jetson is not in recovery mode. It seems more likely to be an electrical issue. Now if there were also insufficient power being delivered on the PCIe slot then grounding could change or improve this. Insufficient electrical power would also be consistent with less signal quality and backing off to lower speeds or failing entirely with a bit of degradation (the extra ground can help both power delivery and noise rejection).

Consider that this drive is also PCIe v3 capable…which requires more power if using v3. The drive (when it works) backs off to v1 speeds…but there is a strong possibility that the manufacturer did not correctly test to see if a v1/v2 mode also reduced power requirement (imagine the difference between testing at v1/v2 speeds on a v3 slot which was told to reduce to v1/v2 speeds, versus testing in a PCIe slot which never had v3 available and thus was never designed for more power delivery…and also imagine that many PCIe slots used for testing may provide more power delivery than standards require such that testing if it works or doesn’t work is not quite valid). The Jetson PCIe is validated for v1/v2, not v3.

It would be quite interesting to see how much power the PCIe slot is capable of delivering, and compare to the amount of power the PCIe NVMe is in need of.

Does anyone know how to measure power being used by the PCIe slot device at any particular moment in time?

LOL. No, the “&” and “-” characters were put there by me to designate a separator between text blocks. Done on purpose. :)

Re power: I ran some tests on power draw (in watts) on the TX2 earlier this week. Without the nvme drive and with the clocks all running at max, power draw was about 5 watts. With the nvme drive and wired ethernet, it drew another 2 watts. Still, at max draw, that equates to about $6 to $7 per year in electricity costs. Yeah, that’s right, per YEAR.
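
The yearly cost figure can be sanity-checked with simple arithmetic. The sketch below assumes continuous operation at the combined 7 watts (5 W plus the extra 2 W) and an electricity rate of $0.10 per kWh. That rate is an assumption made for this example, since the actual rate was not stated.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts):
    """Energy used in one year of continuous operation, in kWh."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts, dollars_per_kwh):
    """Yearly electricity cost in dollars at a flat rate."""
    return annual_kwh(watts) * dollars_per_kwh


total_watts = 5 + 2          # base draw plus NVMe and wired ethernet
assumed_rate = 0.10          # ASSUMPTION: dollars per kWh, not from the post

print(f"{annual_kwh(total_watts):.2f} kWh per year")
print(f"${annual_cost(total_watts, assumed_rate):.2f} per year")
```

At 7 watts this works out to roughly 61 kWh per year, so a rate of around ten or eleven cents per kWh lands in the $6 to $7 range.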

@linuxdev, I should load and run hwinfo again, with the micro USB connected so we can see if it still tries to drop back to v1.

And can you guys think of a (sort-of) easy way to hook a good ground to the TX2 and we can see if that helps things? The only thing I can think of involves quite a bit of hassle and some really, really long wire (like 100 feet.) :)

In terms of power draw you’d need to know not just what the NVMe is drawing (~2 watts)…you’d also need to know what the drive needs to run correctly, and perhaps if the voltage at the PCIe connector drops or goes up depending on micro USB connection. One thing which might be interesting…turn things on with the USB to host connected, get it where the NVMe is detected. Measure current draw while this is running. Without stopping the system, unplug the micro USB and see what changes. Probably you also want to run “dmesg --follow” to see what messages pop up in the log as you disconnect. Note that all of this may get more complicated because the power draw on the drive will change depending on what commands are issued to it…there would be spikes and increases in power usage as the drive actually starts useful function.
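
If plain "dmesg --follow" is too noisy while unplugging the cable, the output can be filtered down to the PCIe and NVMe messages. The following is a minimal Python sketch of that idea; the keyword list is taken from the driver prefixes seen in the log earlier in this thread, and the function names are invented for this example. Reading the kernel log may require root.

```python
import subprocess

# Substrings taken from the message prefixes seen in the posted dmesg log.
KEYWORDS = ("nvme", "pcieport", "tegra-pcie", "AER")


def is_pcie_related(line):
    """True if a kernel log line mentions one of the PCIe/NVMe keywords."""
    return any(keyword in line for keyword in KEYWORDS)


def follow_dmesg():
    """Yield PCIe/NVMe-related kernel messages as they arrive."""
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    try:
        for line in proc.stdout:
            if is_pcie_related(line):
                yield line.rstrip("\n")
    finally:
        proc.terminate()


# Example use (runs until interrupted with Ctrl-C):
#     for message in follow_dmesg():
#         print(message)
```

Start it with the micro USB cable connected and the NVMe detected, then unplug the cable and watch which messages appear.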

In terms of noise…if the issue is not power draw and is instead noise, then there are many grounds which might work…you’ll need to get creative. Noise grounding really only depends on AC, there is no need in RF for a DC ground. E.g., a quality capacitor between the micro-USB connector and the host at its USB connector might do the trick. If it does, then this rules out DC changes from the host causing the PCIe to work.