Hmm, I can’t reproduce the LED-not-lighting problem anymore. The M.2->mPCIe card now seems to work correctly regardless of whether I use a patched kernel or not. I’m not sure what changed, but now it starts up every time.
However, the mPCIe cards connected to it still misbehave. I tried both the patched and the unpatched kernel, and it seems to me the patch did not help (it usually made things even worse, i.e. none of the devices worked after a reboot). The Ethernet card more or less worked with the unpatched kernel, but there were sporadic errors such as CPU freezes and watchdog-induced reboots. After such a reboot, the Ethernet controller endpoints were not found and reappeared only after a power off/on (the PCIe switch on the mPCIe card did appear, all three devices of it; only the Ethernet endpoints were missing).
The USB controller card was even worse now: it started correctly on a cold boot, but warm boots almost always ended with stuck CPUs. Even after a cold boot, as soon as I inserted a flash drive, something happened on the PCI bus and the controller disappeared (I tested this both with the patch and without it). This is what appeared on the UART console:
[ 35.915792] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 35.941499] usb 2-2: New USB device found, idVendor=125f, idProduct=de7a
[ 35.941565] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 35.941610] usb 2-2: Product: ADATA USB Flash Drive
[ 35.941651] usb 2-2: Manufacturer: ADATA
[ 35.941693] usb 2-2: SerialNumber: *****
[ 35.948270] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 35.952584] scsi host0: usb-storage 2-2:1.0
[ 36.982727] scsi 0:0:0:0: Direct-Access ADATA USB Flash Drive 1.00 PQ: 0 ANSI: 6
[ 36.991944] sd 0:0:0:0: [sda] 30720000 512-byte logical blocks: (15.7 GB/14.6 GiB)
[ 37.000049] sd 0:0:0:0: [sda] Write Protect is off
[ 37.007972] sd 0:0:0:0: [sda] Mode Sense: 23 00 00 00
[ 37.008579] sd 0:0:0:0: [sda] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
[ 37.034516] sda: sda1 sda2 sda4
[ 37.038818] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 38.408487] tegra-pcie 1003000.pcie: unexpected MSI
[ 48.498396] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.
[ 48.506285] xhci_hcd 0000:01:00.0: Assuming host is dying, halting host.
[ 48.517527] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 48.527338] usb 2-2: USB disconnect, device number 2
[ 48.594215] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[ 48.594227] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 03 00 00 01 00 00
[ 48.594234] blk_update_request: I/O error, dev sda, sector 768
And this is from another try:
[ 17.604597] CPU0: SError detected, daif=1c0, spsr=0x200000c5, mpidr=80000000, esr=bf000002
[ 28.139736] CPU2: SError detected, daif=1c0, spsr=0x200000c5, mpidr=80000002, esr=bf000002
[ 28.139739] CPU1: SError detected, daif=1c0, spsr=0x600000c5, mpidr=80000001, esr=bf000002
[ 28.139758] tegra-xusb 70090000.xusb: controller firmware hang
[ 28.166652] tegra-xusb 70090000.xusb: WARN: xHC CMD_RUN timeout
[ 38.688936] CPU1: SError detected, daif=1c0, spsr=0x600000c5, mpidr=80000001, esr=bf000002
[ 38.688961] tegra-xusb 70090000.xusb: xhci_suspend() failed -110
[ 49.215816] CPU2: SError detected, daif=1c0, spsr=0x600000c5, mpidr=80000002, esr=bf000002
[ 49.215826] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 49.215827] INFO: rcu_preempt self-detected stall on CPU
[ 49.215835] 0-...: (1 GPs behind) idle=a47/140000000000001/0 softirq=7697/7702 fqs=0
[ 49.215840] 1-...: (1 GPs behind) idle=723/140000000000001/0 softirq=7437/7438 fqs=0
[ 49.215843] 1-...: (1 GPs behind) idle=723/140000000000001/0 softirq=7437/7438 fqs=0
[ 49.215846]
[ 49.215847] 2-...: (2 GPs behind) idle=073/1/0 softirq=8665/8666 fqs=0
[ 49.215852]
[ 49.215853] rcu_preempt kthread starved for 5269 jiffies! g342 c341 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[ 49.216082] rcu_preempt kthread starved for 5269 jiffies! g342 c341 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
The "tegra-pcie 1003000.pcie: unexpected MSI" error appeared with both the Ethernet controller and the USB controller, and once it appeared, the card became basically unusable.
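In case it helps anyone compare their own captures, here is a rough Python sketch that scans dmesg-style lines for the first "unexpected MSI" message and reports how long it took until the first later failure message. The function names and the list of failure markers are my own picks for illustration (the markers are just substrings from the logs above), not anything provided by the kernel or the drivers.

```python
import re

# Matches dmesg-style lines such as
# "[ 38.408487] tegra-pcie 1003000.pcie: unexpected MSI".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

# Substrings copied from the logs above (an assumption; extend as needed).
FAILURE_MARKERS = ("HC died", "SError detected", "rcu_preempt detected stalls")


def parse_line(line):
    """Return (timestamp_seconds, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return float(m.group(1)), m.group(2)


def msi_to_failure_delay(lines):
    """Seconds from the first 'unexpected MSI' to the first later failure marker.

    Returns None if either event is missing from the log.
    """
    msi_time = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip garbled or non-log lines
        ts, msg = parsed
        if msi_time is None:
            if "unexpected MSI" in msg:
                msi_time = ts
        elif any(marker in msg for marker in FAILURE_MARKERS):
            return ts - msi_time
    return None


sample = [
    "[ 37.038818] sd 0:0:0:0: [sda] Attached SCSI removable disk",
    "[ 38.408487] tegra-pcie 1003000.pcie: unexpected MSI",
    "[ 48.498396] xhci_hcd 0000:01:00.0: xHCI host not responding to stop endpoint command.",
    "[ 48.517527] xhci_hcd 0000:01:00.0: HC died; cleaning up",
]
delay = msi_to_failure_delay(sample)
print(delay)
```

On the sample lines taken from the first log, the delay comes out at roughly 10 seconds between the unexpected MSI and the "HC died" message. Lines that are interleaved on the console will simply be skipped or matched on their leading timestamp, so a clean `dmesg` capture is a better input than raw UART output.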