Unfortunately I don’t have a way to test it with the M.2 slot as I lack an appropriate adapter.
I was able to try another CFExpress card (https://www.bhphotovideo.com/c/product/1531996-REG/sony_cebg128_j_128gb_cfexpress_type_b.html) which actually shows up on the PCIe bus, but when I try transferring a large amount of data to it, it seems to drop off the bus:
[ 165.866151] nvme 0000:01:00.0: Failed status: 0xffffffff, reset controller.
[ 165.873393] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0008(Transmitter ID)
[ 165.883939] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00001100/00002000
[ 165.892301] pcieport 0000:00:01.0: [ 8] RELAY_NUM Rollover
[ 165.894212] nvme nvme0: Removing after probe failure status: -19
[ 165.894427] blk_update_request: I/O error, dev nvme0n1, sector 2732031
[ 165.894430] blk_update_request: I/O error, dev nvme0n1, sector 2738175
[ 165.894435] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1182793728 size 8388608 starting block 341759)
[ 165.894439] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 342527)
[ 165.894440] Buffer I/O error on device nvme0n1p1, logical block 333312
[ 165.894442] Buffer I/O error on device nvme0n1p1, logical block 334080
[ 165.894448] Buffer I/O error on device nvme0n1p1, logical block 333313
[ 165.894449] Buffer I/O error on device nvme0n1p1, logical block 334081
[ 165.894450] Buffer I/O error on device nvme0n1p1, logical block 333314
[ 165.894451] Buffer I/O error on device nvme0n1p1, logical block 334082
[ 165.894452] Buffer I/O error on device nvme0n1p1, logical block 333315
[ 165.894454] Buffer I/O error on device nvme0n1p1, logical block 334083
[ 165.894455] Buffer I/O error on device nvme0n1p1, logical block 333316
[ 165.894456] Buffer I/O error on device nvme0n1p1, logical block 334084
[ 165.894859] blk_update_request: I/O error, dev nvme0n1, sector 2740223
[ 165.894865] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 342783)
[ 165.894869] blk_update_request: I/O error, dev nvme0n1, sector 2736127
[ 165.894875] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 342271)
[ 165.894906] blk_update_request: I/O error, dev nvme0n1, sector 2711551
[ 165.894911] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1174405120 size 8388608 starting block 339199)
[ 165.895237] blk_update_request: I/O error, dev nvme0n1, sector 2742271
[ 165.895242] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 343039)
[ 165.895276] blk_update_request: I/O error, dev nvme0n1, sector 2637823
[ 165.895281] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1140850688 size 8388608 starting block 329983)
[ 165.895597] blk_update_request: I/O error, dev nvme0n1, sector 2744319
[ 165.895602] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 343295)
[ 165.895628] blk_update_request: I/O error, dev nvme0n1, sector 2639871
[ 165.895633] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1140850688 size 8388608 starting block 330239)
[ 165.895953] blk_update_request: I/O error, dev nvme0n1, sector 2746367
[ 165.895958] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 343551)
[ 166.188968] pcieport 0000:00:01.0: [12] Replay Timer Timeout
[ 166.195476] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[ 166.207704] pcieport 0000:00:01.0: device [10de:10e5] error status/mask=00004000/00000000
[ 166.216508] pcieport 0000:00:01.0: [14] Completion Timeout (First)
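For reference, the "error status/mask" words in the log above can be decoded by hand. The sketch below is a minimal decoder, assuming the standard bit layout of the PCIe AER Correctable and Uncorrectable Error Status registers; the tables list only the more common bits, and the names `CORRECTABLE_BITS`, `UNCORRECTABLE_BITS`, and `decode_aer` are my own, not anything from the kernel.

```python
# Bit positions assumed from the PCIe AER capability register layout.
# Only a subset of the defined bits is listed here.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_aer(status_hex, names):
    """Return the names of all bits set in a hex status word from dmesg."""
    status = int(status_hex, 16)
    return [
        names.get(bit, f"bit {bit} (not in table)")
        for bit in range(32)
        if status & (1 << bit)
    ]


# Values copied from the log above.
print(decode_aer("00001100", CORRECTABLE_BITS))    # correctable status
print(decode_aer("00002000", CORRECTABLE_BITS))    # correctable mask
print(decode_aer("00004000", UNCORRECTABLE_BITS))  # uncorrectable status
```

Under those assumptions, the correctable status 00001100 has bits 8 and 12 set, and the uncorrectable status 00004000 has bit 14 set, which agrees with the bit numbers the kernel prints in brackets.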
About three minutes later, the hung-task watchdog triggers a kernel panic and the TX2 reboots:
[ 363.594862] INFO: task kworker/u12:2:49 blocked for more than 120 seconds.
[ 363.601937] Not tainted 4.9.140-tegra #1
[ 363.606491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.614828] Kernel panic - not syncing: hung_task: blocked tasks
[ 363.620836] CPU: 5 PID: 672 Comm: khungtaskd Not tainted 4.9.140-tegra #1
[ 363.627617] Hardware name: quill (DT)
[ 363.631276] Call trace:
[ 363.633723] [<ffffff800808bdb8>] dump_backtrace+0x0/0x198
[ 363.639118] [<ffffff800808c37c>] show_stack+0x24/0x30
[ 363.644168] [<ffffff800845d820>] dump_stack+0x98/0xc0
[ 363.649218] [<ffffff80081c2198>] panic+0x11c/0x298
[ 363.654006] [<ffffff8008181f90>] watchdog+0x300/0x3b8
[ 363.659051] [<ffffff80080dbee4>] kthread+0xec/0xf0
[ 363.663836] [<ffffff8008083850>] ret_from_fork+0x10/0x40
[ 363.669146] SMP: stopping secondary CPUs
[ 363.673070] Kernel Offset: disabled
[ 363.676555] Memory Limit: none
[ 363.679608] trusty-log panic notifier - trusty version Built: 22:43:54 Dec 9 2019
[ 363.695673] Rebooting in 5 seconds..
As far as I can tell, this is some sort of timing issue at the PCIe data link layer: the root port logs a replay rollover and a replay timer timeout before the completion timeout. I tried forcing the link speed to PCIe Gen 1 with the following kernel changes, but it doesn’t seem to have helped.
--- a/drivers/pci/host/pci-tegra.c 2019-11-05 15:27:45.000000000 -0600
+++ b/drivers/pci/host/pci-tegra.c 2020-02-13 16:44:04.378100124 -0600
@@ -2712,6 +2712,7 @@
return err;
}
+#if 0
static void tegra_pcie_change_link_speed(struct tegra_pcie *pcie)
{
struct device *dev = pcie->dev;
@@ -2772,12 +2773,13 @@
port->index);
}
}
+#endif
static void tegra_pcie_link_speed(struct tegra_pcie *pcie)
{
PR_FUNC_LINE;
- tegra_pcie_change_link_speed(pcie);
+ //tegra_pcie_change_link_speed(pcie);
tegra_pcie_scale_voltage(pcie);
return;
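One way to confirm that a change like this really keeps the link at Gen 1 is to compare the LnkCap and LnkSta lines that `lspci -vv` prints for the device. The sketch below only parses that text; the `SAMPLE` string is an illustrative stand-in and not output captured from my TX2, and the helper name `parse_link` is hypothetical. It assumes the usual "Speed ...GT/s, Width x..." wording of those lines.

```python
import re

# Link speed in GT/s mapped to PCIe generation (Gen 1 to Gen 3 only).
GEN_BY_SPEED = {2.5: 1, 5.0: 2, 8.0: 3}

# Illustrative lspci -vv excerpt, NOT captured from the board in this post.
SAMPLE = (
    "\t\tLnkCap:\tPort #0, Speed 5GT/s, Width x4, ASPM L0s L1\n"
    "\t\tLnkSta:\tSpeed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+\n"
)


def parse_link(text, field):
    """Return (speed in GT/s, lane count) for 'LnkCap' or 'LnkSta', or None."""
    # '.' does not match newlines here, so the match stays on one line.
    match = re.search(
        rf"{field}:.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)", text
    )
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


speed, width = parse_link(SAMPLE, "LnkSta")
print(f"negotiated: Gen {GEN_BY_SPEED[speed]} x{width}")
```

On the real system, the text could be obtained with something like `sudo lspci -vv -s 01:00.0` and passed to `parse_link`; a LnkSta speed of 2.5GT/s would indicate the link is running at Gen 1.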
Any tips for further diagnosing the issue would be appreciated.