CFexpress (NVMe) card not detected

Hi all,

I’m attempting to connect a SanDisk 128GB Extreme PRO CFexpress card to my TX2 developer kit via a passive CFexpress <-> PCIe adapter card. The card does not appear to be detected on the PCIe bus at all: it doesn’t show up in lspci, and dmesg indicates that no PCIe endpoints were detected. The same card and adapter combination is confirmed working in an x86 machine.
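
For anyone who wants to repeat the "does it show up on the bus" check without lspci, here is a minimal sketch that walks sysfs and flags NVMe endpoints by class code. It assumes the standard Linux layout under /sys/bus/pci/devices; the function names are made up for illustration and are not part of any NVIDIA tooling.

```python
import os

# PCI class code for an NVMe controller:
# base class 0x01 (mass storage), subclass 0x08 (non-volatile memory),
# programming interface 0x02 (NVM Express).
NVME_CLASS = 0x010802


def list_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return (address, vendor, device, class_code) for each PCI function found."""
    devices = []
    if not os.path.isdir(sysfs_root):
        return devices
    for addr in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, addr)
        try:
            values = []
            for attr in ("vendor", "device", "class"):
                with open(os.path.join(base, attr)) as fh:
                    # sysfs stores these as hex strings such as "0x010802"
                    values.append(int(fh.read().strip(), 16))
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        devices.append((addr, values[0], values[1], values[2]))
    return devices


def nvme_endpoints(devices):
    """Keep only the functions whose class code marks them as NVMe controllers."""
    return [d for d in devices if d[3] == NVME_CLASS]


if __name__ == "__main__":
    found = list_pci_devices()
    for addr, vendor, device, cls in found:
        print(f"{addr} [{vendor:04x}:{device:04x}] class 0x{cls:06x}")
    print("NVMe endpoints:", [d[0] for d in nvme_endpoints(found)] or "none")
```

If only the root port is listed and the NVMe list is empty, the link to the card never trained, which matches what lspci and dmesg report here.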

I have tried disabling the SMMU and changing the PCIe lane config from 4,0,2 to 2,1,1 to no avail. I have also tried an M.2 Samsung 970 EVO NVMe device via a similar adapter card in the TX2 and it works fine.

My dmesg output is here: https://upaste.anastas.io/c5awx3

Any help would be greatly appreciated.

Thanks,
Shawn

Hi,

Are you using the x4 PCIe port on the TX2 devkit? Could you share which adapters you are using?

Is it this card? https://www.amazon.com/SanDisk-128GB-Extreme-CFexpress-Card/dp/B07WVDV3FG
Is it an NVMe card? It doesn’t look like NVMe to me.
I can see a specific card reader for the card: Amazon.com
Are you using an adapter like Amazon.com?
What is the specific model of the adapter you are using?

Yeah, this is the adapter I’m using in the x4 slot of the dev board: https://www.amazon.com/Ableconn-PEX-CF106-Express-Adapter-CFexpress/dp/B07NP2ZQD5

The CFexpress standard is just a different form factor for NVMe, which is why the adapter board contains so few components: it only adapts the physical connections. It’s also why the card is detected as an NVMe device when used with the adapter on an x86 board.

Our team is also testing a CFexpress card with the TX2 developer board, using the same adapter from Amazon and the SanDisk 128GB CFexpress card purchased from B&H. We ran into the same issue: the TX2 cannot detect the CFexpress card. Have you figured it out?

Hi,

Could you also try the x1 PCIe M.2 slot on the devkit and see whether it works?
You need to change the ODMDATA in p2771-0000.conf.common to enable it.

Unfortunately I don’t have a way to test it with the M.2 slot as I lack an appropriate adapter.

I was able to try another CFexpress card (https://www.bhphotovideo.com/c/product/1531996-REG/sony_cebg128_j_128gb_cfexpress_type_b.html) which actually shows up on the PCIe bus, but when I try transferring a large amount of data to it, it seems to drop off the bus:

[  165.866151] nvme 0000:01:00.0: Failed status: 0xffffffff, reset controller.
[  165.873393] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0008(Transmitter ID)
[  165.883939] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00001100/00002000
[  165.892301] pcieport 0000:00:01.0:    [ 8] RELAY_NUM Rollover
[  165.894212] nvme nvme0: Removing after probe failure status: -19
[  165.894427] blk_update_request: I/O error, dev nvme0n1, sector 2732031
[  165.894430] blk_update_request: I/O error, dev nvme0n1, sector 2738175
[  165.894435] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1182793728 size 8388608 starting block 341759)
[  165.894439] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 342527)
[  165.894440] Buffer I/O error on device nvme0n1p1, logical block 333312
[  165.894442] Buffer I/O error on device nvme0n1p1, logical block 334080
[  165.894448] Buffer I/O error on device nvme0n1p1, logical block 333313
[  165.894449] Buffer I/O error on device nvme0n1p1, logical block 334081
[  165.894450] Buffer I/O error on device nvme0n1p1, logical block 333314
[  165.894451] Buffer I/O error on device nvme0n1p1, logical block 334082
[  165.894452] Buffer I/O error on device nvme0n1p1, logical block 333315
[  165.894454] Buffer I/O error on device nvme0n1p1, logical block 334083
[  165.894455] Buffer I/O error on device nvme0n1p1, logical block 333316
[  165.894456] Buffer I/O error on device nvme0n1p1, logical block 334084
[  165.894859] blk_update_request: I/O error, dev nvme0n1, sector 2740223
[  165.894865] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 342783)
[  165.894869] blk_update_request: I/O error, dev nvme0n1, sector 2736127
[  165.894875] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 342271)
[  165.894906] blk_update_request: I/O error, dev nvme0n1, sector 2711551
[  165.894911] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1174405120 size 8388608 starting block 339199)
[  165.895237] blk_update_request: I/O error, dev nvme0n1, sector 2742271
[  165.895242] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 343039)
[  165.895276] blk_update_request: I/O error, dev nvme0n1, sector 2637823
[  165.895281] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1140850688 size 8388608 starting block 329983)
[  165.895597] blk_update_request: I/O error, dev nvme0n1, sector 2744319
[  165.895602] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 343295)
[  165.895628] blk_update_request: I/O error, dev nvme0n1, sector 2639871
[  165.895633] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1140850688 size 8388608 starting block 330239)
[  165.895953] blk_update_request: I/O error, dev nvme0n1, sector 2746367
[  165.895958] EXT4-fs warning (device nvme0n1p1): ext4_end_bio:313: I/O error -5 writing to inode 12 (offset 1191182336 size 8388608 starting block 343551)
[  166.188968] pcieport 0000:00:01.0:    [12] Replay Timer Timeout
[  166.195476] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Requester ID)
[  166.207704] pcieport 0000:00:01.0:   device [10de:10e5] error status/mask=00004000/00000000
[  166.216508] pcieport 0000:00:01.0:    [14] Completion Timeout     (First)
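
To make the "error status/mask" words in the log above easier to read, here is a small sketch that decodes them into named AER bits. The bit positions follow the PCIe Advanced Error Reporting status register layout as I understand it; the tables are deliberately partial, and the function name is just an illustration.

```python
# Partial bit tables for the AER Correctable and Uncorrectable Error Status
# registers. Bits not listed here are reported as "unknown".
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",   # the 4.9 log above prints this as "RELAY_NUM"
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_aer(status, mask, table):
    """Return (bit, name, is_masked) for every bit set in the status word."""
    decoded = []
    for bit in range(32):
        if status & (1 << bit):
            name = table.get(bit, "unknown")
            decoded.append((bit, name, bool(mask & (1 << bit))))
    return decoded


if __name__ == "__main__":
    # Values copied from the two "error status/mask" lines in the log.
    for label, status, mask, table in (
        ("corrected", 0x00001100, 0x00002000, CORRECTABLE_BITS),
        ("uncorrected", 0x00004000, 0x00000000, UNCORRECTABLE_BITS),
    ):
        for bit, name, masked in decode_aer(status, mask, table):
            print(f"{label}: [{bit:2d}] {name}{' (masked)' if masked else ''}")
```

For the values in this log, the corrected status decodes to replay rollover plus replay timer timeout, and the uncorrected status decodes to a completion timeout. That sequence is what you would expect when the root port keeps retransmitting and the endpoint eventually stops answering.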

Shortly afterwards, I get a kernel panic and the TX2 reboots:

[  363.594862] INFO: task kworker/u12:2:49 blocked for more than 120 seconds.
[  363.601937]       Not tainted 4.9.140-tegra #1
[  363.606491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.614828] Kernel panic - not syncing: hung_task: blocked tasks
[  363.620836] CPU: 5 PID: 672 Comm: khungtaskd Not tainted 4.9.140-tegra #1
[  363.627617] Hardware name: quill (DT)
[  363.631276] Call trace:
[  363.633723] [<ffffff800808bdb8>] dump_backtrace+0x0/0x198
[  363.639118] [<ffffff800808c37c>] show_stack+0x24/0x30
[  363.644168] [<ffffff800845d820>] dump_stack+0x98/0xc0
[  363.649218] [<ffffff80081c2198>] panic+0x11c/0x298
[  363.654006] [<ffffff8008181f90>] watchdog+0x300/0x3b8
[  363.659051] [<ffffff80080dbee4>] kthread+0xec/0xf0
[  363.663836] [<ffffff8008083850>] ret_from_fork+0x10/0x40
[  363.669146] SMP: stopping secondary CPUs
[  363.673070] Kernel Offset: disabled
[  363.676555] Memory Limit: none
[  363.679608] trusty-log panic notifier - trusty version Built: 22:43:54 Dec  9 2019
[  363.695673] Rebooting in 5 seconds..

As far as I can tell, this is some sort of timing issue at the PCIe protocol layer. I tried forcing the link speed to PCIe Gen 1 with the following kernel changes, but it doesn’t seem to have helped.

--- a/drivers/pci/host/pci-tegra.c	2019-11-05 15:27:45.000000000 -0600
+++ b/drivers/pci/host/pci-tegra.c	2020-02-13 16:44:04.378100124 -0600
@@ -2712,6 +2712,7 @@
 	return err;
 }
 
+#if 0
 static void tegra_pcie_change_link_speed(struct tegra_pcie *pcie)
 {
 	struct device *dev = pcie->dev;
@@ -2772,12 +2773,13 @@
 				port->index);
 	}
 }
+#endif
 
 static void tegra_pcie_link_speed(struct tegra_pcie *pcie)
 {
 	PR_FUNC_LINE;
 
-	tegra_pcie_change_link_speed(pcie);
+	//tegra_pcie_change_link_speed(pcie);
 	tegra_pcie_scale_voltage(pcie);
 
 	return;
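
One way to confirm that a change like the one above really leaves the link at Gen 1 is to compare the LnkCap and LnkSta lines that `lspci -vv` prints for the root port. The sketch below parses that text. It assumes the usual pciutils output format ("Speed 2.5GT/s, Width x1"), needs root for lspci to show the capability block, and uses helper names that are purely illustrative.

```python
import re
import subprocess

# Matches "LnkCap:" / "LnkSta:" lines but not "LnkCap2:" or "LnkSta2:".
LINK_RE = re.compile(r"Lnk(Cap|Sta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")

# Transfer rate in GT/s mapped to PCIe generation.
GENERATION = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4}


def parse_link_info(lspci_text):
    """Extract (speed_gt_s, width) for link capability and link status."""
    info = {}
    for line in lspci_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            kind = "capability" if match.group(1) == "Cap" else "status"
            # Keep the first occurrence of each kind.
            info.setdefault(kind, (float(match.group(2)), int(match.group(3))))
    return info


def pcie_generation(speed_gt_s):
    """Map a transfer rate to a generation number, or None if unrecognised."""
    return GENERATION.get(speed_gt_s)


def read_lspci(bdf):
    """Run `lspci -vv -s <bdf>` and return its output (needs pciutils)."""
    result = subprocess.run(
        ["lspci", "-vv", "-s", bdf],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A possible use is `parse_link_info(read_lspci("00:01.0"))` before and after the patch. If the status entry still reports 5.0 GT/s, the link is being raised to Gen 2 somewhere other than `tegra_pcie_change_link_speed()`.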

Any tips for further diagnosing the issue would be appreciated.