Connecting NVMe (x4) PCIe SSD on Jetson Developer Kit Carrier Board B01

Hello NVIDIA,

I am using a Jetson Nano Developer Kit with carrier board revision B01.
I have flashed the nv-jetson-nano-sd-card-image-r32-3-1 image with Etcher.

I am aware that the carrier board has a PCIe M.2 Key E connector that supports only single-lane (x1) PCIe devices.
However, we have connected an NVMe PCIe (x4) SSD to the M.2 connector over a breadboard, with the additional lanes connected as “Test Points”.

lspci lists the following output:
00:02.0 PCI bridge: NVIDIA Corporation Device 0faf (rev a1)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)

The output of:
dmesg | grep -E '(PCI|pcie)'

is as follows:

[    0.000000]     PCI I/O : 0xffffffbefee00000 - 0xffffffbeffe00000   (    16 MB)
[    0.699735] PCI: CLS 0 bytes, default 64
[    0.984421] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.3, lane = pcie-0, function = pcie-x1
[    0.984510] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.4, lane = pcie-1, function = pcie-x4
[    0.984602] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.5, lane = pcie-2, function = pcie-x4
[    0.984692] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.6, lane = pcie-3, function = pcie-x4
[    0.984786] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.7, lane = pcie-4, function = pcie-x4
[    0.984870] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.8, lane = pcie-5, function = xusb
[    0.984953] tegra-xusb-padctl 7009f000.xusb_padctl: dev = phy-pcie.9, lane = pcie-6, function = xusb
[    0.994900] tegra-pcie 1003000.pcie: 4x1, 1x1 configuration
[    0.996195] tegra-pcie 1003000.pcie: PCIE: Enable power rails
[    0.996549] tegra-pcie 1003000.pcie: probing port 0, using 4 lanes
[    1.000737] tegra-pcie 1003000.pcie: probing port 1, using 1 lanes
[    1.088975] Intel(R) 10GbE PCI Express Linux Network Driver - version 4.6.4
[    1.092220] ehci-pci: EHCI PCI platform driver
[    1.092277] ohci-pci: OHCI PCI platform driver
[    1.440810] tegra-pcie 1003000.pcie: link 0 down, retrying
[    1.849946] tegra-pcie 1003000.pcie: link 0 down, retrying
[    2.262753] tegra-pcie 1003000.pcie: link 0 down, retrying
[    2.264825] tegra-pcie 1003000.pcie: link 0 down, ignoring
[    2.368745] tegra-pcie 1003000.pcie: PCI host bridge to bus 0000:00
[    2.378904] pci 0000:00:02.0: PCI bridge to [bus 01]
[    2.379118] pcieport 0000:00:02.0: Signaling PME through PCIe PME interrupt
[    2.379121] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[    2.379125] pcie_pme 0000:00:02.0:pcie001: service driver pcie_pme loaded
[    2.379200] aer 0000:00:02.0:pcie002: service driver aer loaded
[ 3326.039613] pcieport 0000:00:02.0: AER: Corrected error received: id=0018
[ 3326.039624] pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0010(Receiver ID)
[ 3326.049880] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
[ 3326.058332] pcieport 0000:00:02.0:    [ 0] Receiver Error         (First)
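For a quicker read of a log like the one above, the link-training failures can be tallied per port. This is only a minimal sketch: the function name `pcie_link_failures` is made up for illustration, and it assumes the "link N down, retrying/ignoring" wording shown in this log.

```shell
# Tally tegra-pcie link-training failures from kernel log text on stdin.
# Assumes the "link N down, retrying|ignoring" wording seen in the log above.
pcie_link_failures() {
    grep -oE 'link [0-9]+ down, (retrying|ignoring)' |
        sort | uniq -c |
        # Move the count from the front of each line to the end.
        awk '{ count = $1; $1 = ""; sub(/^ /, ""); print $0 ": " count }'
}
```

Usage would be something like `dmesg | pcie_link_failures`. For the log above it should report three retries and one "ignoring" for link 0, i.e. port 0 (the x4 port) never trained.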

Do you have any suggestions on how and where to approach this problem?

I’m not clear on this. Could you please provide more info?
Since NVMe SSDs come in the M.2 Key-M form factor and the Nano has an M.2 Key-E slot, how exactly is the connection made?
What do you mean by "over the breadboard"?
Which lanes from the M.2 Key-E slot are tapped and taken to the breadboard?

Hello vidyas,

Thanks for the quick response.

Here are the images of what I mean.

At one point we managed to read the SSD with lspci:

01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981

but it didn’t work and it threw these errors:

pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
pcieport 0000:00:01.0:      device: [10de:0fae] error status/mask=00000001/00002000
pcieport 0000:00:01.0:        [0] Receiver Error
pcieport 0000:00:01.0:        [7] Bad DLLP

and

pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0008(Receiver ID)
pcieport 0000:00:01.0:      device: [10de:0fae] error status/mask=00004000/00000000
pcieport 0000:00:01.0:        [14] Completion Timeout     (First)
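When many of these messages pile up, it can help to count them by severity and layer. The following is a rough sketch that assumes the "PCIe Bus Error: severity=..., type=..." format shown above; `aer_summary` is a hypothetical helper name.

```shell
# Count AER reports by severity and layer from kernel log text on stdin.
# Assumes the "PCIe Bus Error: severity=..., type=..., id=..." format above.
aer_summary() {
    grep 'PCIe Bus Error' |
        sed -E 's/.*severity=([^,]+), type=([^,]+),.*/\1 | \2/' |
        sort | uniq -c |
        # Move the count from the front of each line to the end.
        awk '{ count = $1; $1 = ""; sub(/^ /, ""); print $0 ": " count }'
}
```

Usage would be something like `dmesg | aer_summary`. A steady stream of corrected Physical Layer errors, together with uncorrected Transaction Layer ones, points at signal integrity rather than at the driver.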

Later we could no longer detect it. We tried re-flashing the OS, but still got no response from the SSD.

This corresponds to the following schematic:

P.S. I couldn’t upload it in the same post because of the limit of one embedded file per post.

Since PCIe's REFCLK is 100 MHz and the Tx/Rx lanes run at 2.5 GT/s or 5 GT/s depending on Gen-1 or Gen-2 speed, I don’t think this kind of connection works reliably. The fact that you got the device enumerated once confirms that all the Tx/Rx lanes and the sideband signals are routed correctly, but the AER errors in the log confirm that the link is not reliable. Please use a COTS M.2 Key-E to M.2 Key-M adapter (something like https://www.amazon.com/dp/B089VQXS32/ref=cm_sw_em_r_mt_dp_ezTdGb84WQ4WV ) for a reliable connection.

Thanks for the suggestion, but I can't get one of these adapters quickly, and we need a quick evaluation of what is possible with the SSD.

I measured REFCLK and saw no clock on pins 162 and 160. All lines are connected.
Shouldn’t there be a 100 MHz signal there?

I tried different parameters, as suggested in “NVME SSD drive visible in lspci, but not visible in fdisk” (#13 by JohnFishMarket) and related topics, but the problem is that I can’t even see the drive with lspci.

I checked

gunzip -c /proc/config.gz | grep CONFIG_BLK_DEV_NVME
gunzip -c /proc/config.gz | grep CONFIG_PCI_TEGRA

and they both return ‘y’
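One caveat with a plain grep is that it also matches longer option names that merely start with the same text. A small sketch that anchors on the exact option name and also reports options that are absent (the helper name `kconfig_state` is made up for this example):

```shell
# Print the state of each named kernel option, reading config text on stdin.
# Matching is anchored on "NAME=" so longer option names are not picked up.
kconfig_state() {
    config=$(cat)
    for opt in "$@"; do
        line=$(printf '%s\n' "$config" | grep -E "^${opt}=" | head -n 1)
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        else
            printf '%s is not set\n' "$opt"
        fi
    done
}
```

Usage would be something like `gunzip -c /proc/config.gz | kconfig_state CONFIG_BLK_DEV_NVME CONFIG_PCI_TEGRA`.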

The SSD was formatted as FAT32 on a computer.

Do you maybe have any other ideas or hints?

REFCLK is available while the link is being brought up and is removed if the link doesn’t come up. If the PCIe link is not coming up in your setup, you should observe REFCLK only for a very brief time.
If you want to see REFCLK continuously, i.e. even when the PCIe link doesn’t come up, please remove the “nvidia,enable-power-down” entry from the respective controller's device-tree entry.
With this, you may be able to see REFCLK, but I'm not sure how that is going to help. To get the PCIe link up, the setup needs to follow the PCIe spec recommendations (trace length, coupling capacitance, etc.).
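To see which nodes in the running device tree currently carry that property, something like the sketch below may help. It assumes the live tree is exposed as a directory hierarchy (typically `/proc/device-tree`) in which each property is a file. `dt_nodes_with_prop` is a hypothetical helper name, and it only inspects the tree; removing the property still means changing the device-tree source and updating the DTB.

```shell
# List device-tree nodes under a root directory that carry a given property.
# Each property appears as a file inside its node's directory.
dt_nodes_with_prop() {
    root=$1
    prop=$2
    # -H follows the root if it is a symlink, as /proc/device-tree usually is.
    find -H "$root" -type f -name "$prop" 2>/dev/null |
        while IFS= read -r path; do
            dirname "$path"
        done
}
```

Usage would be something like `dt_nodes_with_prop /proc/device-tree 'nvidia,enable-power-down'`.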

Hi @vidyas,
Thank you for the support!

For anyone with a similar problem: after having a custom board made and following the PCIe spec recommendations, we managed to read the SSD without any further changes to the device tree or boot arguments.
There were also no problems formatting and mounting the device.

This means the problems were most likely caused by unreliable communication over the “prototyping” wiring, as @vidyas suggested.

Output from lspci:

01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
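For anyone who also wants to check the negotiated link speed and width once the device enumerates, the "LnkSta" line of `lspci -vv` can be pulled out as in the sketch below. The helper name `pcie_link_status` is made up, and the pattern assumes the usual "LnkSta: Speed ..., Width xN" layout, which may differ between pciutils versions.

```shell
# Extract negotiated link speed and width from `lspci -vv` text on stdin.
# Assumes the "LnkSta: Speed <rate>, Width x<N>" layout printed by pciutils.
pcie_link_status() {
    sed -nE 's/^[[:space:]]*LnkSta:[[:space:]]*Speed ([^ ,]+)[^,]*, Width (x[0-9]+).*/\1 \2/p'
}
```

Usage would be something like `sudo lspci -vv -s 01:00.0 | pcie_link_status` (root is normally needed to read the capability registers).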