970 PRO NVMe M.2 1TB SSD is not detected on Xavier dev kit

The patch refers to line 334 of the tegra194-soc-pcie.dtsi, which is the pcie_ep@14180000, if I understand correctly.

I changed lines 506 (num-lanes) and 536 (nvidia,max-speed) according to the patch.

xxd /proc/device-tree/pcie@14180000/nvidia,max-speed
00000000: 0000 0003 …
xxd /proc/device-tree/pcie@14180000/num-lanes
00000000: 0000 0004 …
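
Device-tree integer properties are stored as big-endian 32-bit cells, so the two dumps above read as nvidia,max-speed = 3 and num-lanes = 4, which matches the patch. Below is a minimal Python sketch of that decoding. The helper name `decode_cells` is made up for illustration; on the board, the bytes would come from reading the /proc/device-tree/... file in binary mode rather than from a hex string.

```python
import struct
from typing import List


def decode_cells(raw: bytes) -> List[int]:
    """Decode a device-tree property made of big-endian 32-bit cells."""
    if len(raw) % 4 != 0:
        raise ValueError("property length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # ">" selects big-endian, "I" is an unsigned 32-bit integer.
    return list(struct.unpack(f">{count}I", raw))


# Bytes exactly as shown by xxd in the post above.
max_speed = decode_cells(bytes.fromhex("00000003"))
num_lanes = decode_cells(bytes.fromhex("00000004"))
print(max_speed, num_lanes)  # prints: [3] [4]
```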

The Samsung NVMe is still not showing up. Could this be a problem with the dev kit board?

Update.

Tested with Intel SSD 660p SERIES, 512GB NVMe disk. The dev kit does not detect it either. Tried both stock and patched kernel.

Can this be a hardware fault? Is there anything I can do to verify that the hardware is OK?

Did you try the latest JetPack release, 4.4 GA?

I use a 970 EVO Plus in my NX without issue. I use a 960 EVO on my AGX. It works for me on the stock kernel, so I do think there might be a hardware fault in your case. It seems you are missing a PCI bridge or two. My lspci is:

 $ lspci
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0000:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
0003:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0003:01:00.0 System peripheral: Device 1ac1:089a

My JetPack version is 4.4.

Can you clarify: the 960 EVO is working on AGX, but the 960 EVO Pro is not working?

@omp
On AGX, I am using a 960 EVO. On NX I am using a 970 EVO Plus. (No Pro.) I believe the Plus has a different controller, but I think the only difference between the 960 and 960 Pro is the amount of SLC NAND. I am not sure about the 970 EVO Plus vs 970 Pro. They may have different controllers. I have no Pro to test with currently since they’re nearly twice the price and I don’t need the sustained writes.

I am still not clear. You said “I use a 960 EVO on my AGX. It works for me on the stock kernel”. Did you mean it is working on some other host, not an NVIDIA one?

Is the last lspci output you shared from the AGX? It shows an NVMe card detected on the C0 controller.

@omp
I mean the default kernel provided by Nvidia (not mainline). Sorry if I wasn’t clear. It works fine for me “out of the box”.

Yes, the paste is from AGX

@mdegans,
Thanks for confirmation.

I mean the default kernel provided by Nvidia (not mainline)

So you are trying to update just the kernel. We would need more details about what you are trying to do. Let’s close this thread and create a new one with details such as the steps you are following to update the kernel, the kernel version, etc.

Indeed the 0000:00:00.0 bridge is missing on the AGX here:
lspci
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)

lspci -v
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 35
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 00000000-00000fff
Memory behind bridge: 40000000-400fffff
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Port (Slot-), MSI 00
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] #19
Capabilities: [158] #26
Capabilities: [17c] #27
Capabilities: [190] L1 PM Substates
Capabilities: [1a0] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2a0] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2d8] #25
Capabilities: [2e4] Precision Time Measurement
Capabilities: [2f0] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport

0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device 9171
Flags: bus master, fast devsel, latency 0, IRQ 564
I/O ports at 100010 [size=8]
I/O ports at 100020 [size=4]
I/O ports at 100018 [size=8]
I/O ports at 100024 [size=4]
I/O ports at 100000 [size=16]
Memory at 1230010000 (32-bit, non-prefetchable) [size=512]
Expansion ROM at 1230000000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: ahci

Is the missing bridge on the carrier board or on the AGX itself? I also tested an Intel PCIe network card on the carrier board connector. No luck with that either; the card does not show up.

Summary of the situation:

Samsung 970 Pro 1TB SSD: Does not show up on lspci and lsblk listings, tried with stock kernel and patched kernel

Intel SSD 660p Series 512GB: Does not show up on lspci and lsblk listings, tried with stock kernel and patched kernel

Intel PCIe network card on carrier board PCIe slot: Does not show up on lspci, only internal network card is visible on dmesg output. Tested with stock kernel.

Is there anything else I can try to solve this? Apparently there is no POST messaging on the AGX.

No PCIe controller is missing. If no endpoint (EP) is attached to a controller, that controller will not show up in the list.

To list all controllers even when no EP is attached, you can remove the “nvidia,enable-power-down” property from all PCIe nodes in the file
$TOP/hardware/nvidia/platform/t19x/galen/kernel-dts/common/tegra194-p2888-0000-a00.dtsi
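
As a rough illustration of that edit, here is a Python sketch that drops the property from device-tree source text. It assumes the property appears as a boolean property on a line of its own; the `SAMPLE` fragment and the function name `strip_power_down` are made up for the example and are not copied from the real .dtsi. The device tree would still have to be rebuilt and flashed afterwards for the change to take effect.

```python
import re

# Matches a boolean property on its own line, for example:
#     nvidia,enable-power-down;
PROP_LINE = re.compile(r"^\s*nvidia,enable-power-down\s*;\s*$")


def strip_power_down(dts_text: str) -> str:
    """Return dts_text without any nvidia,enable-power-down property lines."""
    kept = [
        line
        for line in dts_text.splitlines(keepends=True)
        if not PROP_LINE.match(line)
    ]
    return "".join(kept)


# Hypothetical fragment, only to show the effect of the function.
SAMPLE = """\
pcie@14180000 {
	status = "okay";
	nvidia,enable-power-down;
};
"""

print(strip_power_down(SAMPLE))
```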


Tried it. Now all controllers are listed, but still no NVMe.

lspci

0000:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
0003:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
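
In this listing each controller is its own PCI domain, and a domain that shows only its root port on bus 00 has no endpoint enumerated behind it. The sketch below picks those domains out of the lspci text in Python. It assumes every line starts with a domain:bus:device.function address, as in the paste above; the function name `empty_root_ports` is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

# lspci output copied from the post above.
LSPCI_OUTPUT = """\
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13)
0003:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1)
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1)
"""


def empty_root_ports(lspci_text: str) -> List[str]:
    """Return the PCI domains in which only bus 00 (the root port) appears."""
    buses_by_domain: Dict[str, Set[str]] = defaultdict(set)
    for line in lspci_text.splitlines():
        line = line.strip()
        if not line:
            continue
        address = line.split()[0]  # e.g. "0001:01:00.0"
        parts = address.split(":")
        if len(parts) != 3:
            continue  # skip lines that do not start with a full address
        domain, bus, _device_function = parts
        buses_by_domain[domain].add(bus)
    return sorted(d for d, buses in buses_by_domain.items() if buses == {"00"})


print(empty_root_ports(LSPCI_OUTPUT))  # prints: ['0000', '0003', '0005']
```

Domain 0000 is the one where the NVMe drive appears in the working AGX listing earlier in the thread, so here it is a root port with nothing detected behind it.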

“lspci -v” reports “lspci: Unable to load libkmod resources: error -12” on one of the controllers, what can cause this?

0000:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 33
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Root Port (Slot-), MSI 00
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] #19
Capabilities: [168] #26
Capabilities: [190] #27
Capabilities: [1c0] L1 PM Substates
Capabilities: [1d0] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2d0] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [308] #25
Capabilities: [314] Precision Time Measurement
Capabilities: [320] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport
lspci: Unable to load libkmod resources: error -12

0001:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 35
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 00000000-00000fff
Memory behind bridge: 40000000-400fffff
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Port (Slot-), MSI 00
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] #19
Capabilities: [158] #26
Capabilities: [17c] #27
Capabilities: [190] L1 PM Substates
Capabilities: [1a0] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2a0] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2d8] #25
Capabilities: [2e4] Precision Time Measurement
Capabilities: [2f0] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport

0001:01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9171 (rev 13) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device 9171
Flags: bus master, fast devsel, latency 0, IRQ 820
I/O ports at 100010 [size=8]
I/O ports at 100020 [size=4]
I/O ports at 100018 [size=8]
I/O ports at 100024 [size=4]
I/O ports at 100000 [size=16]
Memory at 1230010000 (32-bit, non-prefetchable) [size=512]
Expansion ROM at 1230000000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: ahci

0003:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad2 (rev a1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 37
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Port (Slot-), MSI 00
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] #19
Capabilities: [158] #26
Capabilities: [17c] #27
Capabilities: [190] L1 PM Substates
Capabilities: [1a0] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2a0] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2d8] #25
Capabilities: [2e4] Precision Time Measurement
Capabilities: [2f0] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport

0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 39
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Root Port (Slot-), MSI 00
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] #19
Capabilities: [168] #26
Capabilities: [190] #27
Capabilities: [1c0] L1 PM Substates
Capabilities: [1d0] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2d0] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [308] #25
Capabilities: [314] Precision Time Measurement
Capabilities: [320] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport

No PCIe controller is missing. If no endpoint (EP) is attached to a controller, that controller will not show up in the list.

Thanks for the correction! I was not aware of that.

  • Tested with all available JetPack versions from 4.2 to 4.4.
  • Tested with the recommended patch that lowers num-lanes to 4 and nvidia,max-speed to 3.
  • Tested with two different NVMe SSDs (Samsung, Intel).

None of the above works. I was able to test with an external mSATA PCIe card and could verify that it works; I can see the disks on the card.

Is there still something I could try?