I am using the Jetson Nano Developer Kit with carrier board revision B01.
I have flashed the nv-jetson-nano-sd-card-image-r32-3-1 image with Etcher.
I am aware that the carrier board has an M.2 Key E PCIe connector that supports single-lane (x1) PCIe devices.
However, we have connected an NVMe PCIe x4 SSD to the M.2 connector through a breadboard and added the additional lines as test points.
I’m not clear on this. Could you please provide more info?
Since NVMe SSDs come in the M.2 Key-M form factor and the Nano has an M.2 Key-E slot, how exactly is the connection made?
What do you mean by “over the breadboard”?
Which lanes from the M.2 Key-E slot are tapped and taken to the breadboard?
Since the PCIe REFCLK is 100 MHz and the Tx/Rx lanes run at 2.5 GT/s or 5 GT/s depending on Gen-1 or Gen-2 speed, I don’t think this kind of connection can work reliably. The fact that the device enumerated once confirms that all the Tx/Rx lanes and sideband signals are routed correctly, but the AER errors in the log show that the link is not reliable. Please use a COTS M.2 Key-E to M.2 Key-M adapter (for example, one from Amazon.com) for a reliable connection.
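To put these rates in perspective, here is some rough arithmetic (an illustrative sketch, not something posted in the thread). PCIe Gen1 and Gen2 signal at 2.5 and 5 GT/s and both use 8b/10b encoding; the propagation velocity of half the speed of light is an assumption chosen only for illustration, and the function and constant names are hypothetical.

```python
# Rough PCIe Gen1/Gen2 timing arithmetic (illustrative sketch).
# ASSUMPTION: signals propagate at about half the speed of light; the real
# velocity depends on the board dielectric or wire type and is not from the thread.

SPEED_OF_LIGHT_MM_PER_PS = 0.299792458  # millimetres travelled per picosecond in vacuum
ASSUMED_VELOCITY_FACTOR = 0.5           # hypothetical ballpark, not a measured value

# Raw signalling rates in gigatransfers per second; Gen1 and Gen2 both use 8b/10b.
GEN_RATES_GT_S = {"Gen1": 2.5, "Gen2": 5.0}
ENCODING_EFFICIENCY = 8 / 10


def link_numbers(rate_gt_s: float, lanes: int = 1) -> dict:
    """Unit interval, Nyquist frequency, usable throughput and the distance a
    signal covers in one bit time (under the assumed velocity factor)."""
    unit_interval_ps = 1000.0 / rate_gt_s   # one transfer lasts 1/rate; GT/s -> ps
    nyquist_ghz = rate_gt_s / 2             # fundamental of an alternating 1010 pattern
    # usable bits/s after 8b/10b, converted to megabytes/s, times lane count
    throughput_mb_s = rate_gt_s * ENCODING_EFFICIENCY * 1000 / 8 * lanes
    mm_per_ui = unit_interval_ps * SPEED_OF_LIGHT_MM_PER_PS * ASSUMED_VELOCITY_FACTOR
    return {
        "unit_interval_ps": unit_interval_ps,
        "nyquist_ghz": nyquist_ghz,
        "throughput_mb_s": throughput_mb_s,
        "mm_per_ui": mm_per_ui,
    }


if __name__ == "__main__":
    for gen, rate in GEN_RATES_GT_S.items():
        n = link_numbers(rate)
        print(f"{gen}: UI {n['unit_interval_ps']:.0f} ps, "
              f"Nyquist {n['nyquist_ghz']:.2f} GHz, "
              f"x1 throughput {n['throughput_mb_s']:.0f} MB/s, "
              f"~{n['mm_per_ui']:.0f} mm per UI")
```

Under that assumption, one Gen2 bit time corresponds to roughly 30 mm of travel (about 60 mm at Gen1), so a few centimetres of breadboard jumper already spans a large fraction of a bit period; that is one way to see why untuned wiring is marginal at these rates. An x4 SSD in a single-lane slot can at best train to x1, so `lanes=1` is the relevant case here.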
REFCLK is available only during link-up and is removed if the link doesn’t come up. If the PCIe link is not coming up in your setup, you should observe REFCLK for only a very brief time.
If you want to see REFCLK continuously, i.e. even when the PCIe link doesn’t come up, remove the “nvidia,enable-power-down” property from the respective controller’s device-tree node.
With this, you may be able to see REFCLK, but I’m not sure how that is going to help. To get the PCIe link up, the setup needs to follow the PCIe spec recommendations (trace length, coupling capacitance, etc.).
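To check which nodes in the running kernel’s device tree still carry the “nvidia,enable-power-down” property, you can read the live tree Linux exposes under /proc/device-tree, where each node is a directory and each property is a file. This is a minimal, read-only Python sketch (not from the thread; the function name is hypothetical). Actually removing the property still means editing the device-tree source or DTB the board boots with.

```python
import os

PROPERTY = "nvidia,enable-power-down"


def nodes_with_property(root: str = "/proc/device-tree", prop: str = PROPERTY) -> list:
    """Return device-tree node paths (relative to root) that contain `prop`.

    In the live tree exported by Linux, every node is a directory and every
    property is a regular file, so a property is found by its file name.
    Boolean properties such as this one are simply zero-length files.
    """
    matches = []
    # os.walk silently yields nothing if root does not exist (e.g. non-Linux host)
    for dirpath, _dirnames, filenames in os.walk(root):
        if prop in filenames:
            rel = os.path.relpath(dirpath, root)
            matches.append("/" if rel == "." else "/" + rel.replace(os.sep, "/"))
    return sorted(matches)


if __name__ == "__main__":
    for node in nodes_with_property():
        print(node)
```

After flashing a modified DTB and rebooting, an empty result would suggest the property is no longer present in the booted tree.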
For anyone with a similar problem: after fabricating a custom board that follows the PCIe spec recommendations, we were able to read the SSD without any further changes to the device tree or boot arguments.
There were also no problems formatting and mounting the device.
This means the problems were most likely caused by unreliable communication over the “prototyping” wiring, as @vidyas suggested.
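Before formatting and mounting, one quick way to confirm that the kernel actually enumerated the SSD is to look for NVMe namespaces in sysfs. This is a minimal Python sketch, assuming a Linux system with sysfs mounted at /sys (the helper names are hypothetical); it only reads sysfs and does not format or mount anything.

```python
import os
import re

# NVMe namespaces show up as block devices named nvme<controller>n<namespace>,
# e.g. nvme0n1. Partitions such as nvme0n1p1 are not matched by this pattern.
NVME_NAMESPACE = re.compile(r"^nvme\d+n\d+$")


def nvme_namespaces(sys_block: str = "/sys/block") -> list:
    """Return sorted NVMe namespace names listed under sys_block."""
    try:
        entries = os.listdir(sys_block)
    except FileNotFoundError:
        return []  # no sysfs here (e.g. not running on Linux)
    return sorted(name for name in entries if NVME_NAMESPACE.match(name))


def namespace_size_bytes(name: str, sys_block: str = "/sys/block") -> int:
    """Capacity from the sysfs 'size' attribute, which counts 512-byte sectors."""
    with open(os.path.join(sys_block, name, "size")) as f:
        return int(f.read().strip()) * 512


if __name__ == "__main__":
    found = nvme_namespaces()
    if not found:
        print("no NVMe namespaces visible; check the PCIe link first")
    for name in found:
        print(f"/dev/{name}: {namespace_size_bytes(name) / 1e9:.1f} GB")
```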