Problem with PCIe connector

Hello, we are developing a system based on an FPGA card that communicates over a PCI Express link with a Linux-based host. We successfully tested the system on the Jetson TK1 board using its onboard mini-PCIe connector, and we wanted to move to the TX1.

Since the TX1 doesn’t have a mini-PCIe connector, we bought a mini-PCIe to PCIe adapter to connect the card to the PCIe connector on the TX1 (J2).
The adapter board we used is a StarTech PEX2MPEX PCI Express to Mini PCI Express Adapter Card (https://www.startech.com/Cards-Adapters/Slot-Extension/PCI-Express-to-Mini-PCI-Express-Card-Adapter~PEX2MPEX)
We removed and reinserted the card a few times, each time the FPGA needed to be reprogrammed. We used to do the same thing on the TK1 (except that no adapter was needed there).
This solution worked for roughly a month, after which we started noticing the first disconnection problems (signalled by AER errors on the PCIe link).
Finally the link stopped working altogether.

During the boot phase, if we issue a pci enum command at the U-Boot prompt, both links are reported as down:

Tegra210 (P2371-2180) # pci enum
tegra-pcie: PCI regions:
tegra-pcie: I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie: non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie: prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring

Likewise, running lspci in Linux returns nothing.
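
As a cross-check on the empty lspci output, the kernel's PCI device list can also be read directly from sysfs. The following is only a minimal Python sketch, assuming the usual /sys/bus/pci/devices layout on the Linux host; the function name list_pci_devices is made up for illustration. An empty result would mean the kernel enumerated no PCI devices at all, which is consistent with the links being down.

```python
import os


def list_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return a sorted list of (address, vendor_id, device_id) tuples.

    Assumes the standard Linux sysfs layout. IDs that cannot be read
    are reported as None; a missing directory yields an empty list.
    """
    if not os.path.isdir(sysfs_root):
        return []
    devices = []
    for address in sorted(os.listdir(sysfs_root)):
        ids = {}
        for name in ("vendor", "device"):
            try:
                with open(os.path.join(sysfs_root, address, name)) as f:
                    ids[name] = f.read().strip()
            except OSError:
                ids[name] = None
        devices.append((address, ids["vendor"], ids["device"]))
    return devices


if __name__ == "__main__":
    found = list_pci_devices()
    if not found:
        print("no PCI devices enumerated")
    for address, vendor, device in found:
        print(address, vendor, device)
```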

The same behavior occurs if, instead of our FPGA board, we connect a PCIe USB adapter (StarTech 2-Port USB 3.0 SuperSpeed PCI Express Adapter Card, PI40200-2X2D).

Suspecting a problem with the connector, we decided to move to the M.2 connector (J18), as we plan to use it in the final design of our system.
This, however, did not solve the problem: no PCIe peripheral was detected there either.

We therefore removed the processing module (the one with the SoC, RAM, flash etc.) from the carrier board, and swapped it with another one from a collaborator’s TX1.
After doing this, both the PCI-USB adapter and our FPGA card were correctly detected.

To rule out a software problem, we then decided to restore the system to a fresh image. We followed the procedure described at http://developer.download.nvidia.com/embedded/L4T/r23_Release_v2.0/l4t_quick_start_guide.txt?autho=1461340513_4f1dab19809ca59091608b46ffbbc759&file=l4t_quick_start_guide.txt
but now we are no longer able to boot the board. If we connect a serial cable and watch the boot sequence, we see the following messages:

U-Boot 2015.07-rc2-g2ac3917 (Nov 09 2015 - 13:12:08 -0800)

TEGRA210
Model: NVIDIA P2371-2180
DRAM: 4 GiB
MC: Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie: I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie: non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie: prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In: serial
Out: serial
Err: serial
Net: No ethernet found.
Hit any key to stop autoboot: 0
Card did not respond to voltage select!
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1…
starting USB…
USB0: USB EHCI 1.10
scanning bus 0 for devices… 1 USB Device(s) found
scanning usb for storage devices… 0 Storage Device(s) found
scanning usb for ethernet devices… 0 Ethernet Device(s) found

USB device 0: unknown device
No ethernet found.
missing environment variable: pxeuuid
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00000000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/0000000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/000000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/0000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/000
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/0
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm-tegra210
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default-arm
No ethernet found.
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/default
No ethernet found.
Config file not found
No ethernet found.
Tegra210 (P2371-2180) #

What could have caused an issue on both the M.2 connector and the PCI Express one? Could this be linked to the fact that we can no longer re-initialize the system or boot it?

Thanks in advance for all the help.

I do not know the cause, but the first place where your log differs from that of a working R23.2 JTX1 with no add-on cards is this:

# actual:
Hit any key to stop autoboot: 0
Card did not respond to voltage select!
switch to partitions #0, OK
mmc0(part 0) is current device
# should be:
Hit any key to stop autoboot:  0 
MMC: no card present
switch to partitions #0, OK

Whatever is behind this message is the key:
Card did not respond to voltage select!
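
If it helps, this kind of comparison between a failing serial log and a known-good one can be scripted. Below is a rough Python sketch, not a polished tool: the function name first_difference is made up for illustration, and the two sample strings are just the excerpts quoted above. It collapses whitespace before comparing, so spacing differences such as "autoboot:  0" versus "autoboot: 0" are not reported.

```python
def first_difference(actual_log, reference_log):
    """Return (line_index, actual_line, reference_line) for the first
    line that differs after whitespace normalization, or None if the
    two logs match. A missing line is reported as None."""
    actual = [" ".join(line.split()) for line in actual_log.splitlines()]
    reference = [" ".join(line.split()) for line in reference_log.splitlines()]
    for index, (got, want) in enumerate(zip(actual, reference)):
        if got != want:
            return index, got, want
    if len(actual) != len(reference):
        # One log is a prefix of the other; report the first extra line.
        index = min(len(actual), len(reference))
        got = actual[index] if index < len(actual) else None
        want = reference[index] if index < len(reference) else None
        return index, got, want
    return None


# Excerpts taken from the logs quoted in this thread.
ACTUAL = (
    "Hit any key to stop autoboot: 0\n"
    "Card did not respond to voltage select!\n"
    "switch to partitions #0, OK\n"
)
REFERENCE = (
    "Hit any key to stop autoboot:  0 \n"
    "MMC: no card present\n"
    "switch to partitions #0, OK\n"
)

if __name__ == "__main__":
    print(first_difference(ACTUAL, REFERENCE))
```

With the two excerpts above, the first differing line is the one right after the autoboot prompt.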