TX1 <-> FPGA through PCIE

I have built a Spartan-6 LX45T FPGA development board with a PCIe x1 interface.

I am still learning about PCIe, so if I sound ignorant on the subject please forgive me (and please help me!).

Verifying PCIe enumeration on a desktop computer
I want to interface the TX1 with the FPGA over PCIe. To get started I used the Xilinx tools to build a demo project for the FPGA board. I plugged the board into my desktop computer (Ubuntu 15.10 x64) and it enumerated: when I run ‘lspci’ in the terminal I can see that it correctly reports 10EE:0007 (Xilinx PCIe). This tells me that the board is physically connected correctly and that the low-level PCIe core (PCIE_A1) is responding correctly to host queries.
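For scripted checks (for example, running the same test on the desktop and on the TX1), the vendor:device lookup that ‘lspci’ does can also be done by reading sysfs. This is a minimal sketch, assuming the standard Linux layout where each PCI function appears as a directory under /sys/bus/pci/devices/ containing ‘vendor’ and ‘device’ files with hex IDs such as ‘0x10ee’. The function name find_pci_devices is hypothetical.

```python
import os


def find_pci_devices(vendor, device, sysfs_root="/sys/bus/pci/devices"):
    """Return sysfs addresses (e.g. '0000:03:00.0') whose IDs match vendor:device.

    Assumes the standard Linux sysfs layout: one directory per PCI
    function, each holding 'vendor' and 'device' files with hex IDs
    such as '0x10ee'.
    """
    matches = []
    if not os.path.isdir(sysfs_root):
        return matches  # no PCI bus exposed here (or not Linux)
    for addr in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev_dir, "vendor")) as f:
                ven_id = int(f.read().strip(), 16)
            with open(os.path.join(dev_dir, "device")) as f:
                dev_id = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries without readable ID files
        if ven_id == vendor and dev_id == device:
            matches.append(addr)
    return matches


if __name__ == "__main__":
    # 10EE:0007 is the ID the Xilinx demo design reports on the desktop.
    found = find_pci_devices(0x10EE, 0x0007)
    print(found if found else "10ee:0007 not enumerated")
```

On the TX1, an empty result is the scripted equivalent of the missing lspci entry described below.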

Installing the board on the TX1 (with JetPack 2.0 installed)
I then installed the board in the PCIe x4 slot of the TX1 development board. I typed ‘lspci’ to query the PCIe bus and got nothing. I then read through the post explaining that PCIe hot-plugging is not supported by default:

https://devtalk.nvidia.com/default/topic/901924/?comment=4756269

Hot-plug support is not an issue for me, so I reset the TX1 devboard, but then the TX1 would not boot. I then powered everything off and plugged in a known-good PCIe board; the TX1 booted just fine, and I could detect the known-good board using ‘lspci’:

00:01.0 PCI Bridge: NVIDIA Corporation Device 0FAE (rev a1)
01:00.0 USB controller: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03)

This suggests that either something odd is going on on the FPGA side, or some configuration setting on the TX1 differs from the desktop in a way that works fine with typical PCIe devices but not with my design.

Debugging
I designed a set of tools that let me talk to cores within an FPGA, and I configured them to read and write registers of the PCIE_A1 core. While the TX1 is booting I can see the core looping through various initialization states, including polling.config, polling.detect and polling.compliance (I don’t think it gets past Polling to Configuration). This means that either it did not get through the out-of-band initialization, or it did but the TX1 is not happy with the response. Unfortunately, I don’t have the internal logic analyzer set up to step through this, but I’m looking into it.
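Because the states are sampled by reading registers, a short helper can reduce a captured sequence to "how far did training get" for an apples-to-apples comparison between hosts. In the base specification's LTSSM, a link normally goes Detect, then Polling (Polling.Active, then Polling.Configuration), then Configuration, then L0; Polling.Compliance is a side branch that is typically entered when Polling.Active times out without receiving enough training sets from the link partner. This is a sketch only: the state labels are hypothetical stand-ins for whatever the register tool actually records, and Configuration substates are collapsed into one value.

```python
from enum import IntEnum


class Ltssm(IntEnum):
    """Main-path LTSSM states, in the order a healthy link reaches them.

    Simplified: every Configuration.* substate is collapsed into one value.
    """
    DETECT_QUIET = 0
    DETECT_ACTIVE = 1
    POLLING_ACTIVE = 2
    POLLING_CONFIGURATION = 3
    CONFIGURATION = 4
    L0 = 5


# Hypothetical labels a register-dump tool might record for each state.
LABELS = {
    "detect.quiet": Ltssm.DETECT_QUIET,
    "detect.active": Ltssm.DETECT_ACTIVE,
    "polling.active": Ltssm.POLLING_ACTIVE,
    "polling.configuration": Ltssm.POLLING_CONFIGURATION,
    "polling.config": Ltssm.POLLING_CONFIGURATION,
    "configuration": Ltssm.CONFIGURATION,
    "l0": Ltssm.L0,
}


def summarize_trace(trace):
    """Return (furthest main-path state reached, entries into Polling.Compliance)."""
    furthest = Ltssm.DETECT_QUIET
    compliance_entries = 0
    previous = None
    for raw in trace:
        label = raw.strip().lower()
        if label == "polling.compliance":
            # Side branch off Polling.Active: count each new entry,
            # not each sample taken while the core sits in it.
            if previous != label:
                compliance_entries += 1
        else:
            key = "configuration" if label.startswith("configuration") else label
            state = LABELS.get(key)
            if state is not None and state > furthest:
                furthest = state
        previous = label
    return furthest, compliance_entries


stuck = ["Detect.Quiet", "Detect.Active", "Polling.Active", "Polling.Compliance",
         "Polling.Compliance", "Detect.Quiet", "Detect.Active", "Polling.Active",
         "Polling.Compliance"]
furthest, entries = summarize_trace(stuck)
print(furthest.name, entries)  # POLLING_ACTIVE 2
```

A trace that never gets past POLLING_ACTIVE while repeatedly entering compliance would be consistent with the core not seeing the host's training sets at all.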

I was then able to get the TX1 to boot with the FPGA installed by disabling the PCIE_A1 core during the early part of boot. It does not work consistently, but sometimes the TX1 boots. When I then type ‘lspci’ I see one device:

00:01.0 PCI Bridge: NVIDIA Corporation Device 0FAE (rev a1)

I don’t see the FPGA in the lspci output; I was expecting vendor:device 10EE:0007.

I then checked dmesg, which is flooded with this:

… lots of the same …
[ 86.372079] pcieport 0000:00:01.0: can’t find device of ID0010
[ 86.372084] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[ 86.372092] pcieport 0000:00:01.0: can’t find device of ID0010
[ 86.387004] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[ 86.387017] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[ 86.397259] pcieport 0000:00:01.0: device [10de:0fae] error status/mask=00000001/00002000
[ 86.405620] pcieport 0000:00:01.0: [ 0] Receiver Error

… lots of the same …

I’ve been researching this:

AER is Advanced Error Reporting, an optional PCIe capability that reports errors in more detail than a standard PCIe device does.

In order to figure out what’s going on I have been digging through kernel drivers, specifically:

kernel/drivers/pci/pcie/aer/aerdrv_core.c

and found that this line: “pcieport 0000:00:01.0: can’t find device of ID0010”

is printed right after a call to “pci_walk_bus”, which, as you might expect, walks the PCIe devices on the bus. It looks like the driver has a problem with device 0x10, but I don’t know what ‘device’ means here. Is device 0x10 my board or NVIDIA’s PCIe bridge? I’m guessing it’s the bridge.

The other error:

pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
pcieport 0000:00:01.0: device [10de:0fae] error status/mask=00000001/00002000
pcieport 0000:00:01.0: [ 0] Receiver Error

10DE is NVIDIA’s vendor ID, and I suspect 0FAE is the device ID of their bridge.

I don’t know what the ‘Receiver ID’ is; it’s different from the 0x10 above.
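Both numbers look like 16-bit PCIe requester/source IDs, which pack bus, device and function into one value (bus in bits 15:8, device in bits 7:3, function in bits 2:0). Treating the kernel's "id=" values this way is an assumption, but it is consistent with the log: decoding shows 0x0008 is 00:01.0, the NVIDIA root port that lspci does list, while 0x0010 decodes to 00:02.0, an address that lspci does not show at all. That may be why the AER code reports it "can't find device of ID0010". The helper name is hypothetical.

```python
def decode_requester_id(rid):
    """Split a 16-bit PCIe requester/source ID into 'bus:device.function'.

    Layout per the PCIe base specification: bus in bits 15:8,
    device in bits 7:3, function in bits 2:0.
    """
    bus = (rid >> 8) & 0xFF
    device = (rid >> 3) & 0x1F
    function = rid & 0x7
    return f"{bus:02x}:{device:02x}.{function}"


for rid in (0x0008, 0x0010):
    print(f"id={rid:04x} -> {decode_requester_id(rid)}")
# id=0008 -> 00:01.0
# id=0010 -> 00:02.0
```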

Unfortunately, the error reporting basically tells me that there is an error on the physical layer, but beyond that I don’t know much.
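The "error status/mask" pair is a raw dump of the root port's AER Correctable Error Status and Correctable Error Mask registers, so it can be decoded bit by bit. A small sketch follows; the bit positions come from the PCIe base specification's AER capability (they match the names Linux prints), and the helper names are hypothetical. For the values in this log, only bit 0 (Receiver Error) is reported, and the one masked bit (0x2000) is Advisory Non-Fatal Error.

```python
# Correctable Error Status/Mask bit positions from the PCIe base
# specification's AER capability (bits not listed are reserved).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of the correctable-error bits set in value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_status_mask(text):
    """Decode a kernel 'error status/mask=SSSSSSSS/MMMMMMMM' fragment."""
    status_hex, mask_hex = text.split("=", 1)[1].split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    return {
        "reported": decode_correctable(status & ~mask),
        "masked": decode_correctable(mask),
    }


print(decode_status_mask("error status/mask=00000001/00002000"))
# {'reported': ['Receiver Error'], 'masked': ['Advisory Non-Fatal Error']}
```

Receiver Error is a physical-layer event at the root port's receiver, which fits the "type=Physical Layer" text but does not say why the bits arrive corrupted.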

I suspect that the PCIe core within the FPGA may have reduced functionality compared to other PCIe devices. This is something I am looking into.

If anyone has experience with this, I would really appreciate any feedback on what the issue might be or what I can do to isolate it. Are there certain features of the PCIe bus that I can disable or enable through kernel boot command-line options?
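Two kernel parameters that are at least relevant here are `pcie_aspm=off` and `pci=noaer`, both documented in the kernel's kernel-parameters documentation. Whether they change anything on the TX1 is untested, and `pci=noaer` only stops AER reporting (the dmesg flood); it does not fix the link. The TX1 boots from /boot/extlinux/extlinux.conf (visible in the U-Boot output later in this thread), so one way to try them is to append them to its APPEND line. The helper below is a hypothetical sketch that works on the file's text; back up the real file before writing anything back.

```python
def add_kernel_params(conf_text, params):
    """Append kernel parameters to every APPEND line of an extlinux.conf.

    Parameters already present on a line are not added twice; the
    original indentation, line endings and other lines are preserved.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().upper().startswith("APPEND "):
            body = line.rstrip("\r\n")
            newline = line[len(body):]  # keep the original line ending
            existing = body.split()
            extra = [p for p in params if p not in existing]
            if extra:
                body = body + " " + " ".join(extra)
            line = body + newline
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = ("LABEL primary\n"
              "      LINUX /boot/Image\n"
              "      APPEND console=ttyS0,115200n8 root=/dev/mmcblk0p1 rw rootwait\n")
    print(add_kernel_params(sample, ["pcie_aspm=off", "pci=noaer"]))
```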

I appreciate any help.

Dave

I’m a long way from being able to explain the issue, but seeing “Physical Layer” makes me think of signal timing being out of spec. PCIe can be very picky about this (even a small error can result in a revision 2 capable card being throttled back to revision 1 speeds… this one looks incapable of operating even at revision 1). The extra note about a receiver error tends to support this.

FYI, each PCIe lane has a TX and RX pair on the host corresponding to the PCIe card’s RX and TX, for full-duplex operation. The advanced error detection is an out-of-band signal on a separate wire running at a much slower speed than the RX/TX pairs, so it works even if other parts don’t. I don’t know if this is the case, but I could see a failed TX/RX causing software on one machine to skip the device and move on, while on another machine with less robust software it gets stuck on the device. Can you describe in detail how the circuit board itself was designed (or link to the manufacturer that built it)?

Also, when running on a working desktop machine, can you post the result of “lspci -v” for this device? I’d like to see the rev. 1/rev. 2 capabilities and the actual setting.

Thank you for getting back to me!

I designed the circuit boards. Here is what they look like:

http://nysa.readthedocs.org/en/latest/boards.html#artemis

It may not look like it, but there are two circuit boards. Without going into too much detail, I wanted a design where users could change the way they talk to the FPGA, so instead of one board with both a PCIe connection and an FPGA there are two boards: one with the Spartan-6 FPGA, and a PCIe board it connects to through a high-speed connector. I worked with someone from Samtec to make sure the connector can handle the high-speed signals. Both boards have impedance-controlled routing of 50 ohms single-ended and 100 ohms differential. I verified that TX_P/TX_N are length-matched to within 0.005 inches of each other, and RX_P/RX_N are matched to the same tolerance.
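To put the 0.005 inch intra-pair match in perspective, it can be converted to time and compared with one unit interval at the 2.5 GT/s rate this link runs at. This is a back-of-the-envelope sketch, assuming about 170 ps per inch of propagation delay, a typical figure for FR-4 stripline rather than a measurement of this stackup; the function names are hypothetical.

```python
def unit_interval_ps(rate_gtps):
    """One unit interval (one bit time on the wire) in picoseconds."""
    return 1000.0 / rate_gtps


def skew_ps(mismatch_in, delay_ps_per_in=170.0):
    """Intra-pair skew for a length mismatch, in picoseconds.

    delay_ps_per_in is an ASSUMED propagation delay (roughly FR-4
    stripline); substitute the figure for the actual board stackup.
    """
    return mismatch_in * delay_ps_per_in


ui = unit_interval_ps(2.5)   # PCIe Gen1 line rate, as negotiated on the desktop
skew = skew_ps(0.005)        # 0.005 in P/N length mismatch
print(f"UI = {ui:.0f} ps, skew = {skew:.2f} ps ({100 * skew / ui:.2f}% of a UI)")
```

Under that assumption the mismatch is about 0.85 ps against a 400 ps unit interval, so intra-pair matching looks negligible at this rate; it says nothing about the connector, vias, or reference-plane transitions.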

If you would like I can email you the schematics.

Output of lspci on the desktop computer

Here is the result of ‘lspci -d 10ee:0007 -vvv’:

03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
        Subsystem: Xilinx Corporation Default PCIe endpoint ID
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at d7fffc00 (32-bit, non-prefetchable) 
        Region 2: Memory at d8000000 (32-bit, non-prefetchable) 
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [58] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-c5-94
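When comparing this output across hosts and cards, the lines that matter most here are the link capability (LnkCap) versus the negotiated status (LnkSta). A small sketch that pulls speed and width out of saved ‘lspci -vvv’ text; the regular expression is written against the format shown above and is an assumption for other lspci versions.

```python
import re

# Matches e.g. "LnkCap: Port #0, Speed 2.5GT/s, Width x1, ..." and
# "LnkSta: Speed 2.5GT/s, Width x1, ..." (but not LnkSta2/LnkCtl2 lines).
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_summary(lspci_text):
    """Return {'LnkCap': (speed_gtps, width), 'LnkSta': (...)} from lspci -vvv text."""
    summary = {}
    for match in LINK_RE.finditer(lspci_text):
        field, speed, width = match.groups()
        summary[field] = (float(speed), int(width))
    return summary


sample = ("LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s\n"
          "LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive-\n")
print(link_summary(sample))  # {'LnkCap': (2.5, 1), 'LnkSta': (2.5, 1)}
```

For the output above, both entries are 2.5 GT/s x1: the Xilinx endpoint is a Gen1-only x1 device, and on the desktop it negotiated its full capability.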

If need be I can change some settings on the FPGA, so that is an option to try out when the problem is better understood.

I didn’t know if this is important, but here is the output of ‘uname -a’ on the desktop computer:

Linux quokka 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I am working on setting up the FPGA’s internal logic analyzer so that I can see what is happening within the low-level PCIe core. When I get some waveforms I’ll post them.

Thanks again for any feedback.

Dave

These are just observations, not actual debugging, but since you designed the board they might be of use to you. First, I see the link advertises revision 1 speeds only, so this is the speed that is easiest to meet… but at the same time, it has nowhere to fall back to if it cannot handle that speed. Knowing why it behaves differently on one host versus another might therefore help. This gets into details which might seem overly verbose, but this interface has some unusual traits compared to most interfaces.

[s]It helps to understand RF transmission lines for this, the analogy is directly applicable. When setting up a common transmitter (e.g., ham radio), the transmission line itself is chosen such that any reflection of signal at the far end of the line bounces back additively…forward and reflected need to be in phase. Because of this, any signal which is not absorbed at the end of the line causes voltages to go up. Without an antenna, the level goes up so much that typically the transmitter final output stage is destroyed from over-voltage. The antenna does the opposite, the antenna is designed such that any reflection from the far end is destructive and out of phase…energy can’t be created or destroyed, and so the antenna radiates and does what we would want.

Most electrical connections in the computer can detect the initial forward wave and specifications simply provide enough voltage in that wave (leading edge of a square wave) to detect the 1s or 0s. PCIe data lanes do not do this, the initial wave voltage is less than the differential voltage required by the receiver. To follow specifications, the voltage at the moment of the clock marking to read from the receiver must exceed this initial wave…the only way to do this is for the reflection to add to the forward wave like a transmission line. Once forward and reflected add, the voltage becomes sufficient for the receiver. Should the length of the traces not provide a correct reinforcing reflected wave, the signal will never reach sufficient voltage at the moment of the clock, and the PCIe data lane will have to either go to a slower clock rate (longer wave implies reduced physical construction requirements), or else have the lane lengths adjusted to “fix” this out-of-phase reflection.[/s]

In your situation the total trace length between the host hardware and your add-on board differs depending on whether the x86 host or the TX1 is used (or, for that matter, any other motherboard). The combined trace length of host plus add-on board is likely somewhat out of phase on the JTX1. Alternatively, the endpoint may be absorbing too much energy so that the reinforcing reflected wave is insufficient… but if that were the case, the board would also fail on the desktop, not just on the JTX1.

So in general you do want to be sure that both sides of each TX and RX differential pair are the same length and the same shape, to avoid the transmission line itself introducing errors. After that, the total length becomes easier to deal with, but it is still something of an art. Sometimes slew rates or capacitance can be adjusted to emphasize leading or trailing edges, but I suspect the minor TX/RX lane traces need careful tweaking.

@linuxdev, are you sure PCIe uses reflected-wave switching? I assumed it uses incident-wave switching.

I spent yesterday trying to incorporate a logic analyzer core into the build so I can capture the sequence of states the PCIE_A1 core goes through during link-up. Unfortunately, I am having some issues. I was hoping for an ‘apples to apples’ comparison between link-up on the desktop and on the TX1, so that if I make a change on the TX1 I’ll know what to expect.

I don’t want to sound defensive about my board, but I don’t think signal integrity is the issue: besides the PCIe interface, the board also has a SATA 2.0 interface running at 3 Gb/s, and I can read and write data to and from a hard drive with no problems. Nonetheless, I’ll look into the signals. Perhaps it is as linuxdev said, and small differences between the TX1 and the desktop motherboard create enough of an impedance discontinuity to cause these problems.

I’m going to spend some more energy on the logic analyzer and read up on PCIE.

Thanks again for the help and ideas.

Dave

@cioma: My information came from the book “PCI Express System Architecture” by Ravi Budruk, Don Anderson, and Tom Shanley. Looking closer, what I see says you are right. The passage, although in a book about PCIe, appears to describe the older PCI bus; it was apparently just explaining what changed going from PCI to PCIe.

The part that always applies is crosstalk and impedance in general. Signal lengths and trace shapes differ between motherboards, and if a device works in one but not in another, when both motherboard slots are known to work with some other device, then it basically must be a signal issue.

@cospan, is the working SATA 2 interface a function on the same card as the FPGA? If it is on a different add-on card, then you know the interface at the JTX1 is working. If SATA 2 is a function on the same FPGA card, then you know there is an interaction beyond the RX/TX data channel. Does the SATA 2 interface use the same PCIe design, but with a different layout?

I found my old Xilinx Spartan-6 LX45T development board (SP605), loaded the PCIe demo image, and observed the same results as with my board: it links up on the desktop but not on the TX1 devboard.

Here is the result of ‘lspci -d 10ee:0007 -vvv’ on the desktop with the SP605:

03:00.0 RAM memory: Xilinx Corporation Default PCIe endpoint ID
        Subsystem: Xilinx Corporation Default PCIe endpoint ID
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at fea00000 (32-bit, non-prefetchable) 
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [58] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Device Serial Number 00-00-00-01-01-00-0a-35

I mentioned before that when I plug my board (Artemis) into the TX1 devboard the TX1 does not boot, but when I keep the FPGA in reset until the TX1 is partway through booting, I can see the PCIe bridge using ‘lspci’.

With the SP605 the TX1 boots fine, but it doesn’t seem to recognize the SP605 at all.

I found a USB-UART cable and connected the serial console so I could observe the PCIe output from the bootloader:

Here is the output with the known-working USB 3.0 PCIe card:

Some stuff before
...
TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0
MMC: no card present
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
948 bytes read in 72 ms (12.7 KiB/s)
p2371-2180 eMMC boot options
1:      primary kernel
Enter choice: 1:        primary kernel
Retrieving file: /boot/Image
19003224 bytes read in 487 ms (37.2 MiB/s)
append: fbcon=map:0 console=tty0 console=ttyS0,115200n8 androidboot.modem=none androidboot.serialno=P2180A00P00940c003fd androidboot.security=non-secure tegraid=21.1.2.0.0 ddr_die=2048M@2048M ddr_die=2048M@4096M section=256M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 debug_uartport=lsport,0 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff2bf000 nvdumper_reserved=0xff23f000 core_edp_mv=1125 core_edp_ma=4000 gpt android.kerneltype=normal androidboot.touch_vendor_id=0 androidboot.touch_panel_id=63 androidboot.touch_feature=0 androidboot.bootreason=pmc:software_reset,pmic:0x0 root=/dev/mmcblk0p1 rw rootwait
Retrieving file: /boot/tegra210-jetson-tx1-p2597-2180-a01-devkit.dtb
248081 bytes read in 227 ms (1 MiB/s)
## Flattened Device Tree blob at 82000000
   Booting using the fdt blob at 0x82000000
   reserving fdt memory region: addr=80000000 size=20000
   Using Device Tree in place at 0000000082000000, end 000000008203f910

Starting kernel ...
...
[    2.943834] tegra-pcie 1003000.pcie-controller: PCIE: Enable power rails
[    2.945453] tegra-pcie 1003000.pcie-controller: probing port 0, using 4 lanes and lane map as 0x14
[    2.947519] tegra-pcie 1003000.pcie-controller: probing port 1, using 1 lanes and lane map as 0x14
[    3.384350] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    3.790406] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.196457] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.202994] tegra-pcie 1003000.pcie-controller: link 1 down, ignoring
[    4.207634] tegra-pcie 1003000.pcie-controller: PCI host bridge to bus 0000:00
[    4.214670] pci_bus 0000:00: root bus resource [mem 0x13000000-0x1fffffff]
[    4.221504] pci_bus 0000:00: root bus resource [mem 0x20000000-0x3fffffff pref]
[    4.228773] pci_bus 0000:00: root bus resource [bus 00-ff]
[    4.234242] pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
[    4.240901] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    4.260392] pci 0000:00:01.0: BAR 8: assigned [mem 0x13000000-0x130fffff]
[    4.265198] pci 0000:01:00.0: BAR 0: assigned [mem 0x13000000-0x13000fff]
[    4.271979] pci 0000:00:01.0: PCI bridge to [bus 01]
[    4.276897] pci 0000:00:01.0:   bridge window [mem 0x13000000-0x130fffff]
[    4.283676] PCI: enabling device 0000:00:01.0 (0140 -> 0143)
[    4.289513] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[    4.296241] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[    4.302988] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
...

When I reached the console I ran a similar lspci command, “lspci -s 01:00.0 -vvv”, and here is the output:

01:00.0 USB controller: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03) (prog-if 30 [XHCI])
        Subsystem: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 547
        Region 0: Memory at 13000000 (32-bit, non-prefetchable) 
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 000000017ef69000  Data: 0000
        Capabilities: [c4] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <16us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM not supported, Exit Latency L0s <2us, L1 <16us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: xhci_hcd

Here is what the SP605 output looks like:

Some stuff before
...
TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0
MMC: no card present
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
948 bytes read in 72 ms (12.7 KiB/s)
p2371-2180 eMMC boot options
1:      primary kernel
Enter choice: 1:        primary kernel
Retrieving file: /boot/Image
19003224 bytes read in 487 ms (37.2 MiB/s)
append: fbcon=map:0 console=tty0 console=ttyS0,115200n8 androidboot.modem=none androidboot.serialno=P2180A00P00940c003fd androidboot.security=non-secure tegraid=21.1.2.0.0 ddr_die=2048M@2048M ddr_die=2048M@4096M section=256M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 debug_uartport=lsport,0 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff2bf000 nvdumper_reserved=0xff23f000 core_edp_mv=1125 core_edp_ma=4000 gpt android.kerneltype=normal androidboot.touch_vendor_id=0 androidboot.touch_panel_id=63 androidboot.touch_feature=0 androidboot.bootreason=pmc:software_reset,pmic:0x0 root=/dev/mmcblk0p1 rw rootwait
Retrieving file: /boot/tegra210-jetson-tx1-p2597-2180-a01-devkit.dtb
248081 bytes read in 227 ms (1 MiB/s)
## Flattened Device Tree blob at 82000000
   Booting using the fdt blob at 0x82000000
   reserving fdt memory region: addr=80000000 size=20000
   Using Device Tree in place at 0000000082000000, end 000000008203f910


Starting kernel ...
...
some stuff
...
[    2.944068] tegra-pcie 1003000.pcie-controller: PCIE: Enable power rails
[    2.945681] tegra-pcie 1003000.pcie-controller: probing port 0, using 4 lanes and lane map as 0x14
[    2.947746] tegra-pcie 1003000.pcie-controller: probing port 1, using 1 lanes and lane map as 0x14
[    3.350287] tegra-pcie 1003000.pcie-controller: link 0 down, retrying
[    3.756334] tegra-pcie 1003000.pcie-controller: link 0 down, retrying
[    4.162211] tegra-pcie 1003000.pcie-controller: link 0 down, retrying
[    4.168692] tegra-pcie 1003000.pcie-controller: link 0 down, ignoring
[    4.574318] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.980195] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    5.386246] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    5.392739] tegra-pcie 1003000.pcie-controller: link 1 down, ignoring
[    5.397197] tegra-pcie 1003000.pcie-controller: PCIE: no ports detected
[    5.404168] tegra-pcie 1003000.pcie-controller: PCIE: Disable power rails
...
Continue to terminal...

Here is the Artemis output:

Some stuff before
...
TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
In:    serial
Out:   serial
Err:   serial
Net:   No ethernet found.
Hit any key to stop autoboot:  0
MMC: no card present
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
948 bytes read in 72 ms (12.7 KiB/s)
p2371-2180 eMMC boot options
1:      primary kernel
Enter choice: 1:        primary kernel
Retrieving file: /boot/Image
19003224 bytes read in 487 ms (37.2 MiB/s)
append: fbcon=map:0 console=tty0 console=ttyS0,115200n8 androidboot.modem=none androidboot.serialno=P2180A00P00940c003fd androidboot.security=non-secure tegraid=21.1.2.0.0 ddr_die=2048M@2048M ddr_die=2048M@4096M section=256M memtype=0 vpr_resize usb_port_owner_info=0 lane_owner_info=0 emc_max_dvfs=0 touch_id=0@63 video=tegrafb no_console_suspend=1 debug_uartport=lsport,0 earlyprintk=uart8250-32bit,0x70006000 maxcpus=4 usbcore.old_scheme_first=1 lp0_vec=0x1000@0xff2bf000 nvdumper_reserved=0xff23f000 core_edp_mv=1125 core_edp_ma=4000 gpt android.kerneltype=normal androidboot.touch_vendor_id=0 androidboot.touch_panel_id=63 androidboot.touch_feature=0 androidboot.bootreason=pmc:software_reset,pmic:0x0 root=/dev/mmcblk0p1 rw rootwait
Retrieving file: /boot/tegra210-jetson-tx1-p2597-2180-a01-devkit.dtb
248081 bytes read in 227 ms (1 MiB/s)
## Flattened Device Tree blob at 82000000
   Booting using the fdt blob at 0x82000000
   reserving fdt memory region: addr=80000000 size=20000
   Using Device Tree in place at 0000000082000000, end 000000008203f910

Starting kernel ...
...
some stuff
...
[    2.944187] tegra-pcie 1003000.pcie-controller: PCIE: Enable power rails
[    2.945811] tegra-pcie 1003000.pcie-controller: probing port 0, using 4 lanes and lane map as 0x14
[    2.947883] tegra-pcie 1003000.pcie-controller: probing port 1, using 1 lanes and lane map as 0x14
[    3.650134] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.056180] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.464405] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.470890] tegra-pcie 1003000.pcie-controller: link 1 down, ignoring
[    4.475530] tegra-pcie 1003000.pcie-controller: PCI host bridge to bus 0000:00
[    4.482566] pci_bus 0000:00: root bus resource [mem 0x13000000-0x1fffffff]
[    4.489389] pci_bus 0000:00: root bus resource [mem 0x20000000-0x3fffffff pref]
[    4.496688] pci_bus 0000:00: root bus resource [bus 00-ff]
[    4.502139] pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
[    4.508805] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    4.522410] PCI: bus1: Fast back to back transfers enabled
[    4.525960] pci 0000:00:01.0: PCI bridge to [bus 01]
[    4.530885] PCI: enabling device 0000:00:01.0 (0140 -> 0143)
[    4.536704] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[    4.547802] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.553681] tegra-pcie 1003000.pcie-controller: PCIE: No Link speed change happened
[    4.563074] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[    4.564152] tsec tsec: initialized
[    4.565395] tsec tsecb: initialized
[    4.567705] nvdec nvdec: initialized
[    4.570997] falcon vic03: initialized
[    4.572648] falcon msenc: initialized
[    4.574033] falcon nvjpg: initialized
[    4.575595] tegradc tegradc.1: Display dc.54240000 registered with id=0
[    4.575750] display board info: id 0x0, fab 0x0
[    4.576253] panel_select fail by _node_status
[    4.576326] display board info: id 0x0, fab 0x0
[    4.576645] panel_select fail by _node_status
[    4.576657] parse_tmds_config: No tmds-config node
[    4.576798] of_dc_parse_platform_data: could not find vrr-settings node
[    4.576803] of_dc_parse_platform_data: could not find SD settings node
[    4.576809] of_dc_parse_platform_data: could not find cmu node
[    4.576813] of_dc_parse_platform_data: could not find cmu node for adobeRGB
[    4.576831] tegradc tegradc.1: DT parsed successfully
[    4.653228] display board info: id 0x0, fab 0x0
[    4.654278] pcieport 0000:00:01.0:   device [10de:0fae] error status/mask=00000001/00002000
[    4.655798] pcieport 0000:00:01.0:    [ 0] Receiver Error
[    4.658016] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.661501] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[    4.662293] pcieport 0000:00:01.0:   device [10de:0fae] error status/mask=00000001/00002000
[    4.663401] pcieport 0000:00:01.0:    [ 0] Receiver Error
[    4.664778] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664797] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
[    4.664805] pcieport 0000:00:01.0:   device [10de:0fae] error status/mask=00000001/00002000
[    4.664812] pcieport 0000:00:01.0:    [ 0] Receiver Error
[    4.664829] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664851] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664873] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664894] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664915] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664937] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664959] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.664980] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665002] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665023] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665044] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665065] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665087] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665108] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665130] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665151] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665172] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665193] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665215] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665236] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665257] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665278] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665744] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665766] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.665787] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.711579] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.711642] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.711685] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.757339] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.757488] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.757510] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
...
Spiral to AER oblivion :(
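The repeated AER lines above report ‘error status/mask=00000001/00002000’ followed by ‘[ 0] Receiver Error’. As a reading aid, here is a minimal Python sketch that decodes those two hex words. It assumes the bit layout of the AER Correctable Error Status and Mask registers as given in the PCIe Base Specification, ‘decode’ and ‘parse_status_mask’ are hypothetical helper names, and only the correctable-error register is covered.

```python
from __future__ import annotations

# Bit positions in the AER Correctable Error Status/Mask registers
# (assumed layout, per the PCIe Base Specification).
CORRECTABLE_ERRORS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode(value: int) -> list[str]:
    """Return the names of the correctable-error bits set in `value`."""
    return [name for bit, name in sorted(CORRECTABLE_ERRORS.items())
            if value & (1 << bit)]


def parse_status_mask(field: str) -> tuple[int, int]:
    """Split a dmesg 'status/mask' field such as '00000001/00002000'."""
    status, mask = field.split("/")
    return int(status, 16), int(mask, 16)


if __name__ == "__main__":
    status, mask = parse_status_mask("00000001/00002000")
    print("status:", decode(status))  # errors actually reported
    print("masked:", decode(mask))    # errors the port is told to ignore
```

On the values from this log it reports Receiver Error as the only status bit and Advisory Non-Fatal Error as the only masked bit, which matches the kernel's ‘type=Physical Layer’ classification.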

I didn’t want to put the entire boot logs into the code windows, so I included only the parts I thought were important. If you would like to view the entire boot logs, they are available here:

Artemis Not Working
SP605 Not Working
USB 3.0 Working

Any ideas?

Thanks,

Dave

Is the working SATA 2 interface a function on the same card as the FPGA? If it is on a different add-on card, then you know the interface at the JTX1 is working. If the SATA 2 is a function on the same FPGA card, then you know there is an interaction beyond the RX/TX data channel. Does the SATA 2 interface use the same PCIe design, but with a different layout?

The SATA 2 and PCIE connections are on the same physical board. Here is an image of the host board:

Artemis Host Adapter Board PCIE/SATA

I have a PCIe x1 dual-port gigabit Ethernet card to test with. My first observation is that, with no mounting point for the card bracket, the physical connection could be an issue. I have to wonder what happens when the card is even slightly stressed.

Under an ssh login, things look fairly normal. However, a few error messages show up on the serial console each time I run lspci, and they also end up in the dmesg log:

[  249.355293] tegra-pcie 1003000.pcie-controller: PCIE: Response decoding error, signature: 10010001
[  249.364472] tegra-pcie 1003000.pcie-controller: PCIE: Response decoding error, signature: 10010005

This became fairly interesting with “lspci -v”:

root@x1:~# lspci -v
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=02, sec-latency=0
        Memory behind bridge: 13000000-130fffff
        Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
        Capabilities: [48] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
        Capabilities: [80] Express Root Port (Slot+), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] L1 PM Substates
        Kernel driver in use: pcieport

01:00.0 PCI bridge: PLX Technology, Inc. Device 8603 (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: pcieport

What seems particularly interesting is that your SATA function works on the same physical board. Because this is x1, only a single duplex TX/RX pair is involved, and the electrical signal pretty much cannot work for one device but not for another. In my own test with the dual-port gigabit card I also see:

!!! Unknown header type 7f

The pcieport driver is fairly well tested, so rather than signal quality being the issue (other than maybe the connector being sensitive to mounting when there is nothing to screw the bracket to), I’m starting to wonder whether something at a lower level is the problem.
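In the ‘lspci -v’ output above, ‘(rev ff)’, ‘(prog-if ff)’ and ‘Unknown header type 7f’ are what you would see if those configuration reads came back as all ones (0xFF), which is how an unanswered config read commonly looks to software. As far as I can tell, lspci masks off bit 7 of the header type byte (the multi-function flag) before printing it, so 0xFF shows up as 7f. Below is a minimal Python sketch of that check. It assumes you have the raw 64-byte standard header, for example from the device’s ‘config’ file under /sys/bus/pci/devices/ (the exact path in the comment is an assumption), and ‘summarize’ is a hypothetical helper.

```python
# Offsets in the standard PCI configuration header (type 0 and type 1).
VENDOR_ID = 0x00
DEVICE_ID = 0x02
REVISION_ID = 0x08
PROG_IF = 0x09
HEADER_TYPE = 0x0E

KNOWN_HEADER_TYPES = {0x00: "normal", 0x01: "PCI-to-PCI bridge", 0x02: "CardBus bridge"}


def summarize(cfg: bytes) -> dict:
    """Summarize the first 64 bytes of a device's config space."""
    if len(cfg) < 64:
        raise ValueError("need at least the 64-byte standard header")
    header = cfg[HEADER_TYPE] & 0x7F  # bit 7 only flags a multi-function device
    return {
        "vendor": int.from_bytes(cfg[VENDOR_ID:VENDOR_ID + 2], "little"),
        "device": int.from_bytes(cfg[DEVICE_ID:DEVICE_ID + 2], "little"),
        "revision": cfg[REVISION_ID],
        "prog_if": cfg[PROG_IF],
        "header_type": header,
        "header_known": header in KNOWN_HEADER_TYPES,
        # Every byte reading back as 0xFF is the classic signature of a
        # config read that nothing answered.
        "all_ones": all(b == 0xFF for b in cfg[:64]),
    }


if __name__ == "__main__":
    dead = summarize(bytes([0xFF]) * 64)
    print(f"header type {dead['header_type']:02x}, all ones: {dead['all_ones']}")
    # Hypothetical live source (adjust the bus address):
    # with open("/sys/bus/pci/devices/0000:01:00.0/config", "rb") as f:
    #     print(summarize(f.read(64)))
```

In the PLX case above the vendor and device IDs were still readable, so only some fields read as 0xFF. The ‘all_ones’ flag is therefore a strong signal, but not the only failure pattern.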

Do you have any other PCIe cards you can test with? Could anyone with a PCIe x1 through x4 card comment on whether the card works, and whether dmesg shows any errors after running lspci?

I have a PCIe x4 SATA controller card. It boots up with no issue, and “sudo lspci -v” shows:

ubuntu@tegra-ubuntu:~$ sudo lspci -v
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00001000-00001fff
	Memory behind bridge: 13000000-130fffff
	Prefetchable memory behind bridge: 0000000020000000-00000000200fffff
	Capabilities: [40] Subsystem: NVIDIA Corporation Device 0000
	Capabilities: [48] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/2 Maskable- 64bit+
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
	Capabilities: [80] Express Root Port (Slot+), MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] L1 PM Substates
	Kernel driver in use: pcieport

01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)
	Subsystem: Marvell Technology Group Ltd. Device 11ab
	Flags: fast devsel, IRQ 130
	Memory at 13000000 (64-bit, non-prefetchable) [disabled] 
	I/O ports at 1000 [disabled] 
	[virtual] Expansion ROM at 20000000 [disabled] 
	Capabilities: [40] Power Management version 2
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [60] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting

How is the working card physically attached? Is it just sitting freely in the slot? I’m thinking a lot could be explained by a free-hanging card not mating well with the connector. In my case I could not “feel” a firm connection with the dual gigabit card I tested. I also noticed that the results (including the ability to reboot) differed depending on slight changes in how the card was sitting.

The SATA II card is a quarter-size, lightweight card and sits well balanced in the TX1 PCIe slot. For bigger or heavier cards, a modified PC chassis could be used to mount the TX1 and secure the PCIe card.

I noticed that my Xilinx Virtex 5 PCIe development kit has an external power input, and I’m wondering whether the power supplied by the PCIe slot is sufficient for the FPGA. I have used several PCIe carrier cards for FPGAs, and most of them have an external power connector to draw power directly from the PC power supply.

I have been going through the boot logs and there is something that caught my attention:

In the bootloader, the working PCIE device (USB 3.0) seems to be recognized while the non-working ones (Artemis, SP605) are not:

Here is a working one:

U-Boot 2015.07-rc2-g2ac3917 (Nov 09 2015 - 13:12:08 -0800)


TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
...

Not working (notice lines 15-18, where link 0 is reported down):

U-Boot 2015.07-rc2-g2ac3917 (Nov 09 2015 - 13:12:08 -0800)

TEGRA210
Model: NVIDIA P2371-2180
DRAM:  4 GiB
MC:   Tegra SD/MMC: 0, Tegra SD/MMC: 1
*** Warning - bad CRC, using default environment

tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
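To compare the two U-Boot logs (or the kernel logs further down) without eyeballing them, a small Python sketch could summarize each root port. It assumes, as these logs suggest, that a probed port which never prints ‘link N down, ignoring’ trained successfully. The regular expressions only match the tegra-pcie message formats quoted in this thread, and ‘link_states’ is a hypothetical name.

```python
from __future__ import annotations

import re

# Patterns for the tegra-pcie messages quoted in this thread (U-Boot and kernel).
PROBE = re.compile(r"probing port (\d+), using (\d+) lanes")
GAVE_UP = re.compile(r"link (\d+) down, ignoring")

# Shortened excerpts from the "not working" and "working" U-Boot logs above.
NOT_WORKING = """\
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, ignoring
"""
WORKING = """\
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
"""


def link_states(log: str) -> dict[int, str]:
    """Map each probed root port to 'up' or 'down'.

    Assumption: a port that was probed but never reported 'down, ignoring'
    is treated as having trained successfully.
    """
    states: dict[int, str] = {}
    for line in log.splitlines():
        if m := PROBE.search(line):
            states[int(m.group(1))] = "up"
        elif m := GAVE_UP.search(line):
            states[int(m.group(1))] = "down"
    return states


if __name__ == "__main__":
    print("not working:", link_states(NOT_WORKING))  # {0: 'down', 1: 'down'}
    print("working:    ", link_states(WORKING))      # {0: 'up', 1: 'down'}
```

The same function can be pointed at a full dmesg dump, since the kernel variant of the probe message (‘probing port 0, using 4 lanes and lane map as 0x14’) also matches the pattern.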

Later on, the kernel attempts to link up with the PCIE devices. This time both the USB 3.0 card and Artemis link up.

[    2.944187] tegra-pcie 1003000.pcie-controller: PCIE: Enable power rails
[    2.945811] tegra-pcie 1003000.pcie-controller: probing port 0, using 4 lanes and lane map as 0x14
[    2.947883] tegra-pcie 1003000.pcie-controller: probing port 1, using 1 lanes and lane map as 0x14
[    3.650134] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.056180] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.464405] tegra-pcie 1003000.pcie-controller: link 1 down, retrying
[    4.470890] tegra-pcie 1003000.pcie-controller: link 1 down, ignoring
[    4.475530] tegra-pcie 1003000.pcie-controller: PCI host bridge to bus 0000:00
[    4.482566] pci_bus 0000:00: root bus resource [mem 0x13000000-0x1fffffff]
[    4.489389] pci_bus 0000:00: root bus resource [mem 0x20000000-0x3fffffff pref]
[    4.496688] pci_bus 0000:00: root bus resource [bus 00-ff]
[    4.502139] pci_bus 0000:00: root bus resource [io  0x1000-0xffff]
[    4.508805] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring

The two logs are identical up to the next line, after which they diverge:

USB 3.0 Version:

[    4.240901] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    4.260392] pci 0000:00:01.0: BAR 8: assigned [mem 0x13000000-0x130fffff]
[    4.265198] pci 0000:01:00.0: BAR 0: assigned [mem 0x13000000-0x13000fff]
[    4.271979] pci 0000:00:01.0: PCI bridge to [bus 01]
[    4.276897] pci 0000:00:01.0:   bridge window [mem 0x13000000-0x130fffff]

Artemis Version:

[    4.508805] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    4.522410] PCI: bus1: Fast back to back transfers enabled
[    4.525960] pci 0000:00:01.0: PCI bridge to [bus 01]
[    4.530885] PCI: enabling device 0000:00:01.0 (0140 -> 0143)
[    4.536704] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[    4.547802] pcieport 0000:00:01.0: AER: Multiple Corrected error received: id=0010
[    4.553681] tegra-pcie 1003000.pcie-controller: PCIE: No Link speed change happened
[    4.563074] pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
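The key difference between the two excerpts is that the USB 3.0 log assigns BAR 0 of 01:00.0 and then sets the bridge window, while the Artemis log contains no BAR lines at all, which fits with no endpoint being found on bus 01. As a simplified illustration of the working case (not the kernel’s actual resource allocator), here is a Python sketch of how a bridge memory window follows from the BARs behind it. It assumes the 1 MiB granularity of a PCI-to-PCI bridge’s memory base/limit registers, and ‘bridge_window’ is a hypothetical helper.

```python
from __future__ import annotations

# A PCI-to-PCI bridge's memory base/limit registers only hold address
# bits [31:20], so the forwarded window has 1 MiB granularity.
MIB = 1 << 20


def bridge_window(bars: list[tuple[int, int]]) -> tuple[int, int] | None:
    """Smallest 1 MiB-aligned window covering every (start, end) BAR range.

    Returns None when no device behind the bridge claimed any memory,
    as would be the case for an empty secondary bus.
    """
    if not bars:
        return None
    lowest = min(start for start, _ in bars)
    highest = max(end for _, end in bars)
    base = lowest & ~(MIB - 1)   # round down to a 1 MiB boundary
    limit = highest | (MIB - 1)  # extend to the last byte of that 1 MiB block
    return base, limit


if __name__ == "__main__":
    # BAR 0 of 01:00.0 from the USB 3.0 log: [mem 0x13000000-0x13000fff]
    base, limit = bridge_window([(0x13000000, 0x13000FFF)])
    print(f"bridge window [mem {base:#x}-{limit:#x}]")  # 0x13000000-0x130fffff
    # Artemis log: no BAR lines on bus 01, so nothing to forward.
    print(bridge_window([]))  # None
```

For the single 4 KiB BAR in the USB 3.0 log this reproduces the ‘bridge window [mem 0x13000000-0x130fffff]’ line.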

This line is very interesting:

[    4.522410] PCI: bus1: Fast back to back transfers enabled

I just looked it up: fast back-to-back transfers are only applicable to PCI, not PCIE, and all PCIE devices are required to return ‘0’ for that flag.

I searched through kernel_sources/arch/arm64/kernel/bios32.c and found the function ‘pcibios_fixup_bus’, which I believe is where this message comes from. At the end of this function, if no PCIE device has been found on the bus, that message is printed. This suggests that the ‘struct pci_dev’ for Artemis was never set up. It’s strange, because if the kernel had actually detected the PCIE device it would have populated its ‘struct pci_dev’, and pcibios_fixup_bus would then have correctly seen that Artemis’s ‘fast b2b’ flag is low.
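If I’m reading ‘pcibios_fixup_bus’ correctly, it starts with fast back-to-back assumed on and clears it as soon as any device on the bus lacks the capability bit in its Status register, so a bus with no devices at all is reported as "enabled". Here is a simplified Python model of that reasoning. It is not the kernel code itself, and the ‘Device’ class and ‘fast_back_to_back’ function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Bit 7 of the 16-bit PCI Status register ("Fast Back-to-Back Capable").
# The PCIe spec requires it to be hardwired to 0.
PCI_STATUS_FAST_BACK = 0x80


@dataclass
class Device:
    name: str    # hypothetical label, for printing only
    status: int  # value read from the device's Status register


def fast_back_to_back(devices: list[Device]) -> bool:
    """Simplified model of the bus-wide decision described above.

    Start with the feature assumed on and drop it if any device lacks
    the capability bit. all() over an empty list is True, so a bus on
    which no device was enumerated comes out as 'enabled'.
    """
    return all(dev.status & PCI_STATUS_FAST_BACK for dev in devices)


if __name__ == "__main__":
    print("empty bus:", fast_back_to_back([]))  # True -> "enabled"
    endpoint = Device("pcie-endpoint", status=0x0010)  # capability list bit only
    print("one PCIe device:", fast_back_to_back([endpoint]))  # False -> "disabled"
```

Under this model, seeing "enabled" on bus 1 behind a PCIe root port is a strong hint that the bus was empty when the fixup ran.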

It looks like uboot doesn’t detect Artemis over PCIE but the kernel does. If the kernel is not set up for ‘hotplugging’, does it inherit the PCIE configuration from uboot? I’m going to look into this. If that is the case, perhaps this can be solved in one of two ways:

  1. Uboot can detect the device
  2. Build a kernel with 'hotplugging' enabled

Dave

It’s possible that the FPGA PCIe core takes time to configure but u-boot starts too soon. On a PC, the BIOS can be configured for a delayed boot.

Can you try powering up the PCIe card first and starting u-boot later (e.g., hit any key on the serial console to hold u-boot)?

I appreciate this concern, but the FPGA board is not drawing power from the TX1; it has its own power supply.

On PCIE connectors there is also a ‘PWRGD’ signal, which goes low when the chip first boots up. The PCIE_A1 core uses this as an internal reset. I was concerned that the FPGA’s PLLs didn’t have time to lock between the host asserting ‘PWRGD’ and the PCIE host starting to query it. To determine whether this was an issue, I modified the FPGA controller core so that I could send it a manual reset independent of the PCIE host. This did not fix the problem either.

My knowledge of PCI hotplug is limited, but from what I can tell that feature won’t apply to what u-boot hands off to the Linux kernel. It seems this form of hotplug was invented so that old-style PCI devices could be loaded or unloaded on a running system, e.g., inserting a card into a PCMCIA slot on a laptop without shutting the laptop down first. It looks to me like hotplug was integrated into the newer PCIe standard and only needed additional support under older PCI (there may be other software involved even with PCIe, but I’m only considering kernel drivers).

So u-boot not detecting the card could be useful information, but I don’t think it would affect anything once the kernel boots. If firmware were involved with that PCIe card, this could definitely be an issue if u-boot did not load the firmware before handing off to the kernel. However, I doubt the TX/RX links would be affected by this; it would be the final (non-PCIe) device driver for the end device that would care.

Having a device with two functions (your FPGA plus SATA) where only a single data lane is used, combined with SATA functioning, is a bit confusing (especially since the FPGA works on a desktop system). I have to wonder about the software which chooses between the FPGA and SATA functions.

If you have the source code, can you please try adding “nvidia,disable-clock-request;” under the sub-node “pci@1,0” of the “pcie-controller” node?

The Spartan 6 PCIe block only supports the base 1.1 specification. I don’t know but suspect that this could be a problem. Unfortunately, the carrier board connector is too short to accommodate more recent offerings from Xilinx like the Kintex based KC705 8 lane PCIe 2 board. I haven’t looked to see what might be available in an Altera board.

Even if you configure your SP605 FPGA from flash you probably need to have the board powered and configured before booting the TX1 development board. I’ve tried this and get the same boot log messages suggesting some deficiency in the dialog between the FPGA and the Tegra root complex during the negotiation phase…

@toothless: I’ll try this out

@usr2222: I understand that the base 1.1 specification may be an issue. The Altera board may be a possible solution.

Regarding your other concern about the flash: I have the board powered separately from the TX1, so I don’t believe that is the issue.

Thank you both for the feedback.