TX1 <-> FPGA through PCIe

In theory you could use an off-the-shelf PCIe riser, extender, or adapter to get it hooked up at x4. See these for example (there are lots of variants on Amazon/eBay/etc.):

http://www.orbitmicro.com/global/pexp8-sx-8-4-p-754.html
http://www.orbitmicro.com/global/arc1-08x16-x4-p-8227.html

Although these probably aren't great for PCIe Gen2 signal integrity, they would at least allow for a mechanical fit at x4.

Regarding an Altera solution, I suggest looking at Terasic. They have the TR4 Stratix board with PCIe Windows drivers (I know that this doesn't help with the TX1), a PCIe 4-lane equalization I/O board, and cables to connect a PC to the TR4. You'll need deeper pockets than me, though, as you'll need a full Quartus license to do anything with the TR4. Neither Altera nor Xilinx makes it easy to use their transceivers on a pauper's budget. I've had mostly good experiences with them over the years, though they did give me this bit of painfully acquired wisdom: if you ever come across a board with a micro-USB connector that doesn't have soldered through-hole tabs, the VERY FIRST thing you want to do before using it is epoxy that little troublemaker to the PCB…

I’d be very interested in hearing if the suggestion from toothless works out for you.

@toothless

I apologize for the wait. It took me a little longer than I anticipated to build the kernel.

Kernel Modification
I modified the file:
<kernel_source>/arch/arm64/boot/dts/tegra210-jetson-cv-base-p2597-2180-a00.dts

Line 214:

pci@1,0 {
    status = "okay";
    nvidia,disable-clock-request;
};

Result
No Change

I rebuilt/reflashed the TX1 and it still output the same AER error stream as before.

U-Boot Modification
I thought that perhaps, instead of just modifying the DTS within the kernel, I should modify the U-Boot DTS (or both the kernel and U-Boot DTS).

I modified the following file:
<u-boot_source>/arch/arm/dts/tegra210-p2371-2180.dts

Line 28:

pci@1,0 {
    status = "okay";
    nvidia,disable-clock-request;
};

Result
No Change

Unfortunately this yielded the same result: the same AER error stream.

U-Boot

I noticed that inside U-Boot I can manually initiate a PCI enumeration with the command:

‘pci enum’

Here is the result when I enter the command with my PCIe device not attached:

Tegra210 (P2371-2180) # pci enum
tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
Tegra210 (P2371-2180) #

Sometimes when I do enable my FPGA's PCIe port I see a different result:

Tegra210 (P2371-2180) # pci enum
tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, ignoring
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
Tegra210 (P2371-2180) # pci enum
tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: link 0 down, retrying
tegra-pcie: link 0 down, retrying
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
Tegra210 (P2371-2180) # pci enum
tegra-pcie: PCI regions:
tegra-pcie:   I/O: 0x0000000012000000-0x0000000012010000
tegra-pcie:   non-prefetchable memory: 0x0000000013000000-0x0000000020000000
tegra-pcie:   prefetchable memory: 0x0000000020000000-0x0000000040000000
tegra-pcie: 4x1, 1x1 configuration
tegra-pcie: probing port 0, using 4 lanes
tegra-pcie: probing port 1, using 1 lanes
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, retrying
tegra-pcie: link 1 down, ignoring
Tegra210 (P2371-2180) #

Notice how the link-up is not consistent. From the TX1 side it does sometimes seem to link up; however, when I enter the 'pci' command (which just lists the devices that were found) all I get is the bridge, not my device.

Tegra210 (P2371-2180) # help pci
Usage:
pci [bus] [long]
    - short or long list of PCI devices on bus 'bus'
pci enum
    - re-enumerate PCI buses
pci header b.d.f
    - show header of PCI device 'bus.device.function'
pci display[.b, .w, .l] b.d.f [address] [# of objects]
    - display PCI configuration space (CFG)
pci next[.b, .w, .l] b.d.f address
    - modify, read and keep CFG address
pci modify[.b, .w, .l] b.d.f address
    -  modify, auto increment CFG address
pci write[.b, .w, .l] b.d.f address value
    - write to CFG address
Tegra210 (P2371-2180) # pci
Scanning PCI devices on bus 0
BusDevFun  VendorId   DeviceId   Device Class       Sub-Class
_____________________________________________________________
00.01.00   0x10de     0x0fae     Bridge device           0x04
Tegra210 (P2371-2180) # 
Tegra210 (P2371-2180) # pci long
Scanning PCI devices on bus 0

Found PCI device 00.01.00:
  vendor ID =                   0x10de
  device ID =                   0x0fae
  command register =            0x0007
  status register =             0x0010
  revision ID =                 0xa1
  class code =                  0x06 (Bridge device)
  sub class code =              0x04
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0x00000000
  base address 1 =              0x00000000
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x01
  IO limit =                    0xf1
  secondary status =            0x0000
  memory base =                 0x1300
  memory limit =                0x12f0
  prefetch memory base =        0x2001
  prefetch memory limit =       0x1ff1
  prefetch memory base upper =  0x00000000
  prefetch memory limit upper = 0x00000000
  IO base upper 16 bits =       0x1200
  IO limit upper 16 bits =      0x11ff
  expansion ROM base address =  0x00000000
  interrupt line =              0x00
  interrupt pin =               0x01
  bridge control =              0x0000
Tegra210 (P2371-2180) #

When I use my tool to query the status of the FPGA I can see that the FPGA's PCIE_A1 core LTSSM is attempting to link up. It is primarily in the 'Polling.Active' state. (In Polling.Active the endpoint is transmitting TS1 training sequences and waiting to receive training sequences back, so sitting there suggests the two sides never complete that exchange.)

Since I can modify the U-Boot source code, and U-Boot also probes the PCI Express bus, perhaps we can focus on making modifications within U-Boot; if we are successful we can port the results to the kernel. This is a lot easier for me to do.

I found the source code for the Tegra PCI Express controller:
<u-boot_source>/drivers/pci/pci_tegra.c

After comparing it to the kernel's version:
<kernel_source>/drivers/pci/host/pci-tegra.c

it seems that they are basically the same, except that the kernel version uses the kernel API and the U-Boot version uses U-Boot's own API.
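
The logic behind those 'link N down, retrying' lines is a bounded retry loop of roughly the following shape in both trees. This is a paraphrased sketch, not verbatim source (the register name and timeout values are illustrative); the point is that the retry count and poll window are the obvious knobs to turn if the endpoint simply needs more time:

/* Paraphrased sketch of the port link check in pci_tegra.c (U-Boot) and
 * pci-tegra.c (kernel). Names and constants are illustrative only. */
static bool tegra_pcie_port_check_link(struct tegra_pcie_port *port)
{
	unsigned int retries = 3;           /* the three "retrying" lines */

	do {
		unsigned int timeout = 200; /* ~200 ms poll window */

		/* Poll the root port's data-link-up status bit. */
		while (timeout--) {
			u32 value = readl(port->base + RP_LINK_STATUS);
			if (value & RP_LINK_DL_UP)
				return true;
			mdelay(1);
		}

		/* No link within the window: reset the port and retry.
		 * Raising 'retries' or 'timeout' here is a cheap experiment
		 * for an endpoint that is slow to finish configuring. */
		tegra_pcie_port_reset(port);
	} while (--retries);

	return false;                       /* "link N down, ignoring" */
}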

I'll spend some time looking through the PCIe registers in the TRM; perhaps we can try something else.

Thanks again for the help.

Dave

I'm not sure how all of the firmware config options in a kernel build work, but it is possible that firmware will not pick up changes depending on which options are set. The kernel source itself contains tools for extracting a DTS from a DTB; you might want to do this to check that your firmware modifications made it in:

scripts/dtc/dtc -I dtb -O dts -o /tmp/extracted.dts /boot/the_firmware_in_extlinux.dtb

More information on device tree compiler here:
http://xillybus.com/tutorials/device-tree-zynq-1

Thanks for the heads up!

I just did the check and the changes are in there.

It may just be the case that by the time the root port expects the device to be ready for the link-up sequence, the endpoint (in this case the FPGA) is still getting ready, resulting in no link-up (the root port can't wait indefinitely for the endpoint to get ready). Is it possible to point me to the code base you are using? A web link is fine.

Yeah, absolutely:

RTL Location
Low Level RTL

  • Here is the low-level RTL that interfaces with the AXI stream interface, the PCIe hard macro, and the gigabit transceiver: Low Level RTL

Wishbone Interface:

  • Here is the wrapper RTL that exposes the control signals of the PCIe hard macro and gigabit transceiver to the user. I include this so that you can trace which signals I have control of. Wishbone Wrapper
  • Anything exposed here can be configured at runtime using a Python script. As an example, I can read and write the transmit differential swing amplitude through the TX_DIFF_CTRL (0x06) register, so if you would like me to try something on the FPGA side I can do that easily. If need be I can even modify all of these modules to expose/control more signals.

Here is the entire project

  • Full RTL Project
  • I don't believe this will help you as much as the 'Low Level RTL', but I included it in case you would like a top-level view of the project.

RTL Description
I appreciate that looking through someone else's RTL can be a frustrating task. The most important RTL files are:

/pcie_axi_bridge.v

  • This is the core that glues the PCIE_A1 core and the gigabit transceivers together. All of my interaction with PCIe goes through this core.

PCIE_A1

  • This is a hard macro within the FPGA. An instance is declared inside pcie_axi_bridge.v, but that doesn't show how it works, so here is the PCIE_A1 Datasheet

/gtpa1_dual_wrapper_tile.v (Gigabit Transceiver)

  • There is a wrapper above this file called /gtpa1_dual_wrapper.v, but for all intents and purposes the wrapper is not important. The GTPs are VERY flexible, and it is not advisable to instantiate them without using the Xilinx configuration tool (coregen); here is the gigabit transceiver user guide

Let me know if there is anything that isn’t clear.

Dave

Eli Billauer has some nice PCIe tutorials on his Xillybus website. He has a PCIe demo for the SP605 that works fine on an Intel-based Z77 motherboard. The kernel drivers are part of the official Linux source from version 3.12 on, and you can compile his driver for earlier versions back to 2.6.36. This seems to suggest that the linking failure isn't with the Spartan-6 PCIe core…

There are a number of good reasons why it would be nice if the next version of Tegra Linux was 3.19 or higher.

[ 2.944187] tegra-pcie 1003000.pcie-controller: PCIE: Enable power rails
[ 2.945811] tegra-pcie 1003000.pcie-controller: probing port 0, using 4 lanes and lane map as 0x14

As I understand it, a PC detects PCI devices in the BIOS during boot. Obviously this process is quite different on a TX1. I wonder if this message in dmesg provides a hint about the problem of establishing a link.

For what it's worth to anyone: I was able to confirm that the Lattice ECP5 Versa Development Kit, with the FPGA configured with the basic PCIe demo, is able to connect with the TX1 over the PCIe interface. This card is powered by the PCIe connector and configured from a flash device at power-on. I suspect that Microsemi's flash-based IGLOO2 development kit will do the same.

@usr2222: This is really good to know. It seems as though the Spartan 6 really is the issue. I appreciate that you checked your board.

Thank you,

Dave

I don't have the answer to your problem, but I wouldn't conclude that “the Spartan 6 really is the issue”. I'm guessing it has more to do with how the TX1 boots and how the kernel checks for PCIe devices. On a PC I've connected to both the SP605 and the KC705, which were powered separately and already configured. The TX1 doesn't seem to have an issue linking with PCIe 1.0 boards that are powered by the PCIe connector… and the PCI bus power rails are off until the kernel gets around to probing for devices… at least that's what dmesg is claiming.

@cospan I face the same problem. I bought an FPGA development kit called the GVI-K7 connectivity kit. It works well when plugged into a PC, but it cannot be detected by the TX1.

I read the Jetson developer kit carrier board spec PDF. On page 15, Table 9, it shows that the max delay for a PCIe board is limited to less than 380 ps. Maybe the trace length from the FPGA to the TX1 exceeds that range, but I cannot prove it.

My project cannot continue if the PCIe does not work. I wish we could communicate by email (inshine1986@126.com).
Thank you.

I'm using L4T 24.1. I have two Xilinx FPGA cards from the same vendor; one was detected by "lspci", the other was not. Both FPGA cards worked without issues on different Intel machines under Linux and Windows.

It appears the TX1's PCIe is not fully compliant with the PCIe standard, due to either hardware or software.

Hi all,

At this point, after lots of trying, I finally have my custom Xilinx Zynq ZC706 design, PCIe with CDMA, up and running on the TX1 under 23.2 with my heavily hacked driver and various other 23.1 changes. It appears that 24.1 has issues, including not allocating CMA memory for DMA the same way 23.2 did; I can see that code sections relating to memory allocation have changed in 24.1.
24.1 does not appear to connect the internal DDR3 bus onto the PCIe bus for what I have.

It is very important that the Xilinx FPGA has time to correctly program/configure the FPGA bits before any PCIe host begins to access it. There are Xilinx app notes that go over doing partial configuration to get PCIe up within spec.
(Note I am using an external re-buffered x4 PCIe bus link between the TX1 as host, mounted in one box, and the Xilinx ZC706 mounted in another box. I turn the ZC706 box on and wait for the config LEDs to turn on before powering on the TX1 box. This works well.)
Also, I bring the PCIe reset from the host into the Xilinx part and use it, together with the PCIe clock being locked, to bring the Xilinx PCIe and CDMA sections out of reset. This could be an issue if you are using one of the base Xilinx PCIe app note designs.

I am also setting the Xilinx part up to run the PCIe bus at 2.5G PCIe 1.0 speed at this point. (In an earlier configuration using an Avnet Zynq mini-ITX board, it had issues linking at 5.0G PCIe 2.0 speed and seeing the board. I have not tried a Xilinx build at 5.0G PCIe 2.0 with the TX1 yet.)

PCIe uses 8b/10b-encoded serial LVDS at 2.5G or 5G rates. There is quite a lot that takes place to train those serial links on both the host and the endpoint.
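
As a sanity check on those rates, the 8b/10b overhead arithmetic (standard PCIe math, nothing specific to any board in this thread) works out to:

$$2.5\ \mathrm{GT/s} \times \tfrac{8}{10} = 2\ \mathrm{Gb/s} = 250\ \mathrm{MB/s}\ \text{per lane (Gen1)},\qquad 5\ \mathrm{GT/s} \times \tfrac{8}{10} = 4\ \mathrm{Gb/s} = 500\ \mathrm{MB/s}\ \text{per lane (Gen2)}$$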

Jayd

@jayds:

Do you use the clock generated by the host computer?

The board I had was powered independently from the host and used an on-board oscillator rather than the clock from the host, and it didn't work. If you have the capability to try this out, would you see if there is a difference?

I want to try it with the clock generated by the host and see if there is a difference. Unfortunately my TX1 is a little dead right now and I need to get a new one.

Dave

Hi Dave and others,
There is quite a lot going on inside the FPGA with regard to an endpoint PCIe interface. First, you have multiple clock domains in the FPGA, such as in a 7-series Xilinx device. For a PCIe endpoint there is a 100MHz clock sourced from the root complex (the master), along with a PCIe reset signal, going into the FPGA.
The 100MHz PCIe LVDS clock goes right into a very special set of pins on the FPGA that feeds a multi-GHz PLL complex, which directly generates the serial bit clock for the 8b/10b gigabit LVDS data lanes of PCIe.
There is a whole hardware process hiding away in which each data lane gets trained and the lanes get aligned to each other. It in fact requires a single master 100MHz clock driving both the root complex/switch and the endpoint for the clock embedded in the transmitted data to be decoded correctly at the receiver; the skew is too large without the same very good low-noise clock at both ends.
Inside the Xilinx FPGA there are ARM/AXI-based data buses that run in differing clock domains. Among these are the PCIe master/slave AXI buses, which operate at 125MHz/250MHz. This clock comes out of the PCIe subsection of the FPGA along with a clock-unlocked signal; it is all derived by that GHz PLL from our 100MHz master PCIe clock.
The master and slave AXI buses from the PCIe subsection are then interconnected to other sections, such as RAM or DDR, which run at differing clocks and data widths, by way of a big AXI bus switch. There are separate clock and reset inputs for each master or slave AXI port on this big AXI switch.
I in fact have several things running in differing clock domains and AXI bus sizes.

What I was trying to point out was that in my design the CDMA engine, along with the PCIe subsystem AXI data buses, all run off the 125MHz clock provided by the PCIe subsection, which is derived from the 100MHz PCIe backplane clock per the base requirements. In this case the AXI bus switch ports for these masters and slaves, and the CDMA, must have their reset inputs come from a standard Xilinx reset generator that combines the incoming PCIe reset, the PCIe 125MHz subsystem clock, and the 125MHz clock-unlocked signal.
Remember that here you have powered the endpoint FPGA board on before the TX1 root complex is supplying the PCIe backplane clock.

Next, the base Xilinx PCIe subsystem's speed must be set up manually in the build as a 2.5G Gen 1.0 or 5.0G Gen 2.0 interface. It does not fall back from 5.0G to 2.5G if it cannot bring the data links up right. A slower FPGA speed-grade device may not meet the 5.0G Gen 2.0 requirements. In fact, in Xilinx parts the PCIe subsystem's AXI slave and master buses must be able to unload the PCIe data at rate; hence for a 125MHz, 64-bit-wide AXI bus the PCIe link must be a x2 link at the 5.0G Gen 2.0 spec or a x4 link at the Gen 1.0 spec… (see the arithmetic just below)
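
That width/speed matching is straightforward bandwidth bookkeeping, using the post-8b/10b payload rates from earlier (2 Gb/s per Gen1 lane, 4 Gb/s per Gen2 lane):

$$64\ \mathrm{bit} \times 125\ \mathrm{MHz} = 8\ \mathrm{Gb/s} = 4 \times 2\ \mathrm{Gb/s}\ (\text{x4 Gen1}) = 2 \times 4\ \mathrm{Gb/s}\ (\text{x2 Gen2})$$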

You have to make sure all of this is right, given that the first thing the TX1 expects, as seen in the boot log, is that the PCIe links are up and trained. That can be quite an issue to get right: the FPGA must have configured the PCIe section by then, un-tri-stated the pins, and locked the PLL, and the PCIe data links must have been trained by the host root complex hardware.

Last, there is the whole issue of how quickly after power-on all of this truly needs to get done. In the PCI spec, which PCIe follows, if I remember right there are 20ms from power good / reset released until the time the host may start to read the endpoint's configuration table, whereas it might take several hundred ms for the FPGA to be configured from its ROM, depending on what is used.
Anyway, this issue has long been known, and Xilinx has a special method of programming the PCIe section first. But you need to have set this up in the FPGA design tools.
In most cases the root complex host will take some long period before releasing the PCIe backplane reset, measured from when its PCIe clock generator is stable. After some more time it will start the PCIe link training.
As long as the FPGA gets its PCIe section configured before the root complex really does release the PCIe backplane from reset, all will be well. Which is a big if.
I am avoiding this issue by powering the endpoint on first, but it requires care to bring the AXI bus interfaces for PCIe and CDMA out of reset correctly, per the above. (That handling may not be in the default FPGA code for the PCIe and DMA subsystem; I had to figure out the hard way why it did not work a few months back with a Xilinx-based root complex host instead of the TX1. AXI bus switch ports do not like to come out of reset without their clock.)

Anyway, you should have a custom Linux driver for your FPGA with lots of informational print statements in the board's probe code. These will all get dumped into the boot log output on the debug UART/USB port, as you have seen.
If the TX1 sees a trained PCIe endpoint connected at boot, then later in the log, at the point of root complex setup, it will loop around several times as it tries to read the endpoint's hard PCI configuration table. If it can read the endpoint's config table, it will call your driver's probe code.
If you do not see your endpoint being probed, then your issue is back in the FPGA not being set up right to start with, per what I have outlined above.
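
To make that concrete, a bare-bones skeleton of such a driver might look something like this sketch (the vendor/device IDs are placeholders: 0x10ee is Xilinx's PCI vendor ID, and the device ID must match whatever your endpoint's configuration actually advertises):

/* Minimal PCI driver whose only job is to prove the endpoint was
 * enumerated: if the probe message prints, the link trained and config
 * space was readable. IDs below are placeholders. */
#include <linux/module.h>
#include <linux/pci.h>

#define FPGA_VENDOR_ID 0x10ee /* Xilinx */
#define FPGA_DEVICE_ID 0x7024 /* example only -- use your endpoint's ID */

static const struct pci_device_id fpga_ids[] = {
	{ PCI_DEVICE(FPGA_VENDOR_ID, FPGA_DEVICE_ID) },
	{ 0, }
};
MODULE_DEVICE_TABLE(pci, fpga_ids);

static int fpga_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int ret;

	dev_info(&pdev->dev, "fpga: probe entered, endpoint enumerated\n");

	ret = pci_enable_device(pdev);
	if (ret)
		return ret;

	dev_info(&pdev->dev, "fpga: BAR0 at 0x%llx, %llu bytes\n",
		 (unsigned long long)pci_resource_start(pdev, 0),
		 (unsigned long long)pci_resource_len(pdev, 0));
	return 0;
}

static void fpga_remove(struct pci_dev *pdev)
{
	pci_disable_device(pdev);
}

static struct pci_driver fpga_driver = {
	.name     = "fpga-test",
	.id_table = fpga_ids,
	.probe    = fpga_probe,
	.remove   = fpga_remove,
};
module_pci_driver(fpga_driver);
MODULE_LICENSE("GPL");

If the endpoint never shows up on the bus, this probe never runs, which is exactly the "issue is back in the FPGA" case above.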

I have been doing PCI and PCIe hardware over the years; one really needs to dig into the various app notes and papers that folks like Xilinx have produced for their FPGAs in regards to PCIe to understand at least the basics.
Avnet does the distribution for Xilinx. If you are in a position to connect with their sales group (i.e., going to use lots of parts…), they have folks who know Xilinx design and PCIe who could be of help. They offer design services also.

Now, in regards to the TX1 23.2 and 24.1 Linux revs, there may be other issues related to memory management code or other changes causing PCIe-does-not-work problems. I know that in 23.2 the .unlocked_ioctl in file_operations does not work, but changing it to .compat_ioctl does, whereas both appear in the file_operations structure in 23.2. In 24.1 .unlocked_ioctl does work.
But in 24.1 the memory management code has changed such that the DMA allocation I do does not connect DDR onto PCIe; there is an AXI bus-to-bus translation switch in the TX1 that might not be connecting. At boot it may not be seeing my CMA memory request and so is not setting aside DDR space in the right way. Note that my boot log output says I have been given both virtual and physical DMA memory allocations just like in 23.2, with no errors in probe.
Given that there are several ways and methods to request DMA memory and do DMA, it is quite possible that some method hiding away in 24.1 works correctly.
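
For what it's worth, the straightforward coherent-allocation path looks something like this sketch (the buffer size and mask width are placeholders; the open question in 24.1 is whether the buffer it hands back is actually reachable from the PCIe side):

#include <linux/dma-mapping.h>
#include <linux/pci.h>
#include <linux/sizes.h>

/* Called from probe(), after pci_enable_device() and pci_set_master(). */
static int fpga_setup_dma(struct pci_dev *pdev)
{
	dma_addr_t bus_addr;
	void *cpu_addr;

	/* 32-bit mask as a conservative starting point; widen it if the
	 * endpoint supports more address bits. */
	if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)) ||
	    dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)))
		return -EIO;

	cpu_addr = dma_alloc_coherent(&pdev->dev, SZ_1M, &bus_addr, GFP_KERNEL);
	if (!cpu_addr)
		return -ENOMEM;

	/* bus_addr is what the FPGA's CDMA engine gets programmed with; if
	 * the endpoint's reads/writes to it go nowhere, suspect the bus
	 * translation path rather than the allocation itself. */
	dev_info(&pdev->dev, "dma buffer: cpu %p bus 0x%llx\n",
		 cpu_addr, (unsigned long long)bus_addr);
	return 0;
}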

You need to separate any hardware issues in the FPGA design from those of the TX1's Linux IOMMU code, memory allocation, etc.

Sorry for the length of the explanation,

Jayd

For Xilinx Kintex-7 demo board users:
This demo design is based on XAPP1052, which is for an 8-lane Gen 3.0 10G PCIe endpoint. The TX1 only supports a 4-lane Gen2 5G endpoint card at most.
So it comes up and runs fine on an Intel motherboard with an 8/16-lane Gen 3.0 slot. But on the TX1 the board is asking for a root complex pipeline that is wider and faster than what is supported; it will not work without an 8-lane Gen 3.0 host.
Unfortunately this demo design, in the zip file, is top-to-bottom Verilog with the IP for the PCIe endpoint embedded deep within it, and it is not easy to alter it into a Gen 2.0 x4 subsystem.

I am using a heavily modified version of XAPP1171, which uses the block design environment of the Vivado tool so that the core IP can be edited there. It uses the older PCIe Gen 2.0 and CDMA IP cores. In block design mode you have a block schematic of interconnected IP cores, and for Xilinx IP you can click on a block and directly adjust its settings. I generally use the Tcl scripting language to build up a block design.

Just to follow up
Jayd

Jayd,
With your background on both the TX1 and Xilinx, how difficult would it be for you to design a TX1 carrier card to interface with custom or off-the-shelf FPGA modules such as the Avnet Mini-Module-Plus? I'm sure there is a market for it.

Jayd,

Thank you very much for your input.

I appreciate that there is a lot going on in the PCIe hard macro built into the FPGAs, as well as in the generated HDL that provides the AXI master/slave streaming interface. Compared with the low-level TLP interface that people had to use before, AXI is a dream come true!

I also understand that there are quite a few clock domains involved, including the 100MHz or 125MHz reference clock that I need to supply to the FPGA, which the GTPs (or GTXs) use to generate the 2.5GHz high-speed serial clock for PCIe Gen1; USERCLK1 = 250MHz, at 1/10th the serial rate, generated by the multi-gigabit PLL to move between the 2.5GHz serial bit clock and the byte clock; and USERCLK2 = 62.5MHz, which divides the byte clock by 4 into 32-bit dwords so the user logic is not trying to meet a horribly tight 250MHz timing constraint.
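
Spelled out, that clock chain is just successive division of the numbers above:

$$2.5\ \mathrm{GHz} \div 10\ \text{bits/symbol} = 250\ \mathrm{MHz}\ (\mathrm{USERCLK1}),\qquad 250\ \mathrm{MHz} \div 4\ \text{bytes/dword} = 62.5\ \mathrm{MHz}\ (\mathrm{USERCLK2})$$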

As I've stated in multiple responses, I power my FPGA board independently from the host computer, so my board is programmed before the CPU boots.

What you said is very interesting to me:

What I was trying to point out was that in my design the CDMA engine, along with the PCIe subsystem AXI data buses, all run off the 125MHz clock provided by the PCIe subsection, which is derived from the 100MHz PCIe backplane clock per the base requirements. In this case the AXI bus switch ports for these masters and slaves, and the CDMA, must have their reset inputs come from a standard Xilinx reset generator that combines the incoming PCIe reset, the PCIe 125MHz subsystem clock, and the 125MHz clock-unlocked signal.

So this means that you supplied your PCIe core with a clock generated by the TX1 root complex?

One issue I've been worried about is that the PCIe root complex on the TX1 may not work well with two separate PCIe reference clocks (one on the TX1 and one on my board). The PCIe specification says it's okay to work with different clocks, but a colleague of mine who has worked on network-attached storage solutions using PCIe-based hard drives noted that some root complexes do not work well with separate reference clocks, and that I might need to run a test to determine whether the TX1 behaves differently when I use the clock supplied by the TX1 versus a clock supplied by my own board. I hesitate to make such a bold claim about the TX1 without proof, which is why I was hoping you could run tests with a host-provided reference clock and with a reference clock from your own board. From what you have stated, you are using the clock generated by the TX1; this is different from my board.

Unfortunately I can’t test this out because one of my clients damaged his TX1 and has been using mine so I don’t have a test setup for this at the moment.

I don't think it's an issue related to DMA, because I haven't even loaded a driver into the TX1 and the FPGA's PCIe isn't recognized. At this point I only want the PCIe root complex to perform the standard interrogation of a PCIe device, which involves only the link-up procedure and configuration reads and writes.

I designed a scripting tool that lets me look at the FPGA's PCIe status registers over a USB interface, and I can see a problem: it looks like the link-up procedure never completes.

The Link Training and Status State Machine (LTSSM) seemingly never gets past 'Polling.Configuration', or perhaps it does but my ability to read it is too slow. This might indicate that link training never completes, or that it does but the PCIe root complex detects an error during the post-link-up interrogation and resets the endpoint.

I use command line tools to build FPGA images and I don’t have an easy way to insert ChipScope. I did write my own logic analyzer but I broke it a few months back and haven’t had a chance to fix it.

I appreciate that your time is valuable, but if you could generate two PCIe images testing these two variants:

  • Uses the reference clock supplied by the TX1
  • Uses the reference clock generated by your board

it would be very helpful to me and I would be very appreciative.

Dave

With your background on both the TX1 and Xilinx, how difficult would it be for you to design a TX1 carrier card to interface with custom or off-the-shelf FPGA modules such as the Avnet Mini-Module-Plus? I'm sure there is a market for it.

Thanks! The board I designed for my client is a carrier board for the TX1 that interfaces with a Xilinx FPGA using the 4 PCIe Gen2 lanes. In terms of complexity, I have to say it was relatively easy to do. The supplied NVIDIA schematics and carrier board layout were extremely helpful, and the very easy power requirements for the TX1 (one wide-range power rail) were awesome! When I'm done with this project I plan to write a post-mortem on what documentation could be done better.