Hi. I am running PCIe with an NVIDIA Jetson AGX Xavier (JetPack 5.1) as the root port and a custom Xilinx FPGA board as the endpoint.
I am able to move data between the two boards over PCIe. However, I am currently facing performance issues:
With the link configured as Gen1 x1, I get roughly 103 MB/s (megabytes per second) when sending test data from the root port to the endpoint. At Gen2 x1 I get roughly 207 MB/s, which seems fine, since the per-lane rate doubles from Gen1 to Gen2. For a Gen2 x4 link I would therefore expect about 207 * 4 = 828 MB/s. However, at Gen2 x4 I measure only around 449 MB/s.
Why am I not getting a 4x speedup when moving from an x1 to an x4 link? I also tried Gen3 x1, which gives 373 MB/s, but Gen3 x4 gives only 470 MB/s. Again, going from x1 to x4 did not yield 4x the throughput.
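For reference, here is the arithmetic I am using to sanity-check these numbers. Theoretical per-lane throughput follows from the line rate and the encoding (8b/10b for Gen1/Gen2, 128b/130b for Gen3, per the PCIe spec); this ignores TLP and protocol overhead, so real numbers will be somewhat lower, but not this much lower:

```python
# Theoretical per-lane PCIe throughput from line rate and encoding.
# Gen1/Gen2 use 8b/10b encoding, Gen3 uses 128b/130b. TLP header and
# protocol overhead are ignored, so these are upper bounds.

def per_lane_mb_s(gt_s: float, enc_payload: int, enc_total: int) -> float:
    """Usable bytes/s per lane from line rate (GT/s) and encoding ratio."""
    return gt_s * 1e9 * enc_payload / enc_total / 8 / 1e6

gen1 = per_lane_mb_s(2.5, 8, 10)     # 250.0 MB/s per lane
gen2 = per_lane_mb_s(5.0, 8, 10)     # 500.0 MB/s per lane
gen3 = per_lane_mb_s(8.0, 128, 130)  # ~984.6 MB/s per lane

print(f"Gen1 x1 ceiling: {gen1:.1f} MB/s (I measure 103)")
print(f"Gen2 x4 ceiling: {gen2 * 4:.1f} MB/s (I measure 449)")
print(f"Gen3 x4 ceiling: {gen3 * 4:.1f} MB/s (I measure 470)")
```

So even at Gen2 x1 I am well below the per-lane ceiling, and adding lanes barely helps.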
I am running JetPack 5.1 with one change: I set CONFIG_STRICT_DEVMEM=n in the tegra_defconfig file. All other settings are at their defaults. I am using PIO writes to send data from the NVIDIA root port to my custom endpoint.
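To illustrate what I mean by PIO writes, here is a minimal sketch of the pattern (Python for brevity; the BAR address is a placeholder, and the demo mode maps anonymous memory so the sketch runs anywhere, whereas on the Jetson the real path maps /dev/mem at the endpoint BAR's physical address, which is why I needed CONFIG_STRICT_DEVMEM=n):

```python
import mmap
import os
import time

# PLACEHOLDER values for illustration only; the real BAR address and
# size come from `lspci -v` on the root port side.
BAR_PHYS_ADDR = 0x0
BAR_SIZE = 64 * 1024

def map_bar(demo: bool = True):
    """Map the endpoint BAR for PIO.

    demo=True maps anonymous memory so this sketch runs without root
    or hardware; demo=False is the real path via /dev/mem (requires
    CONFIG_STRICT_DEVMEM=n and root privileges)."""
    if demo:
        return mmap.mmap(-1, BAR_SIZE)
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    return mmap.mmap(fd, BAR_SIZE, offset=BAR_PHYS_ADDR)

def pio_write_mb_s(bar, payload: bytes, iters: int) -> float:
    """Copy `payload` into the BAR `iters` times; return MB/s.
    With PIO, each CPU store becomes its own memory-write TLP,
    so the CPU (not the link) may be the bottleneck."""
    start = time.perf_counter()
    for _ in range(iters):
        bar[:len(payload)] = payload
    elapsed = time.perf_counter() - start
    return len(payload) * iters / elapsed / 1e6
```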
Additionally, during PCIe RX (receiving data on the root port from the endpoint) on a Gen1 x1 link, I get only 4.5 MB/s, compared to 103 MB/s when transmitting from the root port.
Why am I getting such poor throughput? And why does throughput not scale with the number of lanes?
Also, when debugging further, I find that when receiving data on the NVIDIA root port, it generates multiple 16-byte read requests, which is a major contributor to the poor throughput when reading. Is it possible to change/increase the size of these read requests?
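One thing I checked is the Max_Read_Request_Size field (bits 14:12 of the PCIe Device Control register), though I am not sure it even applies here, since it governs read requests a device issues as a requester (e.g. endpoint DMA) rather than CPU-initiated reads. For completeness, here is a sketch that decodes it from a config-space dump such as the `config` file under /sys/bus/pci/devices/:

```python
def max_read_request_bytes(cfg: bytes) -> int:
    """Extract Max_Read_Request_Size in bytes from a PCI config-space
    dump, e.g. open('/sys/bus/pci/devices/<bdf>/config', 'rb').read().
    Walks the capability list to the PCI Express capability (ID 0x10)
    and decodes bits 14:12 of the Device Control register (offset 8)."""
    if not cfg[0x06] & 0x10:           # Status register: cap list present?
        raise ValueError("no capability list")
    ptr = cfg[0x34]                    # Capabilities Pointer
    while ptr:
        cap_id, nxt = cfg[ptr], cfg[ptr + 1]
        if cap_id == 0x10:             # PCI Express capability
            devctl = cfg[ptr + 8] | (cfg[ptr + 9] << 8)
            return 128 << ((devctl >> 12) & 0x7)  # 128..4096 bytes
        ptr = nxt
    raise ValueError("PCI Express capability not found")
```

The same field can be read or written with setpci (e.g. `setpci -s <bdf> CAP_EXP+8.w`), but since the 16-byte reads I see look CPU-generated, I suspect this setting does not affect them; I would appreciate confirmation either way.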
Sana Ur Rehman