PCIe Link Speed Issue

Hi. I am using an NVIDIA Jetson AGX Xavier running JetPack 5.1 as my PCIe root port, and a custom board with a Xilinx FPGA as the endpoint.

I am able to move data between the two boards using PCIe. However, I am currently facing performance issues:
With the link capability set to Gen1 x1, I get roughly 103 MB/s (megabytes per second) when sending test data from the root port to the endpoint. With a Gen2 x1 link, I get roughly 207 MB/s, which seems fine, since the speed should double going from Gen1 to Gen2. For a Gen2 x4 link, I would therefore expect around 207 * 4 = 828 MB/s. However, on a Gen2 x4 link the speed is only around 449 MB/s.
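For reference, the theoretical per-lane line rates (before TLP/DLLP and other protocol overhead) work out as:

$$
\begin{aligned}
\text{Gen1 (2.5 GT/s, 8b/10b):}\quad & 2.5 \cdot \tfrac{8}{10} \div 8 = 250\ \text{MB/s per lane} \\
\text{Gen2 (5.0 GT/s, 8b/10b):}\quad & 5.0 \cdot \tfrac{8}{10} \div 8 = 500\ \text{MB/s per lane} \\
\text{Gen3 (8.0 GT/s, 128b/130b):}\quad & 8.0 \cdot \tfrac{128}{130} \div 8 \approx 985\ \text{MB/s per lane}
\end{aligned}
$$

So even my x1 numbers are well below the line rate, which already suggests a per-transaction bottleneck rather than the link itself.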

Why am I not getting 4x the speed when moving from a x1 link to a x4 link? I also tried a Gen3 x1 link, which gives 373 MB/s. However, a Gen3 x4 link only gives 470 MB/s. Again, going from x1 to x4 did not yield a 4x speedup.

I am running JetPack 5.1, with one change: I set CONFIG_STRICT_DEVMEM=n in the tegra_defconfig file. All other settings are at their defaults. I am using PIO writes to send data from the NVIDIA root port to my custom endpoint.
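A PIO write path of this kind looks roughly like the following sketch (not my exact code; the BAR address and size are placeholders, in practice taken from `lspci -v` for the endpoint):

```c
/* Sketch: PIO writes to an endpoint BAR mapped through /dev/mem.
 * Requires CONFIG_STRICT_DEVMEM=n and root privileges. BAR_ADDR and
 * BAR_SIZE are placeholders -- substitute the BAR0 address/size that
 * lspci -v reports for the endpoint. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR_ADDR 0x1f40000000UL  /* placeholder endpoint BAR address */
#define BAR_SIZE (1UL << 20)     /* placeholder BAR size: 1 MiB */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);  /* O_SYNC: uncached mapping */
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint8_t *bar = mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, BAR_ADDR);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    static uint8_t buf[BAR_SIZE];
    memset(buf, 0xA5, sizeof(buf));

    /* CPU-driven copy: each store becomes a posted write TLP. */
    memcpy((void *)bar, buf, BAR_SIZE);

    munmap((void *)bar, BAR_SIZE);
    close(fd);
    return 0;
}
```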

Additionally, during PCIe RX (receiving data on the root port from the endpoint) on a Gen1 x1 link, I get only 4.5 MB/s, compared to 103 MB/s when transmitting from the root port.

Why am I getting such poor throughput? And why does the throughput not scale with the number of lanes?

Also, while debugging further, I found that when receiving data, the NVIDIA root port generates multiple 16-byte read requests, which is a major contributor to the poor throughput when reading data. Is it possible to change/increase the size of these read requests?
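From what I can tell, these 16-byte requests come from the PIO path itself: each CPU load from the uncached BAR mapping becomes one non-posted read TLP, so the request size is capped by the load width (16 bytes matches the 128-bit loads memcpy emits on AArch64), and Max_Read_Request_Size can only cap requests, not enlarge them. A minimal sketch of the kind of loop that produces such reads, assuming `bar` points into a BAR mapped as in the write case:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: reading a mapped BAR with 128-bit NEON loads.
 * On an uncached device mapping, each vld1q_u8 becomes a single
 * non-posted 16-byte PCIe memory read, and the CPU stalls until the
 * completion returns before issuing the next load. */
void pio_read(volatile uint8_t *bar, uint8_t *dst, size_t len)
{
    for (size_t off = 0; off + 16 <= len; off += 16) {
        uint8x16_t v = vld1q_u8((const uint8_t *)(bar + off)); /* one 16-byte MRd TLP */
        vst1q_u8(dst + off, v);
    }
}
```

Since each load waits for its completion, the round-trip latency rather than the link rate would set the RX throughput, which would also explain why it does not scale with lanes.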

Regards,
Sana Ur Rehman

Sorry for the late response. Our team will investigate and provide suggestions soon. Thanks.


Hi,

Did you check whether the PCIe connection is stable with respect to the LTSSM? You can check the LTSSM state of the PCIe endpoint in the FPGA (e.g., for the AMD/Xilinx PCIe Integrated Block, the cfg_ltssm_state signal).
It should be in the L0 state and not going through recovery states.
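On the root-port side you can also verify what was actually negotiated. A minimal sketch that reads the Link Status register out of config space via sysfs (run as root; the BDF is a placeholder, take the root port's BDF from lspci — these are the same fields `lspci -vv` prints as "LnkSta"):

```c
/* Sketch: read the negotiated PCIe link speed/width from sysfs. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0005:00:00.0/config"; /* placeholder BDF */
    uint8_t cfg[256];

    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); return 1; }
    if (fread(cfg, 1, sizeof(cfg), f) != sizeof(cfg)) { perror("fread"); return 1; }
    fclose(f);

    /* Walk the capability list for the PCI Express capability (ID 0x10). */
    uint8_t ptr = cfg[0x34] & 0xFC;
    while (ptr && cfg[ptr] != 0x10)
        ptr = cfg[ptr + 1] & 0xFC;
    if (!ptr) { fprintf(stderr, "no PCIe capability found\n"); return 1; }

    /* Link Status register: offset 0x12 into the PCIe capability.
     * Bits [3:0] = current speed (1=Gen1, 2=Gen2, 3=Gen3),
     * bits [9:4] = negotiated width. */
    uint16_t lnksta = cfg[ptr + 0x12] | (cfg[ptr + 0x13] << 8);
    printf("current link: Gen%u x%u\n", lnksta & 0xF, (lnksta >> 4) & 0x3F);
    return 0;
}
```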

Regards,
Gerrit

If no DMA is in use, then the CPU frequency may affect throughput. Please run jetson_clocks to set the clocks to their maximum frequencies and see if that helps.

Hi @WayneWWW . Thanks for the response. I tried running jetson_clocks, but this did not have any effect on link speed. Still getting the same numbers.

Regards,
Sana Ur Rehman

Please also try MAXN mode + jetson_clocks.

@WayneWWW, tried that as well. I brought all the CPUs online, selected the MAXN power mode, and executed jetson_clocks, then tested the speed. Still getting the same (poor) numbers.

Regards,
Sana Ur Rehman

To get the maximum throughput supported by PCIe, you need to use DMA.
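For example, if the FPGA design includes a DMA engine such as the AMD/Xilinx XDMA IP with its stock xdma kernel driver, the host side is just reads/writes on the driver's character devices. A minimal sketch, assuming the default /dev/xdma0_* device nodes (names and the card address depend on your design and driver setup):

```c
/* Sketch: host<->FPGA transfers through an endpoint DMA engine.
 * Assumes the FPGA instantiates the AMD/Xilinx XDMA IP and the host
 * runs the xdma driver from Xilinx dma_ip_drivers, which exposes
 * /dev/xdma0_h2c_0 (host-to-card) and /dev/xdma0_c2h_0 (card-to-host).
 * The file offset passed to pread/pwrite is the address on the card. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define XFER_SIZE (4 * 1024 * 1024)  /* 4 MiB per transfer */
#define CARD_ADDR 0x0                /* placeholder address in FPGA memory */

int main(void)
{
    uint8_t *buf = malloc(XFER_SIZE);
    if (!buf) return 1;
    memset(buf, 0xA5, XFER_SIZE);

    /* Host -> card: the DMA engine fetches host memory with large
     * read requests instead of per-load PIO TLPs. */
    int h2c = open("/dev/xdma0_h2c_0", O_WRONLY);
    if (h2c < 0) { perror("open h2c"); return 1; }
    if (pwrite(h2c, buf, XFER_SIZE, CARD_ADDR) != XFER_SIZE)
        perror("pwrite");
    close(h2c);

    /* Card -> host. */
    int c2h = open("/dev/xdma0_c2h_0", O_RDONLY);
    if (c2h < 0) { perror("open c2h"); return 1; }
    if (pread(c2h, buf, XFER_SIZE, CARD_ADDR) != XFER_SIZE)
        perror("pread");
    close(c2h);

    free(buf);
    return 0;
}
```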

@WayneWWW, is there any guide or documentation on how to use DMA? (I have read the GPC-DMA chapter in the Xavier Technical Reference Manual, but I am still confused.) Maybe some other guide exists? I can't seem to find any.

Regards,
Sana Ur Rehman

Thanks for the reply @ggrutzeck . I’ll have a look at the LTSSM states.

Regards,
Sana Ur Rehman

Update: So far, I haven't been able to check the LTSSM states or test DMA. I am currently working on DMA and will post an update here once a DMA transfer is working.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.