Unexpected low performance of PCIe DMA to TX1


We have a working design in which a Xilinx FPGA DMAs camera frames to a TX1 over a x4 PCIe link set to 2.5 GT/s (Gen1). Frames are received in DMA-coherent memory by a custom kernel driver and forwarded on to user space.

However, we are seeing the data rate top out at around 300 MB/s. If we request a higher frame rate that results in a data stream beyond this rate, the FPGA experiences back pressure from the PCIe link, which backs up the camera stream and causes the stream to fail.

I have all the performance settings / clocks set to max, including the EMC clock. (When the EMC clock is set lower than max, the top achievable data rate is even less, so it is having an effect.)

According to all the specs, shouldn't the TX1 easily handle this rate? Even though the destination is coherent memory, the TX1's DDR bandwidth is rated at 25.6 GB/s.

This previous post mentions a similar problem:


But in our case the low performance is observed on the DMA write itself, not on the subsequent read to user space.

The issue in the other thread is different: there, memcpy is not very efficient because dma_alloc_coherent() marks the area as uncached, so performance drops when the CPU works on it for the memcpy. But in this scenario you are saying the FPGA-based endpoint's DMA is not able to dump data at higher rates (because of back pressure, etc.).
Can you give more information on the release you are using (e.g. 23.1 / 24.1 / 24.2)? Also,
Can you try not consuming the data in user space? That is, let DMA dump the data to memory and discard it. If we don't see any issue with this, then it is most likely the kernel-space-to-user-space data transfers that are putting on the back pressure.
There are no known issues with TX1’s PCIe not being able to handle incoming data from end point’s DMA.
Also, it would be great if you can give the sequence of events that are taking place here.

I think we have narrowed it down to the fact that the FPGA is detecting the PCIe link as Gen1, and the data rate exceeds Gen1 bandwidth. I thought I saw in TX1 documentation that Gen2 is supported? When we configure the FPGA PCIe core to Gen2, the TX1 console prints a stream of PCIe bus errors as soon as the Tegra PCI controller driver is loaded at boot.

For background, L4T is 24.2. The FPGA delivers two camera image streams over PCIe via DMA at high frame rates. A custom kernel driver on the TX1 receives an MSI interrupt when each frame's DMA completes. A user-space application reads the frames out of their DMA location, but I have that part disabled while investigating this. The delivery of each frame reaches ~550 MB/s, so two frames at once does push the limits of Gen1, and that's where we see throttling all the way back to the IP cores generating the pixel streams.

Can you paste those errors here? Are those AER errors? If yes, can you please check if you have ASPM enabled? If yes, please try disabling it (by appending ‘pcie_aspm=off’ to kernel command line) and check once?

I do have pcie_aspm=off. This is the error that streams in:

pcieport 0000:00:01.0: AER: Corrected error received: id=0010
pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0008(Receiver ID)
pcieport 0000:00:01.0:   device [10de:0fae] error status/mask=00000001/00002000
pcieport 0000:00:01.0:    [ 0] Receiver Error         (First)

Also, with PCIe debugging enabled, dmesg reports a steady stream of tegra_pcie_isr(1182) and handle_sb_intr(1141) alongside the errors, so there is activity there.

I should add that this happens before any camera streams are started. This is during link initialization.

As I see, the errors are of type 'Physical Layer'. This points to bad electricals (signal integrity). If you have any interposer / converter cards connected in between, can you please remove them and check once?

Ok, it may be because there are two boards / connectors in between the TX1 and the FPGA. All are designed with proper PCIe traces, but there may be too many connector transitions for Gen2. There is no way to remove them currently…

Is there a way to put the TX1 PCIe in internal loopback mode (PMA/PCS loopback) in order to test part of the path?

>> Is there a way to put the TX1 PCIe in internal loopback mode (PMA/PCS loopback) in order to test part of the path?
This terminology seems specific to Ethernet. Do you mean FarEnd loopback in PCIe?

Hi readonly,

Have you found the cause and resolved this problem?
Any status update for this issue?