We have a working design of a Xilinx FPGA DMAing camera frames to TX1 over PCIe 4x1 set to 2.5 GT/s. Frames are received in DMA coherent memory by a custom kernel driver and forwarded on to user space.
However, we are seeing the data rate top out at around 300 MB/s. If we request a higher frame rate that results in a data stream beyond this rate, the FPGA experiences back pressure from the PCIe link, which backs up the camera stream and causes the stream to fail.
I have all the performance settings / clocks set to max, including the EMC clock. (When the EMC clock is set lower than max, the maximum achievable data rate drops even further, so the EMC clock clearly has an effect.)
According to all the specs, shouldn’t the TX1 easily handle this rate? Even though the destination is coherent memory, TX1 DDR bandwidth is rated at 25.6 GB/s.
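For reference, here is a rough back-of-the-envelope link budget, as a sketch rather than a measurement. It assumes the 8b/10b line encoding that PCIe uses at 2.5 GT/s. The 128-byte payload per write packet and the 24 bytes of per-packet overhead are illustrative assumptions, not values read back from our hardware, and flow-control and ACK traffic are ignored.

```python
# Rough PCIe throughput estimate at 2.5 GT/s per lane (the rate from this post).
LINE_RATE = 2_500_000_000  # transfers per second per lane, 1 bit per transfer


def raw_mb_per_s(lanes):
    """Bandwidth in MB/s after 8b/10b encoding, before packet overhead."""
    bits_per_s = lanes * LINE_RATE * 8 / 10  # 8 data bits per 10 line bits
    return bits_per_s / 8 / 1e6              # bits -> bytes -> MB


def effective_mb_per_s(lanes, payload=128, overhead=24):
    """Estimate after per-packet overhead.

    payload  -- ASSUMED max payload size per write packet, in bytes
    overhead -- ASSUMED framing + sequence + header + LCRC bytes per packet
    """
    return raw_mb_per_s(lanes) * payload / (payload + overhead)


if __name__ == "__main__":
    for lanes in (1, 2, 4):
        print(lanes, raw_mb_per_s(lanes), round(effective_mb_per_s(lanes), 1))
```

A single lane at 2.5 GT/s carries at most 250 MB/s after encoding, so a sustained 300 MB/s means the link is using more than one lane. With four lanes the raw figure is 1000 MB/s, well above what we observe, which is why the link itself does not look like the limit.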
This previous post mentions a similar problem:
But in our case the performance limit is observed only on the DMA write, not on the subsequent read to user space.