Poor DMA performance over PCIe from FPGA

Hi,

Here are some clarifications and follow-up questions:

  1. Can anyone confirm that, with the IOMMU turned off (and dma-coherent turned off), PCIe transfer rates will suffer?
    → This might cause poor performance, because IO coherency might be disabled in that configuration.
    When the IOMMU is enabled, it drives the coherency bit based on the “dma-coherent” flag.
    When the IOMMU is disabled, the PCIe controller drives the IO coherency bit, which is taken from the TLPs sent by the FPGA. So, if the FPGA sends TLPs with the IO coherency bit set, IO coherency will be enabled.
    If you don’t want to rely on the FPGA (EP), there is an override bit in the Tegra PCIe controller.
    Note: the “dma-coherent” flag should be kept intact even when the iommus entries are removed.

  2. Has anyone attempted to run a DPDK application with a device in the PCI slot using the vfio-pci driver?
    → No, we have not attempted this.

  3. Can anyone provide guidance on how to get around this problem?
    → With the information available so far, we don’t see any data comparing performance with and without the IOMMU. Is that because you can’t run your application with the IOMMU enabled? Without that comparison, we cannot fully attribute this issue to IOMMU/DMA coherency.
    Are you using the DMA engine in the FPGA for both Tx and Rx?
    If yes, then the Tegra side hardly plays any role here. Since Tx is working fine, please check on the FPGA side why the Rx rate is low.
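
Regarding point 1, a quick way to confirm what the running kernel actually sees is to check whether the PCIe controller node still carries the “dma-coherent” flag after the iommus entries were removed. The following is only a minimal sketch: it assumes the live device tree is exposed as a directory tree (commonly under /proc/device-tree, where each property appears as a file), the helper name `dt_coherency_flags` is made up for illustration, and the node path you pass in is a placeholder for your own PCIe controller node.

```python
import sys
from pathlib import Path


def dt_coherency_flags(node_dir):
    """Report which coherency-related properties exist on a device-tree node.

    node_dir is the directory of a device-tree node (assumption: a path
    under /proc/device-tree). Each property shows up as a file in that
    directory; a boolean property such as dma-coherent is an empty file,
    so only its presence matters.
    """
    node = Path(node_dir)
    return {name: (node / name).exists() for name in ("dma-coherent", "iommus")}


if __name__ == "__main__":
    # Pass one or more node directories on the command line; the actual
    # node path depends on your platform's device tree.
    for path in sys.argv[1:]:
        print(path, dt_coherency_flags(path))
```

If this reports "dma-coherent" as missing on the controller node, that would match the configuration described in point 1 where IO coherency may end up disabled.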