I have a question about a problem with high-speed transfers.
I am trying to transfer data from an FPGA to a TX1. The FPGA writes data in large chunks (4 MB each, 512 B per TLP). After each chunk an MSI interrupt is generated and the next transfer request is sent immediately. The data is generated on the fly, so write requests are issued as fast as possible.
The fastest I can go is almost 300 MB/s (293 MB/s on average, with the driver using 2% CPU), so transferring 4 GB of data takes no less than 14 s. I had hoped to reach 700 MB/s easily. I looked inside the FPGA with a logic analyser and found that after the first bulk TLP memory write (512 bytes) I have to wait an unreasonable number of cycles for the TX1 before sending the next one, and all following writes are delayed from that point on. On the other hand, everything works fine on an x86_64 motherboard.
I am using PCIe Gen1 x4.
# R24 (release), REVISION: 1.0
I am mapping the memory using:
dma_page = dma_zalloc_coherent(dev, 4194304, &dma_addr, GFP_DMA32);
I am not even checking data integrity here (I verified it with separate code, not the speed-test code), and the transfer rate is still not as high as I would expect from PCIe.
Is there some special way of handling this kind of communication on this platform that makes it possible to get close to the maximum PCIe speed? I have read that there is a problem with the copy_to_user kernel function, but I am not using it here.