What kind of throughput should I expect on the TX1 when doing (SG)DMA through the PCI Express Gen2 x4 interface?
I am currently achieving these speeds:
FPGA to CPU 981 Megabytes/s
CPU to FPGA 1115 Megabytes/s
These are worst-case end-to-end numbers (i.e. they include the interrupt overhead incurred every 8 Megabytes and DMA idle times).
Also, what tunable variables influence the throughput? I have seen scripts that increase the TX1 clock speeds, for example.
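To put the measured figures above in context, here is a rough sketch that compares them with the raw data rate of a Gen2 x4 link (5 GT/s per lane with 8b/10b encoding). It assumes "Megabytes" means 10^6 bytes, and the function names are only illustrative. The ceiling it computes ignores all packet and protocol overhead, so it cannot be reached in practice.

```python
# Raw data rate of a PCIe Gen2 link after 8b/10b line encoding.
# Assumption: 1 Megabyte = 10**6 bytes. Packet overhead is ignored here.
GT_PER_S_PER_LANE = 5.0e9       # Gen2 signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b: 8 data bits per 10 line bits
LANES = 4


def link_ceiling_mb_s(lanes=LANES):
    """Return the raw per-direction data rate in MB/s (no TLP overhead)."""
    bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6


def utilisation(measured_mb_s, lanes=LANES):
    """Return measured throughput as a fraction of the raw link rate."""
    return measured_mb_s / link_ceiling_mb_s(lanes)


# Measured end-to-end figures quoted in this post.
measured = {"FPGA to CPU": 981, "CPU to FPGA": 1115}

for name, mb_s in measured.items():
    print(f"{name}: {mb_s} MB/s is {utilisation(mb_s):.1%} "
          f"of the {link_ceiling_mb_s():.0f} MB/s raw link rate")
```

The percentages only show how much headroom is left on the link. They say nothing about where the time is lost (interrupt latency, DMA idle gaps, or payload size).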
You can expect the following performance numbers:
DMA to SysMem 1600 MBytes/sec
SysMem to DMA 1300 MBytes/sec
Are you looking for tunable variables on the Tegra side or from a general PCIe point of view?
Sorry for my very late reply; the project was paused but has now fully started again.
I am looking for tunables on the Tegra/Linux side, and on the PCIe side but targeting the Tegra TX1 specifically (for example MRRS, max outstanding read requests, max payload).
Or, more specifically, how were the 1600 MBytes/sec for DMA to SysMem and the 1300 MBytes/sec for SysMem to DMA achieved?
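To illustrate why the max payload setting matters, here is a minimal sketch of how the payload size bounds write throughput on a Gen2 x4 link. The per-TLP overhead of 20 bytes is my assumption (2 framing + 2 sequence number + 12 header with 32-bit addressing + 4 LCRC), the payload sizes are just examples and not confirmed TX1 values, and the model ignores DLLP traffic (ACKs, flow-control updates) as well as read-completion effects, so real numbers will be lower.

```python
# Simple upper-bound model: throughput = raw link rate * payload efficiency.
# Assumption (not from this thread): 20 bytes of overhead per TLP, i.e.
# 2 framing + 2 sequence number + 12 header (3DW, 32-bit address) + 4 LCRC.
# With 64-bit addressing the header is 16 bytes, giving 24 bytes overhead.
TLP_OVERHEAD_BYTES = 2 + 2 + 12 + 4
LINK_CEILING_MB_S = 2000.0      # Gen2 x4: 5 GT/s * 4 lanes * 8/10 / 8 bits


def payload_efficiency(payload_bytes, overhead_bytes=TLP_OVERHEAD_BYTES):
    """Fraction of link bytes that carry payload for a given TLP payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


def estimated_throughput_mb_s(payload_bytes,
                              overhead_bytes=TLP_OVERHEAD_BYTES):
    """Upper bound on throughput in MB/s for a given TLP payload size."""
    return LINK_CEILING_MB_S * payload_efficiency(payload_bytes,
                                                  overhead_bytes)


# Example payload sizes only; check the negotiated value on your system.
for payload in (64, 128, 256, 512):
    print(f"payload {payload:4d} B: "
          f"efficiency {payload_efficiency(payload):.1%}, "
          f"at most {estimated_throughput_mb_s(payload):.0f} MB/s")
```

For reads, the same reasoning applies to the completion packets, whose size is limited by the max payload rather than by MRRS. MRRS and the number of outstanding read requests mainly decide whether enough requests are in flight to keep the link busy.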