TX2 throughput issue


We have a Tegra TX2 board with an FPGA connected via PCIe x4.

We have a working driver and DMA solution on L4T R28.2.0.

We are attempting to transfer data between DDR on the FPGA card and DDR on the Tegra TX2 over PCIe.

We have optimized the FPGA-side settings (TLP size, interconnect settings, clock-crossing adapters, etc.) for best performance.

We are reading from the Tegra TX2 DDR and writing to the FPGA DDR. We observe no performance limitation on the FPGA DDR write side; the limitation appears only in the read responses from the host (TX2).

As a reference, the same FPGA card, driver, and DMA were tested on a 64-bit x86 Linux PC with a sample transfer size of 40KB.

While the x86 takes only about 4500 clocks for a 40KB transfer, the TX2 takes about 5966 clocks for the same transfer. Effectively, the x86 throughput is 9.1Gbps, whereas the TX2 stands at 6.8Gbps. But when we transfer in larger chunks, the effective bandwidth of the TX2 is only 1.6Gbps (approx. 100 HD frames per second).
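
As a sanity check on the clock counts above, here is a minimal sketch of the conversion from cycles to throughput. It assumes the cycles are counted on a 125 MHz clock and that 40KB means 40 * 1024 bytes. Neither is stated above; 125 MHz is simply the frequency at which the quoted clock counts and Gbps figures agree. The function and constant names are only for illustration.

```python
def throughput_gbps(num_bytes, clocks, clock_hz):
    """Effective throughput in Gbit/s for num_bytes moved in `clocks` cycles."""
    seconds = clocks / clock_hz          # duration of the transfer
    return num_bytes * 8 / seconds / 1e9  # bits per second, scaled to Gbit/s


# Assumption: counter clock of 125 MHz (not given in the post).
ASSUMED_CLOCK_HZ = 125e6
# Assumption: 40KB means 40 * 1024 bytes.
TRANSFER_BYTES = 40 * 1024

x86_gbps = throughput_gbps(TRANSFER_BYTES, 4500, ASSUMED_CLOCK_HZ)
tx2_gbps = throughput_gbps(TRANSFER_BYTES, 5966, ASSUMED_CLOCK_HZ)

print(f"x86: {x86_gbps:.2f} Gbps")
print(f"TX2: {tx2_gbps:.2f} Gbps")
print(f"TX2 / x86 ratio: {tx2_gbps / x86_gbps:.2f}")
```

Under those assumptions the sketch gives roughly 9.1 Gbps for the x86 and just under 6.9 Gbps for the TX2, in line with the figures quoted above. The ratio does not depend on the assumed clock, so for this 40KB transfer the TX2 reaches about three quarters of the x86 throughput.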

We suspect an inherent bandwidth limitation in the TX2, but some DevTalk community discussions claim much higher bandwidth.

Do we need to change any settings on the PCIe driver or clock side to improve performance?

Looking forward to your suggestions/thoughts on improving this.

Hi karthick10,

Please maximize the CPU/GPU clocks and check whether there is any improvement:
Enable Jetson clocks (sudo ~/jetson_clocks.sh)


Hi kayccc,

Thank you for your quick response. There is no change in throughput after maximizing the CPU/GPU clocks via jetson_clocks.sh.