AGX Xavier PCIe - real read performance.

Does your FPGA have a built-in DMA engine whose configuration registers are exposed to the host through one of its BARs, so that the client driver for your FPGA endpoint, running on the host, can program the FPGA's DMA engine and start a transfer? If yes, you can configure the system with Xavier as the host and the FPGA as the endpoint, and have the endpoint's DMA engine write data directly into Xavier's system memory. In this case, the perf should be around 13.5 Gbits/sec.
If your FPGA doesn't have a DMA engine and instead exposes its memory directly through a BAR, so that the host has to issue READs to the BAR to pull data from the endpoint, then perf would be very bad, because the host CPU itself would be doing the transfer. This is not a configuration meant for performance.
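To give a rough feel for why CPU-driven BAR reads are slow, here is a back-of-the-envelope sketch in Python. It assumes the BAR is mapped uncached and the CPU keeps only one read outstanding at a time, so each read costs a full PCIe round trip. The function name and the 1000 ns round-trip figure are illustrative assumptions, not measured Xavier numbers.

```python
def bar_read_throughput_gbps(bytes_per_read: int, round_trip_ns: float) -> float:
    """Estimate throughput when the CPU issues one blocking BAR read at a time.

    Assumption: every read waits for its completion before the next one is
    issued, so throughput is simply (bits per read) / (round-trip time).
    """
    if bytes_per_read <= 0 or round_trip_ns <= 0:
        raise ValueError("bytes_per_read and round_trip_ns must be positive")
    bits_per_read = bytes_per_read * 8
    # 1 bit per nanosecond equals 1 Gbit/s, so no further scaling is needed.
    return bits_per_read / round_trip_ns


if __name__ == "__main__":
    # Hypothetical round-trip latency; measure on your own setup.
    assumed_round_trip_ns = 1000.0
    for width in (4, 8):
        rate = bar_read_throughput_gbps(width, assumed_round_trip_ns)
        print(f"{width}-byte reads: {rate:.3f} Gbit/s")
```

Under these assumptions, 8-byte reads come out at about 0.064 Gbit/s, which is orders of magnitude below what an endpoint DMA engine writing into system memory can reach.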
Xavier supports a maximum payload size (MPS) of 256 bytes, so yes, the host can be configured for an MPS of 256.
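As a rough illustration of why the MPS setting matters for DMA writes, the sketch below estimates what fraction of each write TLP is actual payload. The 24-byte per-TLP overhead (framing, sequence number, header, LCRC) is an assumption for illustration; the real figure depends on link generation and header format, and this ignores DLLP traffic such as ACKs and flow-control updates. The function name is hypothetical.

```python
def tlp_payload_efficiency(mps_bytes: int, overhead_bytes: int = 24) -> float:
    """Fraction of link bytes carrying payload for full-size write TLPs.

    Assumption: every TLP carries exactly mps_bytes of payload plus a fixed
    overhead_bytes of framing, header, and CRC.
    """
    if mps_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("mps_bytes must be positive, overhead_bytes non-negative")
    return mps_bytes / (mps_bytes + overhead_bytes)


if __name__ == "__main__":
    for mps in (128, 256):
        print(f"MPS {mps}: {tlp_payload_efficiency(mps) * 100:.1f}% payload")
```

With the assumed overhead, going from an MPS of 128 to 256 raises payload efficiency from roughly 84% to roughly 91%, so it is worth checking that both ends of the link actually negotiate 256.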