Xilinx FPGA PCIe driver working on TX1

Hello, and thank you in advance for any help.

I am a colleague of @chirstnp_work and have been working on this problem with him for several months now (see https://devtalk.nvidia.com/default/topic/1025813/jetson-tx2-xilinix-pcie/?offset=7#reply). @chirstnp_work has since moved on to other tasks. I have managed to transfer the amount of data our application requires without data errors, using a TX1 and Xilinx’s loopback FPGA example. My steps to achieve this are as follows (I’ll try to be as clear as possible; I apologize for any redundancy):

  1. Install Jetpack 2.3.1 (L4T 24.2.1)
  2. Unpack the L4T 24.2.1 kernel to the TX1 and compile/install it using the jetsonhacks build script (https://github.com/jetsonhacks/buildJetsonTX1Kernel/tree/v1.0-L4T24.2.1)
  3. Reboot
  4. Download the Xilinx XDMA driver sources (https://www.xilinx.com/Attachment/Xilinx_Answer_65444_Linux_Files.zip)
  5. Unpack the Xilinx XDMA sources to /home/nvidia
  6. Modify RX_BUF_PAGES in the Xilinx driver's include/xdma-core.h from 256 to 2048
  7. Download the version of the Xilinx xdma-core.c file with the cyclic buffer disabled
     File: https://forums.xilinx.com/xlnx/attachments/xlnx/PCIe/9115/1/xdma-core_cyclic_buffer_disabled.c
     Forum: https://forums.xilinx.com/t5/PCI-Express/PCIE-DMA-subsystem-AXI4-Streaming-c2h-transfers/td-p/791701
  8. Apply the attached patch
  9. Build the Xilinx XDMA sources and run load_driver.sh with the FPGA plugged into PCIe and programmed with the loopback design

At this point, multiple transfers of size 8M will complete without data errors, but dmesg will still show mc-errs and smmu faults.
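
As a side note on the RX_BUF_PAGES change, here is a small sketch of the arithmetic. It assumes a 4 KiB page size (check `getconf PAGESIZE` on the target) and that the buffer size is simply pages times page size; how the driver actually uses RX_BUF_PAGES is not shown here. Under those assumptions, 256 pages cover 1 MiB and 2048 pages cover 8 MiB, which lines up with the 8M transfer size. The names `ASSUMED_PAGE_SIZE` and `rx_buf_bytes` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. This is not taken from the driver sources. */
#define ASSUMED_PAGE_SIZE 4096u

/* Bytes covered by a buffer made of `pages` pages of ASSUMED_PAGE_SIZE bytes.
 * Hypothetical helper, only meant to show the arithmetic. */
size_t rx_buf_bytes(size_t pages)
{
    assert(pages > 0);
    return pages * ASSUMED_PAGE_SIZE;
}
```

With these assumptions, `rx_buf_bytes(256)` gives 1048576 bytes and `rx_buf_bytes(2048)` gives 8388608 bytes.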

Before the patch is applied, the modified xdma-core.c file (cyclic buffer disabled) completes small numbers of 8M transfers successfully, but at larger numbers (more than roughly 64) it causes a kernel crash because of a BUG_ON macro that verifies that the “transfer” pointers are not null. The exact transfer number at which the crash occurs is unpredictable, but it always happens after the WARN_ON macro (line 1380) executes. My modifications are designed to prevent the driver from calling the functions that trigger the BUG_ON macros when their arguments are null.
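
The kind of guard described above can be sketched in userspace C as follows. This is not the code from the attached patch or from xdma-core.c: `struct transfer`, `service_transfer`, and `service_transfer_guarded` are hypothetical names, and `BUG_ON` is emulated with `assert` so the example compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's transfer descriptor. */
struct transfer {
    int desc_count;
};

/* Userspace stand-in for the kernel's BUG_ON(): abort if cond is true. */
#define BUG_ON(cond) assert(!(cond))

/* Hypothetical routine that, like the unpatched code, insists on a
 * non-NULL transfer and crashes otherwise. */
int service_transfer(struct transfer *t)
{
    BUG_ON(t == NULL);
    return t->desc_count;
}

/* Workaround-style wrapper: skip the call when the pointer is NULL.
 * Returns -1 to signal that nothing was serviced. This hides the
 * symptom; it does not explain why the pointer was NULL. */
int service_transfer_guarded(struct transfer *t)
{
    if (t == NULL)
        return -1;
    return service_transfer(t);
}
```

Calling `service_transfer_guarded(NULL)` returns -1 instead of hitting the assertion, which mirrors how the workaround avoids the crash without addressing its cause.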

Let me qualify my changes to xdma-core.c by saying that I don’t believe they are a good solution, simply a very crude workaround to show proof-of-concept. In fact, I’m surprised I haven’t noticed more serious problems yet.

My question is: the changes I made to the modified xdma-core.c file prevented the driver from triggering the BUG_ON macros unpredictably, but I’m not sure why the modified driver works in the first place while the original does not. My guess is that it is some kind of race condition where the size of the transfer list has been updated, but the data structure to which the pointer is meant to refer has not been allocated yet.
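
To make that guess concrete, here is a deterministic, single-threaded sketch of the suspected ordering. It is purely illustrative: `struct engine`, its fields, and the function names are hypothetical and are not taken from the Xilinx driver. The two enqueue steps are split apart so that the window a concurrent consumer (for example an interrupt handler) might hit can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct transfer { int id; };

/* Hypothetical engine state: a queue count and a pointer to the head transfer. */
struct engine {
    int queued;
    struct transfer *head;
};

/* Step 1 of a non-atomic enqueue: the count is bumped first. */
void enqueue_step1_bump_count(struct engine *e)
{
    e->queued++;
}

/* Step 2: only now is the head pointer published. */
void enqueue_step2_publish(struct engine *e, struct transfer *t)
{
    e->head = t;
}

/* What a consumer running between the two steps would observe:
 * returns 1 if the count says there is work but the pointer is still NULL. */
int consumer_sees_inconsistent(const struct engine *e)
{
    return e->queued > 0 && e->head == NULL;
}
```

Between step 1 and step 2, `consumer_sees_inconsistent` returns 1; after step 2 it returns 0. If something like this happens in the real driver, a lock or a reordering of the two steps would be the usual remedy, but that is speculation until the actual code path is identified.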

I am hoping that, given the TX1 architecture and the L4T implementation, there is some relatively obvious reason why one driver version works while the other does not.

xdma_core_modifications.patch.txt (2.9 KB)
xdma-core_cyclic_buffer_disabled.c (154 KB)
xdma-core-MODDED.c (155 KB)

If there is no dependency on 24.2.1, can you please try the latest 28.2 release and check whether you see any issues with it?