I’m using a custom FPGA board with the RIFFA PCIe driver on the Jetson AGX Orin platform. When performing DMA transfers, I encounter SMMU context faults and Memory Controller errors:
[ 388.659082] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0xffe04780, fsynr=0x450013, cbfrsynra=0x1014, cb=10
[ 388.659148] tegra-mc 2c00000.memory-controller: pcie5w: secure write @0x00000003ffffff00: VPR violation ((null))
[ 388.659156] tegra-mc 2c00000.memory-controller: pcie5w: secure write @0x00000003ffffff00: Route Sanity error ((null))
Symptoms:
- About 80% of DMA transfers fail; the other 20% succeed
- The invalid DMA address 0x00000003ffffff00 appears in the error logs
- The same FPGA board works correctly on an RK3586 platform
Environment:
- Platform: NVIDIA Jetson AGX Orin (T234 SoC)
- FPGA: Xilinx FPGA with RIFFA IP core (Device ID: 0x7021)
- PCIe Controller: PCIe C5 (pcie@141a0000)
- Driver: RIFFA PCIe driver
- Kernel: JetPack 5.x / L4T 35.x
Any help or guidance would be appreciated. Thank you!
*** Please note that this reply was generated automatically by an LLM ***
Based on the information provided, you are seeing DMA write failures from a custom FPGA board on Jetson AGX Orin. The SMMU context fault and the Memory Controller errors are logged within about 0.1 ms of each other, and both describe a write from the PCIe C5 controller (memory client pcie5w), so they most likely come from the same faulting transaction.
Below is some guidance and potential solutions.
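- SMMU Context Faults: The SMMU (System Memory Management Unit) translates the I/O virtual addresses (IOVAs) that devices use for DMA into physical addresses, and blocks accesses to addresses that are not mapped for that device. “Unhandled context fault” means a transaction from the device reached an IOVA with no valid translation in its context bank (cb=10). In this log, fsr=0x402 has the translation-fault (TF) bit set, the fsynr value marks the access as a write, and iova=0xffe04780 is the address the FPGA tried to write.
- Check that the FPGA’s RIFFA IP core configuration is set up correctly for the Jetson AGX Orin platform.
- Verify that every address the FPGA writes to is a DMA address that the kernel DMA API returned for this device (for example, sg_dma_address() after dma_map_sg(), or the handle from dma_alloc_coherent()), that the FPGA never writes past the end of a mapped segment, and that buffers are not unmapped before the FPGA has finished writing to them.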
- Memory Controller Errors: The tegra-mc messages report that the memory client pcie5w (PCIe C5 writes) issued a write to 0x00000003ffffff00 that the Memory Controller rejected.
- “VPR violation” refers to the Video Protection Region (VPR), a protected memory carveout enforced by the Memory Controller; it is not related to page tables. Together with the “Route Sanity error”, it indicates that the Memory Controller refused the write to that address.
- Check the FPGA’s RIFFA IP core configuration to ensure that it is properly set up for the Jetson AGX Orin platform’s memory controller.
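- Invalid DMA Address: 0x00000003ffffff00 is the address reported in the tegra-mc messages; it is not the same value as the faulting IOVA in the SMMU message (0xffe04780). Compare both against the DMA addresses and lengths the driver actually programs into the FPGA for each transfer.
- Because about 20% of transfers succeed, the addresses are probably valid most of the time. That points toward a size- or timing-dependent problem (such as a write just past the end of a segment, or a write after the buffer was unmapped) rather than a completely wrong address base.
- Verify that the DMA addresses used by the FPGA are correctly aligned and stay within the mapped range for each transfer.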
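- Comparison with RK3586 Platform: Since the same FPGA board works correctly on the RK3586 platform, the difference is likely in how DMA is mapped on Jetson AGX Orin. On Orin, DMA from PCIe C5 goes through the SMMU (as the arm-smmu fault shows), so the FPGA may only access IOVAs that are currently mapped. If PCIe DMA on the RK3586 board is not translated by an IOMMU, a driver that passes physical addresses to the FPGA, or a design that writes slightly outside its buffer, can appear to work there and still fault on Orin.
- Compare the FPGA’s RIFFA IP core configuration and DMA address usage between the two platforms to identify any differences.
- Verify the SMMU configuration for the PCIe C5 controller (pcie@141a0000) in the Jetson AGX Orin device tree.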
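The arm-smmu fault line from the question can be decoded mechanically. The sketch below is a hypothetical helper (decode_smmu_fault is not part of any NVIDIA or kernel tool) that parses the line and names the set bits. The bit positions are an assumption based on the ARM SMMUv2 context-bank FSR and FSYNR0 layout as used by the Linux arm-smmu driver, and reading the stream ID from the low 16 bits of cbfrsynra is also an assumption; check both against arm-smmu.h in your kernel.

```python
import re

# Assumed ARM SMMUv2 context-bank FSR bit positions (names follow the Linux
# arm-smmu driver's ARM_SMMU_FSR_* macros); verify for your kernel.
FSR_FAULT_BITS = {
    1: "TF (translation fault)",
    2: "AFF (access flag fault)",
    3: "PF (permission fault)",
    4: "EF (external fault)",
    5: "TLBMCF (TLB match conflict fault)",
    6: "TLBLKF (TLB lock fault)",
    7: "ASF (address size fault)",
    8: "UUT (unsupported upstream transaction)",
    30: "SS (stall status)",
    31: "MULTI (multiple faults)",
}
FSR_FORMAT_SHIFT = 9   # assumed FSR[10:9]: translation table format, not a fault
FSYNR0_WNR = 1 << 4    # assumed FSYNR0 bit 4: 1 = write, 0 = read

FAULT_RE = re.compile(
    r"Unhandled context fault: fsr=(0x[0-9a-f]+), iova=(0x[0-9a-f]+), "
    r"fsynr=(0x[0-9a-f]+), cbfrsynra=(0x[0-9a-f]+), cb=(\d+)"
)


def decode_smmu_fault(line):
    """Return the decoded fields of an arm-smmu context-fault line, or None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    fsr, iova, fsynr, cbfrsynra = (int(v, 16) for v in m.groups()[:4])
    return {
        "iova": iova,
        "faults": [name for bit, name in FSR_FAULT_BITS.items() if fsr & (1 << bit)],
        "format": (fsr >> FSR_FORMAT_SHIFT) & 0x3,
        "access": "write" if fsynr & FSYNR0_WNR else "read",
        "stream_id": cbfrsynra & 0xFFFF,  # assumption: SID in the low 16 bits
        "context_bank": int(m.group(5)),
    }


line = ("[ 388.659082] arm-smmu 12000000.iommu: Unhandled context fault: "
        "fsr=0x402, iova=0xffe04780, fsynr=0x450013, cbfrsynra=0x1014, cb=10")
info = decode_smmu_fault(line)
print(hex(info["iova"]), info["access"], info["faults"])
# prints: 0xffe04780 write ['TF (translation fault)']
```

For the log in the question this reports a single translation fault on a write to IOVA 0xffe04780, which is consistent with the FPGA writing to an address that was not mapped for it at that moment.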
To further troubleshoot the issue, you may want to:
- Check the FPGA’s RIFFA IP core documentation to ensure that it is properly configured for the Jetson AGX Orin platform.
- Verify the DMA address usage by the FPGA to ensure that it is within the valid address range for the Jetson AGX Orin platform.
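- Add debug output to the driver that logs the DMA address and length of every segment handed to the FPGA and the moment each buffer is unmapped, then match the faulting IOVA from dmesg against that log. An Integrated Logic Analyzer (ILA) in the FPGA design or a PCIe protocol analyzer can show the addresses the FPGA actually issues.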
- Consult with NVIDIA’s technical support or the FPGA vendor’s technical support for further guidance and assistance.
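To check whether a faulting IOVA belongs to a buffer the driver mapped, it can be compared against the driver’s own record of the segments it programmed into the FPGA. The sketch below assumes you have added a debug print of each segment’s sg_dma_address() and sg_dma_len() and copied the values into a list; locate_iova, the segment values, and the 64 KiB slack threshold are all hypothetical and only illustrate the check.

```python
from bisect import bisect_right

# Hypothetical slack: overshoots smaller than this are reported as "past_end".
PAST_END_SLACK = 0x10000


def locate_iova(segments, iova):
    """Classify a faulting IOVA against (dma_addr, length) segments.

    Returns ("inside", seg_start, offset), ("past_end", seg_start, bytes_past)
    or ("unrelated", None, None).
    """
    segs = sorted(segments)
    starts = [start for start, _ in segs]
    i = bisect_right(starts, iova) - 1  # last segment starting at or below iova
    if i < 0:
        return ("unrelated", None, None)
    start, length = segs[i]
    end = start + length
    if iova < end:
        return ("inside", start, iova - start)
    if iova - end < PAST_END_SLACK:
        return ("past_end", start, iova - end)
    return ("unrelated", None, None)


# Made-up segments for illustration only (not data from this thread).
segments = [(0xFFE00000, 0x4000), (0xFFE04000, 0x780)]
kind, seg, delta = locate_iova(segments, 0xFFE04780)
print(kind, hex(seg), delta)  # prints: past_end 0xffe04000 0
```

An IOVA just past the end of a segment suggests the FPGA wrote more bytes than were mapped; an IOVA inside a segment that still faults suggests the buffer was unmapped (or not yet mapped) when the write arrived.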
By following these steps and comparing the configurations between the two platforms, you should be able to identify and resolve the issue causing the SMMU context faults and Memory Controller errors.
Is this still an issue that needs support? Are there any results you can share?