I have a TX2 board connected to a Xilinx device. I have written a DMA driver to send a 32 MB buffer from user space, using scatter-gather mapping. When I run the application to send the buffer, I get the following error:
I checked some previous posts, and it seems that the iova (0x17fe0fe00) is outside the allocated range? What could be the reason, and how do I resolve it?
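As a quick sanity check on the range question, it can help to test whether a reported IOVA fits the address width the device is allowed to use. The sketch below is plain user-space C, not kernel code; `fits_dma_mask` is a hypothetical helper name, and the 32-bit and 64-bit widths are assumptions chosen for illustration, since the thread does not state which DMA mask the driver sets. Note that 0x17fe0fe00 needs 33 address bits, so it would not fit a 32-bit mask, but an address above 4 GiB is not by itself an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the byte range [addr, addr + len) lies entirely inside
 * an address space of `bits` bits (1..64).
 * Hypothetical helper for illustration only; this is not a kernel API.
 */
static bool fits_dma_mask(uint64_t addr, uint64_t len, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);

    /* Highest address reachable with `bits` address bits. */
    uint64_t mask = (bits == 64) ? UINT64_MAX
                                 : ((UINT64_C(1) << bits) - 1);

    if (len == 0)
        return addr <= mask;

    /* Last byte touched by the transfer. */
    uint64_t last = addr + (len - 1);

    /* Reject ranges that wrap around the 64-bit address space. */
    if (last < addr)
        return false;

    return last <= mask;
}
```

For the address in the question, `fits_dma_mask(0x17fe0fe00, 1, 32)` is false and `fits_dma_mask(0x17fe0fe00, 1, 64)` is true. Whether that matters depends on the mask the driver requested and on what the endpoint can actually address.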
Thank you for your reply. I followed the documentation you mentioned while developing the driver. I have experience in PCIe DMA driver development.
For your information, the driver works on an Ubuntu x86 PC. The problem is that it doesn’t work on the TX2 board, probably because of the ARM64 architecture, the SMMU, etc. We have solved some other ARM-related problems, such as cache coherency. Since this one seems to be related to the TX2 SMMU architecture, I request your guidance in solving it. I am also trying to debug it with the help of the TRM and instrumentation.
There are a few points I want to bring to your notice:
The problem seems to go away if I disable the SMMU. However, I obviously want to bring DMA up with the SMMU enabled.
The problem occurs randomly. Sometimes the context fault doesn't show up after I reboot the board. When the fault does happen, the sent and received DMA buffers in my DMA loopback program mismatch.
The problem doesn't occur on x86, which suggests the issue is not with the Xilinx PCIe IP/device I am using.
I see the issue in many other posts. If possible, could you please try to reproduce it on your side?
Could you please share any reference DMA code that you have used for testing the SMMU with L4T 32+? That would be really helpful.
So far, we have not seen this issue with any upstreamed drivers (xhci_hcd, nvme, r8169, igb, e1000e, ixgbe, etc.). Would it be possible to share the driver with us privately?
I was wondering if you have learned anything about the “Unhandled context fault” problem that you can share? Our problem is similar in that our driver works with our hardware on x86, but faults on both the TX2 and Xavier (currently testing with JP 4.2.2 rev1). We have also disabled the SMMU as a workaround. Our situation differs in that we use a single, large, long-lived, circular, memory-mapped buffer to receive a constant flow of data via DMA.
Any tips on how to adapt to the Jetson environment would be greatly appreciated.
Thanks. Yes, we discovered it was necessary to disable ASPM. The PCIe card we’re using does not run reliably with ASPM enabled. The symptom we had was random “pcieport” errors in the dmesg log. I can’t find a copy of the specific message right now. Once we disabled ASPM, we would still get the unhandled context faults. We are currently running with the SMMU disabled to avoid the faults until we can fix our driver.
I don’t think it is a range issue. Basically, the context fault implies that the SMMU is unable to find the TLB context for that address. The SMMU translation logic tries to find a valid context before it proceeds to the translation stages.
FSYNR is a hint about the error; it will give you details about the SMMU failure.
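To make the fault registers easier to read, a small decoder can turn the raw values from the kernel log into flag names. The sketch below is user-space C and is only an illustration: the bit positions are assumptions based on the FSR and FSYNR0 definitions in the upstream Linux arm-smmu driver, so verify them against your L4T kernel tree and the SMMU architecture specification before relying on them. `decode_fsr` and `fsynr0_is_write` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed bit positions (per the upstream Linux arm-smmu driver). */
#define FSR_TF      (1u << 1)   /* translation fault */
#define FSR_AFF     (1u << 2)   /* access flag fault */
#define FSR_PF      (1u << 3)   /* permission fault */
#define FSR_EF      (1u << 4)   /* external fault */
#define FSR_TLBMCF  (1u << 5)   /* TLB match conflict fault */
#define FSR_TLBLKF  (1u << 6)   /* TLB lock fault */
#define FSR_ASF     (1u << 7)   /* address size fault */
#define FSR_UUT     (1u << 8)   /* unsupported upstream transaction */
#define FSR_SS      (1u << 30)  /* stalled status */
#define FSR_MULTI   (1u << 31)  /* multiple faults recorded */
#define FSYNR0_WNR  (1u << 4)   /* write-not-read */

struct fsr_flag { uint32_t bit; const char *name; };

static const struct fsr_flag fsr_flags[] = {
    { FSR_TF, "TF" },         { FSR_AFF, "AFF" },
    { FSR_PF, "PF" },         { FSR_EF, "EF" },
    { FSR_TLBMCF, "TLBMCF" }, { FSR_TLBLKF, "TLBLKF" },
    { FSR_ASF, "ASF" },       { FSR_UUT, "UUT" },
    { FSR_SS, "SS" },         { FSR_MULTI, "MULTI" },
};

/*
 * Write the names of all set FSR flags into buf, separated by spaces.
 * Returns the number of flag names written in full.
 */
static int decode_fsr(uint32_t fsr, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    size_t used = 0;
    int count = 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(fsr_flags) / sizeof(fsr_flags[0]); i++) {
        if (!(fsr & fsr_flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", fsr_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}

/* Return nonzero if FSYNR0 marks the faulting access as a write. */
static int fsynr0_is_write(uint32_t fsynr0)
{
    return (fsynr0 & FSYNR0_WNR) != 0;
}
```

With these assumed definitions, an FSR value of 0x2 decodes to `TF` (translation fault), and an FSYNR0 value with bit 4 set indicates that the faulting access was a write. The values used here are made up for illustration and are not taken from the log in this thread.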
Yes, we still get the unhandled context fault if the SMMU is not disabled. I’ve been on another project and haven’t followed up recently. I’d really like to fix this eventually. It doesn’t make a good impression on our users when the PC installation is so simple compared to Xavier.