I’m working on transferring data between two Orins through a PCIe switch. After using dma_map_xxx to map the source address (allocated with kmalloc) and the destination address (a BAR physical address), I submit the DMA transfer and there are errors like:
But it seems that the data never reaches the other Orin. I am quite sure the switch itself is not the problem, because if I use memcpy_toio (instead of DMA) with exactly the same source and destination addresses, it works and the other Orin receives the data.
I suspect something is wrong with Orin’s IOMMU: it may be blocking the switch DMA engine’s access to host memory.
Are you sure only these two properties (iommus and dma-coherent) need to be removed? I think the other two properties, “iommu-map” and “iommu-map-mask”, may also need to be removed.
During the enumeration process, the kernel learns about I/O devices and their MMIO space and the host bridges that connect them to the system. For example, if a PCI device has a BAR, the kernel reads the bus address (A) from the BAR and converts it to a CPU physical address (B). The address B is stored in a struct resource and usually exposed via /proc/iomem. When a driver claims a device, it typically uses ioremap() to map physical address B at a virtual address (C). It can then use, e.g., ioread32(C), to access the device registers at bus address A.
If the device supports DMA, the driver sets up a buffer using kmalloc() or a similar interface, which returns a virtual address (X). The virtual memory system maps X to a physical address (Y) in system RAM. The driver can use virtual address X to access the buffer, but the device itself cannot because DMA doesn’t go through the CPU virtual memory system.
In some simple systems, the device can do DMA directly to physical address Y. But in many others, there is IOMMU hardware that translates DMA addresses to physical addresses, e.g., it translates Z to Y. This is part of the reason for the DMA API: the driver can give a virtual address X to an interface like dma_map_single(), which sets up any required IOMMU mapping and returns the DMA address Z. The driver then tells the device to do DMA to Z, and the IOMMU maps it to the buffer at address Y in system RAM.
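As a minimal sketch of the mapping step described above (not taken from the original post), the following kernel-side fragment allocates a buffer at virtual address X and asks the DMA API for the DMA address Z. The helper names `example_alloc_and_map` and `example_unmap_and_free`, and the choice of `DMA_TO_DEVICE`, are assumptions made for illustration; `dev` is assumed to be the `struct device` of whichever device actually performs the DMA.

```c
#include <linux/dma-mapping.h>
#include <linux/slab.h>

/*
 * Hypothetical helper: allocate a buffer (virtual address X) and map it
 * for DMA, returning the DMA address (Z) through *dma.
 */
static void *example_alloc_and_map(struct device *dev, size_t len,
				   dma_addr_t *dma)
{
	void *buf = kmalloc(len, GFP_KERNEL);	/* virtual address X */

	if (!buf)
		return NULL;

	/* Sets up any required IOMMU mapping and returns DMA address Z. */
	*dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, *dma)) {
		kfree(buf);
		return NULL;
	}

	return buf;
}

/* Hypothetical counterpart: undo the mapping, then free the buffer. */
static void example_unmap_and_free(struct device *dev, void *buf,
				   size_t len, dma_addr_t dma)
{
	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	kfree(buf);
}
```

The device is then programmed with the value stored in `*dma`, never with the kmalloc pointer or the raw physical address.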
Now I have a BAR physical address (B in the graph above) and a kmalloc'ed buffer address (X in the graph above).
What should I do if I want a DMA transfer between B and X? I know X is easily mapped to Z (dma_addr) with dma_map_single. But what about B? How do I map B to an address that is usable for DMA?
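One possible approach, sketched here under assumptions rather than confirmed for this setup, is the kernel's `dma_map_resource()`, which maps a physical MMIO range (such as B) for DMA by a given device, much as `dma_map_single()` does for RAM. In the fragment below, `example_map_bar` is a hypothetical helper, `dma_dev` is assumed to be the `struct device` of the DMA engine that issues the transfer, and `DMA_BIDIRECTIONAL` is a conservative choice of direction.

```c
#include <linux/dma-mapping.h>
#include <linux/pci.h>

/*
 * Hypothetical helper: map the start of a PCI BAR (CPU physical address B)
 * so that the DMA engine behind dma_dev can use it as a transfer target.
 */
static int example_map_bar(struct device *dma_dev, struct pci_dev *pdev,
			   int bar, size_t len, dma_addr_t *dma)
{
	phys_addr_t bar_phys = pci_resource_start(pdev, bar);	/* address B */

	if (len > pci_resource_len(pdev, bar))
		return -EINVAL;

	/* Map the MMIO range for DMA; the result is what the engine is given. */
	*dma = dma_map_resource(dma_dev, bar_phys, len, DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(dma_dev, *dma))
		return -ENOMEM;

	return 0;
}
```

The mapping would be released with `dma_unmap_resource(dma_dev, dma, len, DMA_BIDIRECTIONAL, 0)` once the transfer has completed. Whether this resolves the errors seen on Orin depends on how the IOMMU and the PCIe controller are configured, which this sketch does not address.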
I have already disabled the IOMMU following your steps.
However, I found that not only DMA transfers but also memcpy transfers to the memory of the device whose IOMMU I disabled are really slow. Before disabling the IOMMU, memcpy throughput was about 2.5 GB/s; now it is only 980 MB/s.