IOMMU may have an issue with 32-bit DMA addressing

Hi,

I am facing a DMA addressing issue with 32-bit devices, particularly PCIe WiFi devices. NVMe SSDs and PCIe Ethernet devices (I210) use 64-bit DMA addressing, so the IOMMU always works for these devices. However, PCIe WiFi devices mostly use 32-bit DMA. I have found two PCIe WiFi devices that hit this issue.

When IOMMU is enabled:
The 88W8997/88W8897 triggers an “Unhandled context fault” IOMMU error:

[    8.662459] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xfffaa980, fsynr=0x13, cb=2, sid=89(0x59 - PCIE3), pgd=857678003, pud=857678003, pmd=857679003, pte=0

Interestingly, the 88W8997/88W8897 still works even after the IOMMU error. I debugged the 88W8997 driver and found that it performs many IOVA-to-physical-address IOMMU mappings, but only one or two of them trigger an “Unhandled context fault”. I am not sure why.

The QCA9377 gets a “corrected PCIe bus error”:

[    7.136591] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
[    7.136871] pcieport 0003:00:00.0:   device [10de:1ad2] error status/mask=00000001/0000e000
[    7.137044] pcieport 0003:00:00.0:    [ 0] Receiver Error         (First)
[    7.269652] ath10k_pci 0003:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0003:01:00.0.bin failed with error -2
[    7.269886] ath10k_pci 0003:01:00.0: Falling back to user helper
[    7.270861] ath10k_pci 0003:01:00.0: Direct firmware load for ath10k/cal-pci-0003:01:00.0.bin failed with error -2
[    7.271049] ath10k_pci 0003:01:00.0: Falling back to user helper
[    7.275733] ath10k_pci 0003:01:00.0: qca9377 hw1.1 target 0x05020001 chip_id 0x003821ff sub 0000:0000
[    7.275740] ath10k_pci 0003:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
[    7.276884] ath10k_pci 0003:01:00.0: firmware ver WLAN.TF.1.0-00002-QCATFSWPZ-5 api 5 features ignore-otp crc32 c3e0d04f
[    7.341669] ath10k_pci 0003:01:00.0: failed to fetch board data for bus=pci,vendor=168c,device=0042,subsystem-vendor=0000,subsystem-device=0000 from ath10k/QCA9377/hw1.0/board-2.bin
[    7.342629] ath10k_pci 0003:01:00.0: board_file api 1 bmi_id N/A crc32 544289f7
[   10.668806] ath10k_pci 0003:01:00.0: failed to receive control response completion, polling..
[   11.692915] ath10k_pci 0003:01:00.0: ctl_resp never came in (-110)
[   11.693166] ath10k_pci 0003:01:00.0: failed to connect to HTC: -110
[   11.709420] ath10k_pci 0003:01:00.0: could not init core (-110)

Please note that in the QCA9377 case above (with the IOMMU enabled), I am not 100% sure that this is an IOMMU error or is caused by one. The QCA9377 issue is more severe: it fails at init time, so the device doesn’t work at all.

On other forums, I found suggestions to disable the IOMMU and dma-coherence in the device tree, so I tried that. However, I found that with the IOMMU disabled, DMA mapping just returns the physical address of the memory, which falls beyond the 32-bit range. Note that both WiFi drivers call pci_set_dma_mask(pdev, DMA_BIT_MASK(32));. Even so, the DMA map functions return addresses beyond 32 bits.

The 88W8997/88W8897 works when the IOMMU is disabled. This is because its driver uses u64 for DMA addresses, and it seems this WiFi chip does support 64-bit DMA addressing.

However, for the QCA9377, disabling the IOMMU doesn’t work. Its driver uses u32 for DMA addresses, and since the mapped DMA address is beyond 32 bits, the address gets truncated and causes an mc-err.
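
To illustrate the truncation described above, here is a minimal user-space sketch in C. It is not code from either WiFi driver: fits_dma_mask() and store_as_u32() are hypothetical helpers, the DMA_BIT_MASK macro is re-declared locally to mirror the kernel's definition, and the addresses used with it are illustrative values only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the kernel's DMA_BIT_MASK(n) definition, for user space. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: returns true if the whole range
 * [addr, addr + size) is reachable by a device with the given mask.
 */
static bool fits_dma_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    return addr <= mask && size - 1 <= mask - addr;
}

/*
 * Hypothetical helper: models a driver that keeps the mapped address
 * in a u32 field. The upper 32 bits are silently dropped.
 */
static uint32_t store_as_u32(uint64_t dma_addr)
{
    return (uint32_t)dma_addr;
}
```

With an illustrative address such as 0x857678000, fits_dma_mask(addr, 0x1000, DMA_BIT_MASK(32)) is false, and store_as_u32(addr) yields 0x57678000, so a device programmed with the stored value would access a different location than the one that was mapped.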

I also tried this patch, as suggested on one forum:

But it didn’t work. The pci_alloc_consistent() call fails, so the WiFi driver fails at probe time.

Looking deeper, I found that SWIOTLB is disabled. CONFIG_SWIOTLB is deselected, and in Kconfig it is even made mutually exclusive with TEGRA:

config SWIOTLB
	def_bool y
	depends on !ARCH_TEGRA

I believe SWIOTLB provides bounce buffers when a DMA mapping is beyond the device’s limit, which could potentially resolve the issue above. But why does SWIOTLB conflict with TEGRA? Has it already been verified as not working on your side?
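
For readers unfamiliar with bounce buffering, the idea can be sketched as a toy user-space model. This is only an illustration of the concept, not the kernel's SWIOTLB implementation: struct bounce_pool, toy_map() and toy_unmap() are hypothetical names, the pool has a single slot, and the "bus addresses" are plain numbers supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 256

/* Toy pool: one slot assumed to sit at a bus address the device can reach. */
struct bounce_pool {
    uint64_t bus_addr;               /* assumed to be below the device limit */
    unsigned char slot[BOUNCE_SIZE]; /* the bounce buffer itself */
    void *orig;                      /* original buffer while the slot is in use */
    size_t len;
};

/* Returns the bus address the device should be programmed with. */
static uint64_t toy_map(struct bounce_pool *p, void *buf, uint64_t buf_bus,
                        size_t len, uint64_t dma_mask)
{
    assert(len > 0 && len <= BOUNCE_SIZE);
    if (buf_bus + len - 1 <= dma_mask)
        return buf_bus;              /* device can reach the buffer directly */

    assert(p->orig == NULL);         /* single-slot model: must be free */
    memcpy(p->slot, buf, len);       /* copy data into the reachable slot */
    p->orig = buf;
    p->len = len;
    return p->bus_addr;
}

/* Copies device-written data back if the mapping was bounced. */
static void toy_unmap(struct bounce_pool *p, uint64_t bus)
{
    if (bus != p->bus_addr || p->orig == NULL)
        return;                      /* direct mapping, nothing to copy */
    memcpy(p->orig, p->slot, p->len);
    p->orig = NULL;
}
```

The cost of this approach is an extra memcpy in each direction for every buffer that lies outside the device's reachable range, which is why it is usually treated as a fallback rather than a first choice.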

To sum up my questions:

  1. The IOMMU may have an issue with 32-bit DMA addressing. I have found two PCIe WiFi chips that hit this issue, but I am not 100% sure (it could be a WiFi driver issue). Have you verified 32-bit IOMMU operation on your side? (If so, which WiFi chip did you test?)
  2. When the IOMMU is disabled, DMA mapping returns addresses beyond 32 bits even though the driver has already called pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
  3. It seems that using the SWIOTLB DMA ops may resolve my issue #2, but SWIOTLB is deselected when compiling the kernel, and Kconfig makes SWIOTLB conflict with ARCH_TEGRA. Does this mean that I cannot use SWIOTLB, and that you have already tested that SWIOTLB doesn’t work on TEGRA?

Best regards.

Hi NVIDIA,

Any update on my questions?

Thanks.

Sorry for the late response. We will forward this issue to the internal team to see if they have any suggestions. Thanks

Hi,

32-bit IOMMU is verified with a Realtek WiFi device, and it works fine.
If you disable the IOMMU, dma_map_single() returns the physical address of the buffer, which is already allocated, so pci_set_dma_mask() doesn’t guarantee you a 32-bit physical address.

iova=0xfffaa980 looks fine to me. Most probably, because of some cache coherency or barrier issue, the WiFi device is using this iova after the driver has unmapped it. You should add prints in the WiFi driver and track the iova that is causing the issue, then check whether any cache coherency handling or barriers are required to fix it.
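
One way to organize that tracking is sketched below as a small user-space model in C. It is only an assumption about how the logged map/unmap events could be analyzed, not code from any driver: struct iova_log, log_map(), log_unmap() and classify_fault() are hypothetical names, and the log is a fixed-size array for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum iova_state { IOVA_NEVER_MAPPED, IOVA_MAPPED, IOVA_UNMAPPED };

struct iova_rec {
    uint64_t iova;   /* start of the mapping */
    size_t len;      /* length of the mapping in bytes */
    int live;        /* 1 while mapped, 0 after unmap */
};

#define MAX_RECS 64

struct iova_log {
    struct iova_rec recs[MAX_RECS];
    size_t n;
};

/* Record a mapping, e.g. from a print placed after the driver's map call. */
static void log_map(struct iova_log *l, uint64_t iova, size_t len)
{
    assert(l->n < MAX_RECS);
    l->recs[l->n++] = (struct iova_rec){ iova, len, 1 };
}

/* Mark the first live mapping that starts at this iova as unmapped. */
static void log_unmap(struct iova_log *l, uint64_t iova)
{
    for (size_t i = 0; i < l->n; i++) {
        if (l->recs[i].live && l->recs[i].iova == iova) {
            l->recs[i].live = 0;
            return;
        }
    }
}

/* Decide whether a faulting iova was live, already unmapped, or unknown. */
static enum iova_state classify_fault(const struct iova_log *l, uint64_t fault)
{
    enum iova_state st = IOVA_NEVER_MAPPED;

    for (size_t i = 0; i < l->n; i++) {
        const struct iova_rec *r = &l->recs[i];

        if (fault >= r->iova && fault - r->iova < r->len) {
            if (r->live)
                return IOVA_MAPPED;
            st = IOVA_UNMAPPED;
        }
    }
    return st;
}
```

If the faulting iova classifies as IOVA_UNMAPPED, that would be consistent with the device touching a buffer after the driver released it. IOVA_NEVER_MAPPED would instead point toward a bad address being handed to the device.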

Thanks,
Manikanta

Hi @Manikanta

Thank you so much for your reply and for confirming this.

I believe the kernel’s dma_map_xxx/dma_alloc_xxx APIs handle cache coherency. So most likely it is a barrier issue, or reordering caused by compiler optimization.

One last question: I think the SWIOTLB DMA ops provide bounce buffers, which could be a potential workaround for my issue. However, SWIOTLB is deselected when compiling the kernel, and Kconfig makes SWIOTLB conflict with ARCH_TEGRA. Can I remove the conflict with ARCH_TEGRA in Kconfig and try enabling it? Have you tried it on your side?

Best regards.

No, I haven’t tried SWIOTLB.
