dma_map_sg_attrs returns 0 for PCIe device after several iterations

Hi,

I have a problem when working with a PCIe device on TX2 when the IOMMU is enabled. For DMA transfers we need to lock a memory buffer, which involves calling dma_map_sg_attrs in a kernel driver. When the IOMMU is active for the PCIe controller, this call creates IOVA mappings, and that seems to start failing after some number of map/unmap iterations. The issue is not reproducible when the IOMMU is disabled (dma_map_sg_attrs is still called, of course).

I’ve prepared a test case (ximea_cam_pcie.tar.gz, 5.4 KB) with a simplified application and kernel driver. It requires some PCIe device to be attached to the system (anything will do), because otherwise there are no PCIe devices to create DMA mappings for. To run:
tar -xf ximea_cam_pcie.tar.gz
sudo ximea_cam_pcie/run.sh
Then wait for it to fail (it may fail quickly or take hours; it seems very random). In the dmesg output you will see an error message “dma_map_sg_attrs returned 0” printed by the kernel driver (line 157 of ximea_cam_pcie/ximea_cam_pcie.c).
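
Since the failure can take hours to show up, it may be easier to check the kernel log with a small helper than to read it by hand. This is only a rough sketch: the function name has_map_failure is made up for illustration, and it assumes the log line contains the message text exactly as quoted above.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds (exit 0)
# if the driver's failure message is present. Assumes the message wording
# matches "dma_map_sg_attrs returned 0" exactly.
has_map_failure() {
    grep -q 'dma_map_sg_attrs returned 0'
}
```

It could be used as `dmesg | has_map_failure && echo "mapping failure seen"`, for example from a loop that polls once a minute while run.sh is running.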

It looks like a bug in the IOMMU driver. I haven’t managed to track it down further and am hoping to find some help here. As I mentioned, there is a workaround: disabling the IOMMU for PCIe. But since it is enabled by default, I would prefer to have this fixed and working with the default L4T install in future releases. This issue is long-standing; it probably began as soon as the IOMMU was enabled for PCIe. The latest L4T versions I checked are 32.3.1 and then 32.4.4, installed via OTA update from 32.3.1.

Thanks for coming up with a test file to repro the issue. We’ll take a look at this and update with the fix soon.