PCIe DMA driver compatibility with Xavier SMMU/IOMMU

We previously worked around the issue of our PCIe DMA driver not playing well with the TX2’s SMMU by disabling the SMMU for PCIe (JetPack 3.3).

Now we need to do the same for Xavier on the latest release (JetPack 4.2.1). It appears we could take the same approach as before, using the instructions in comment #4 of https://devtalk.nvidia.com/default/topic/1043746/jetson-agx-xavier/pcie-smmu-issue/

We supply our software to our customers. Writing the instructions for the TX2 on how to modify the device tree is not something we’d like to do again (we can’t assume our users have experience in this area). It would be a much cleaner solution to fix our driver.

So, I’d like to follow up on what vidyas said (in comment #4 mentioned above): “Having said that, I genuinely feel that your code should be checked once as to why it can’t work with SMMU enabled for PCIe. It looks to me that you might not be using dma_alloc_* / dma_map_* APIs …”

We use dma_zalloc_coherent, dma_free_coherent and remap_pfn_range.

This driver works on the TX2 (with the SMMU disabled, on JetPack 3.3), as well as on Ubuntu and CentOS.
Two 4 MB buffers are created (one for input, one for output).

A clip from dmesg:

[ 3880.742139] ii : Alloc size 4194304 at index 0 phaddr 0xc1d01000 
[ 3880.742201] ii_dma_mmap (773): 
[ 3880.742209] MMAP DMA #0 length 4194304
[ 3880.745345] ii : Alloc size 4194304 at index 1 phaddr 0xc2102000 
[ 3880.745408] ii_dma_mmap (773): 
[ 3880.745415] MMAP DMA #1 length 4194304
[ 3880.746550] ii_bar_mmap (794): 
[ 3880.746558] MMAP BAR #2 length 262144
[ 3892.563964] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xc2103000, fsynr=0x2, cb=3, sid=91(0x5b - PCIE5), pgd=458274003, pud=458274003, pmd=0, pte=0
[ 3892.564338] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu1, iova=0xc2102000, fsynr=0x80002, cb=3, sid=91(0x5b - PCIE5), pgd=458274003, pud=458274003, pmd=0, pte=0
[ 3892.564828] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0xc2131c00, fsynr=0x2, cb=3, sid=91(0x5b - PCIE5), pgd=458274003, pud=458274003, pmd=0, pte=0

Some minimal code clips:

// allocating
context->dmas[idx].kaddr = dma_zalloc_coherent(&context->pci_dev->dev,
                                               context->dmas[idx].size,
                                               &context->dmas[idx].handle,
                                               GFP_KERNEL);
context->dmas[idx].paddr = virt_to_phys(context->dmas[idx].kaddr);

// mapping to user space
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
remap_pfn_range(vma, vma->vm_start, context->dmas[idx].paddr >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot))

// freeing
dma_free_coherent(&context->pci_dev->dev, context->dmas[idx].size,
                  context->dmas[idx].kaddr, context->dmas[idx].handle);

Q1: What do I need to know to make this code compatible with Xavier? (Actually, I’m hoping the same change will apply to the TX2 also.) I recently noticed dma_mmap_coherent(), but it appears to do the same as our mapping code above.
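For reference, here is a sketch of how I understand the mmap path would look with dma_mmap_coherent() in place of virt_to_phys() + remap_pfn_range(). This is untested on Xavier; the struct name ii_context and the way idx is chosen are placeholders, only the field names follow the clips above. The difference that seems relevant to the SMMU is that dma_mmap_coherent() maps by the dma_addr_t handle the allocator returned, so it should not matter whether that handle is a raw physical address or an SMMU-translated IOVA:

// Sketch only (untested): mmap handler using dma_mmap_coherent().
static int ii_dma_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct ii_context *context = filp->private_data; /* assumed */
        int idx = 0;                                     /* assumed */
        size_t len = vma->vm_end - vma->vm_start;

        if (len > context->dmas[idx].size)
                return -EINVAL;

        /* No pgprot_noncached() here: dma_mmap_coherent() applies the
         * attributes appropriate for the coherent allocation itself. */
        return dma_mmap_coherent(&context->pci_dev->dev, vma,
                                 context->dmas[idx].kaddr,
                                 context->dmas[idx].handle, len);
}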

Q2: Should GFP_DMA be used instead of GFP_KERNEL? It didn’t seem to make a difference.

Q3: Where do I go from here? I’m searching devtalk and reading a lot of Linux kernel docs. Is there an NVIDIA doc on L4T, or on the hardware, that I should be reading?

Q4: It appears that dma_zalloc_coherent is being phased out in favor of dma_alloc_coherent. Can anyone confirm this?
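My reading of the dma-mapping headers (please correct me if I’m wrong) is that dma_zalloc_coherent() was only a thin wrapper that added __GFP_ZERO, and newer kernels zero coherent allocations unconditionally, so this should be a drop-in replacement for the allocation clip above:

// Sketch: same allocation without the deprecated wrapper.
context->dmas[idx].kaddr = dma_alloc_coherent(&context->pci_dev->dev,
                                              context->dmas[idx].size,
                                              &context->dmas[idx].handle,
                                              GFP_KERNEL);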

I’d appreciate any comments. Thanks.

So, from reading the documents in the kernel source tree and checking a couple of existing drivers, it appears that dma_alloc_coherent, dma_free_coherent and remap_pfn_range are the functions to use.