How to disable IOMMU on Xavier NX?

Since you said you are using the remap_pfn_range() API, I assume you are allocating the memory in kernel space and then mmap’ing it to user space through remap_pfn_range(), right? If so, how are you getting the ‘pfn’ that is supplied as one of the arguments to remap_pfn_range()?
If you are using either __pa() or virt_to_phys() to get the physical address from the virtual address, it may not work. Please use the vmalloc_to_pfn() API to get the pfn from the virtual address, and then pass this pfn to remap_pfn_range(). I hope this solves your issue.

Hi @vidyas, thanks for your reply. Yes, I tried what you advised, but it still does not work: it hangs when I get data from user space.

Here are the details of my code:

  1. First, allocate memory for the 7 cameras in kernel space

for (i = 0; i < pdx->m_nMaxChl; i++) {
    for (k = 0; k < MAX_VIDEO_QUEUE; k++) {
        pdx->m_pVideoData[i][k] = kmalloc(pdx->m_MaxHWVideoBufferSize, GFP_KERNEL);
        if (!pdx->m_pVideoData[i][k]) {
            pdx->m_bBufferAllocate = TRUE;
            DmaMemFreePool(pdx);
            pdx->m_bBufferAllocate = FALSE;
            status = -1;
            return status;
        } else {
            pdx->m_pVideoData_area[i][k] = (char *)(((unsigned long)pdx->m_pVideoData[i][k] + PAGE_SIZE - 1) & PAGE_MASK);
            for (phyvirt_addr = (unsigned long)pdx->m_pVideoData_area[i][k];
                 phyvirt_addr < ((unsigned long)pdx->m_pVideoData_area[i][k] + pdx->m_MaxHWVideoBufferSize);
                 phyvirt_addr += PAGE_SIZE) {
                // reserve all pages to make them remappable
                SetPageReserved(virt_to_page(phyvirt_addr));
            }
            memset(pdx->m_pVideoData[i][k], 0x0, pdx->m_MaxHWVideoBufferSize);
        }
    }
}

  2. Configure
    static int uio_mmap_video0(struct file *filp, struct vm_area_struct *vma, int index)
    {
        int ret = 0;
        int index=0;
        int offset=0;
        int ch;
        ch = index;
        offset = vma->vm_pgoff<<PAGE_SHIFT;
        index = offset/MAX_MM_VIDEO_SIZE;
        ret = remap_pfn_range(vma, vma->vm_start, vmalloc_to_pfn(sys_dvrs_hw_pdx->m_pVideoData[ch][index]) >>PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot);
        if (ret)
            return -EAGAIN;
        return ret;
    }
    static int uio_mmap_video1(struct file *filp, struct vm_area_struct *vma, int index)
    {

    }
    static int uio_mmap_video2(struct file *filp, struct vm_area_struct *vma, int index)
    {

    }

    static int uio_mmap_video6(struct file *filp, struct vm_area_struct *vma, int index)
    {

    }

  3. Allocate memory for every camera
    for (i = 0; i < pdx->m_nMaxChl; i = i + 2)
    {
    pdx->m_pbyVideoBuffer[i] = kmalloc((pdx->m_MaxHWVideoBufferSize*MAX_VIDEO_PKSIZE), GFP_KERNEL);

      if(pdx->m_pbyVideoBuffer[i]== NULL)
      {
        printk("m_pbyVideoBuffer[%d] mem Allocate Fail ************\n", i);
        pdx->m_bBufferAllocate = TRUE;
        DmaMemFreePool(pdx);
        pdx->m_bBufferAllocate = FALSE;
        status = -1;
        return status;
      }
      pdx->m_pbyVideoBuffer_area[i] = (char *)(((unsigned long)pdx->m_pbyVideoBuffer[i] + PAGE_SIZE -1) & PAGE_MASK);
      for (phyvirt_addr=(unsigned long)pdx->m_pbyVideoBuffer_area[i]; phyvirt_addr < ((unsigned long)pdx->m_pbyVideoBuffer_area[i] + (pdx->m_MaxHWVideoBufferSize*MAX_VIDEO_PKSIZE));
      phyvirt_addr+=PAGE_SIZE)
      {
        // reserve all pages to make them remapable
        SetPageReserved(virt_to_page(phyvirt_addr));
      }
      memset(pdx->m_pbyVideoBuffer[i] , 0x0,(pdx->m_MaxHWVideoBufferSize*MAX_VIDEO_PKSIZE) );
    
      phy_addr= (u64*)virt_to_phys(pdx->m_pbyVideoBuffer[i]);
    
      pdx->m_pbyVideo_phys[i] = phy_addr;
      pdx->m_dwVideoBuffer[i] =     ((u64)phy_addr)&0xFFFFFFFF;
      pdx->m_dwVideoHighBuffer[i] = ((u64)phy_addr>>32)&0xFFFFFFFF;
    
      pdx->m_pbyVideoBuffer[i+1] =  pdx->m_pbyVideoBuffer[i]+ pdx->m_MaxHWVideoBufferSize;
    
      phy_addr= (u64*)virt_to_phys(pdx->m_pbyVideoBuffer[i+1]);
      pdx->m_pbyVideo_phys[i+1] = phy_addr;
      pdx->m_dwVideoBuffer[i+1] =   ((u64)phy_addr)&0xFFFFFFFF;
      pdx->m_dwVideoHighBuffer[i+1] = ((u64)phy_addr>>32)&0xFFFFFFFF;
    
  4. When the ioctl to get camera data is called from user space

pci_dma_sync_single_for_cpu(pdx->pdev,pdx->m_pbyVideo_phys[nDecoder],pdx->m_MaxHWVideoBufferSize,2);

  5. Copy data

bBuf = pdx->m_pVideoData[nDecoder][nIndex];
pSrcBuf = pdx->m_pbyVideoBuffer[nDecoder];
memcpy(bBuf,pSrcBuf,copysize);

Please note that if you are dealing with sizes of more than one page, you may have to call vmalloc_to_pfn() and remap_pfn_range() in a loop for each page. Since the MMU and IOMMU are enabled, both the VA and IOVA ranges are contiguous in their own address spaces, but the physical addresses may not be contiguous, and hence the mapping needs to be done page by page.

Hi @vidyas, thanks for your advice. Of course, the buffer for the camera data (1280x720) is more than one page. Could you give me some sample code for this case? Thanks.

int ret = 0;
int index=0;
int offset=0;
int ch;
ch = index;
offset = vma->vm_pgoff<<PAGE_SHIFT;
index = offset/MAX_MM_VIDEO_SIZE;
ret = remap_pfn_range(vma, vma->vm_start, vmalloc_to_pfn(sys_dvrs_hw_pdx->m_pVideoData[ch][index]) >>PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot);

Hi, I have an update on this issue: I found a message in the kernel log (dmesg) when performing the same operation. Please check it, thanks.

I’m sorry but we don’t have any sample code. But, as I said, you just need to repeat the procedure for each page and then keep track of all of them and then create mapping.
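
For readers looking for a starting point, the per-page procedure described above can be modeled in plain C. This is only a sketch under stated assumptions, not NVIDIA sample code and not the driver's actual implementation: it runs in user space, and `SKETCH_PAGE_SIZE`, `map_buffer_per_page()` and the `demo_*` helpers are hypothetical names. The two callbacks stand in for `vmalloc_to_pfn()` and a one-page `remap_pfn_range()` call; in a real mmap handler the loop would call those kernel APIs directly. Note also that `vmalloc_to_pfn()` is intended for addresses in the vmalloc area.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's PAGE_SHIFT / PAGE_SIZE (4 KiB assumed). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)

/* Stand-in for vmalloc_to_pfn(): translate one kernel address to a frame number. */
typedef unsigned long (*addr_to_pfn_fn)(uintptr_t kaddr, void *ctx);
/* Stand-in for remap_pfn_range() called with a length of exactly one page. */
typedef int (*map_page_fn)(uintptr_t uaddr, unsigned long pfn, void *ctx);

/* Walk the buffer one page at a time: look up the pfn, then map that single page. */
static int map_buffer_per_page(uintptr_t kaddr, uintptr_t uaddr, size_t len,
                               addr_to_pfn_fn to_pfn, map_page_fn map_page,
                               void *ctx)
{
    size_t off;

    /* Both the kernel buffer and the user address must be page aligned. */
    if ((kaddr | uaddr) & (SKETCH_PAGE_SIZE - 1))
        return -1;

    for (off = 0; off < len; off += SKETCH_PAGE_SIZE) {
        unsigned long pfn = to_pfn(kaddr + off, ctx);
        int ret = map_page(uaddr + off, pfn, ctx);

        if (ret)
            return ret; /* stop at the first page that fails to map */
    }
    return 0;
}

/* Demo-only state: scattered frame numbers and a record of what was mapped. */
struct demo_ctx {
    uintptr_t kbase;         /* start of the pretend kernel buffer */
    unsigned long pfns[8];   /* one (non-contiguous) pfn per buffer page */
    unsigned long mapped[8]; /* pfns in the order they were mapped */
    size_t count;
};

static unsigned long demo_to_pfn(uintptr_t kaddr, void *ctx)
{
    struct demo_ctx *d = ctx;
    size_t idx = (size_t)((kaddr - d->kbase) >> SKETCH_PAGE_SHIFT);

    return idx < 8 ? d->pfns[idx] : 0;
}

static int demo_map_page(uintptr_t uaddr, unsigned long pfn, void *ctx)
{
    struct demo_ctx *d = ctx;

    (void)uaddr; /* a real handler would map pfn at this user address */
    if (d->count >= 8)
        return -1;
    d->mapped[d->count++] = pfn;
    return 0;
}
```

Because each page is looked up and mapped on its own, physically scattered pages still end up contiguous in the user-space mapping.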

Hi @vidyas, okay, I will try it later.
But I have a question: as I said in an earlier post, the same driver and application code work well on Jetson TX2 and Nano.

Well, it is possible that on Jetson TX2 and Nano, the SMMU for PCIe has been disabled. If disabling it is something you would be interested in trying, I can provide the steps. But, before that, I would like you to confirm that the SMMU is indeed disabled for PCIe there.

As I posted on Jun 17, I referred to an old patch for TX2 and removed all the iommu configuration in the file ‘hardware/nvidia/soc/t19x/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi’ for NX, but I don’t know whether I did it the right way. Can you please check it? Thanks.

Please use below two changes to disable SMMU for PCIe.
(The first change shows how it is to be done for the C5 controller. If the controller in question is not C5, then, a similar change needs to be made for the respective controller.
The second change is a common change.)

--- a/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
+++ b/kernel-dts/tegra194-soc/tegra194-soc-pcie.dtsi
@@ -867,13 +867,6 @@
                pinctrl-0 = <&pex_rst_c5_out_state>;
                pinctrl-1 = <&clkreq_c5_bi_dir_state>;
 
-               iommus = <&smmu TEGRA_SID_PCIE5>;
-               dma-coherent;
-#if LINUX_VERSION >= 414
-               iommu-map = <0x0 &smmu TEGRA_SID_PCIE5 0x1000>;
-               iommu-map-mask = <0x0>;
-#endif
-
                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &intc 0 53 0x04>;

and

--- a/drivers/iommu/arm-smmu-t19x.c
+++ b/drivers/iommu/arm-smmu-t19x.c
@@ -2535,7 +2535,10 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
        reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 
        /* Enable fault reporting */
-       reg |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE | sCR0_USFCFG);
+       reg |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE);
+
+       /* Disable Unidentified stream fault reporting */
+       reg &= ~(sCR0_USFCFG);
 
        /* Disable TLB broadcasting. */
        reg |= (sCR0_VMIDPNE | sCR0_PTM);
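
To make the second change concrete, the bit arithmetic can be tried outside the kernel. This is a sketch under assumptions: the bit positions are those of the upstream Linux `arm-smmu.c` driver and are assumed (not verified here) to match `arm-smmu-t19x.c`, and `scr0_before_patch()` and `scr0_after_patch()` are hypothetical helper names. With `sCR0_USFCFG` set, the SMMU faults transactions whose stream matches no mapping; with it cleared, such transactions bypass translation, which is presumably what a PCIe controller without an `iommus` entry relies on.

```c
#include <stdint.h>

/* Bit positions as in upstream drivers/iommu/arm-smmu.c (assumed to match t19x). */
#define sCR0_GFRE    (1u << 1)   /* global fault report enable */
#define sCR0_GFIE    (1u << 2)   /* global fault interrupt enable */
#define sCR0_GCFGFRE (1u << 4)   /* global config fault report enable */
#define sCR0_GCFGFIE (1u << 5)   /* global config fault interrupt enable */
#define sCR0_USFCFG  (1u << 10)  /* fault (1) or bypass (0) unidentified streams */

/* Original code: fault reporting on, unidentified streams fault. */
static uint32_t scr0_before_patch(uint32_t reg)
{
    reg |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE | sCR0_USFCFG);
    return reg;
}

/* Patched code: fault reporting still on, but USFCFG is explicitly cleared. */
static uint32_t scr0_after_patch(uint32_t reg)
{
    reg |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE);
    reg &= ~sCR0_USFCFG;
    return reg;
}
```

The explicit clear matters because the register is read before being modified, so a previously set `sCR0_USFCFG` bit would otherwise survive.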

Hi @vidyas, thanks for your patch. It works on NX now, but I hit the crash below, and it still crashes after I changed virt_to_phys() to vmalloc_to_pfn().

[ 609.091216] irq 554 handler irqhandler+0x0/0x380 [dvrs_hw] enabled interrupts
[ 609.091238] ------------[ cut here ]------------
[ 609.091369] WARNING: CPU: 0 PID: 9635 at /home/xhz/nvidia/source/Linux_for_Tegra_nx_tx2/source/public/kernel_src/kernel/kernel-4.9/kernel/irq/handle.c:149 __handle_irq_event_percpu+0x238/0x288
[ 609.091643] Modules linked in: dvrs_hw(O) bnep fuse zram overlay spidev nvgpu bluedroid_pm ip_tables x_tables

[ 609.091712] CPU: 0 PID: 9635 Comm: lt_test Tainted: G O 4.9.140-tegra #3
[ 609.091719] Hardware name: NVIDIA Jetson Xavier NX Developer Kit (DT)
[ 609.091725] task: ffffffc19e128000 task.stack: ffffffc1a1fbc000
[ 609.091732] PC is at __handle_irq_event_percpu+0x238/0x288
[ 609.091737] LR is at __handle_irq_event_percpu+0x238/0x288
[ 609.091741] pc : [] lr : [] pstate: 40400045
[ 609.091745] sp : ffffffc1ffd07d40
[ 609.091749] x29: ffffffc1ffd07d40 x28: ffffffc19e128000
[ 609.091758] x27: ffffff800a0aa000 x26: ffffffc1f0c86400
[ 609.091767] x25: ffffffc1ffd07dcc x24: ffffff8009835018
[ 609.091776] x23: ffffff8009e76e48 x22: 000000000000022a
[ 609.091788] x21: 0000000000000000 x20: 0000000000000001
[ 609.091796] x19: ffffffc1ddf30480 x18: 0000000000000060
[ 609.091805] x17: 0000007fb7b0eb38 x16: 0000000000000000
[ 609.091814] x15: ffffffffffffffff x14: 747075727265746e
[ 609.091823] x13: 0000000000000000 x12: 0000000000000006
[ 609.091831] x11: 0000000000000002 x10: 00000000000003d0
[ 609.091844] x9 : 0000000000000001 x8 : ffffffc1ffcd82e3
[ 609.091858] x7 : 0000000000000000 x6 : 000000001313168b
[ 609.091867] x5 : 0000000000000000 x4 : ffffffc1ffd08be8
[ 609.091875] x3 : ffffffc1ffd08be8 x2 : 0000000000000007
[ 609.091885] x1 : ffffffc19e128000 x0 : 0000000000000041

[ 609.091897] ---[ end trace 4deb775374cfcae6 ]---
[ 609.091981] Call trace:
[ 609.091988] [] __handle_irq_event_percpu+0x238/0x288
[ 609.091994] [] handle_irq_event_percpu+0x28/0x60
[ 609.092002] [] handle_irq_event+0x50/0x80
[ 609.092011] [] handle_simple_irq+0x8c/0xc0
[ 609.092017] [] generic_handle_irq+0x34/0x50
[ 609.092025] [] dw_handle_msi_irq+0xb4/0x108
[ 609.092032] [] tegra_pcie_msi_irq_handler+0x20/0x30
[ 609.092038] [] __handle_irq_event_percpu+0x68/0x288
[ 609.092044] [] handle_irq_event_percpu+0x28/0x60
[ 609.092049] [] handle_irq_event+0x50/0x80
[ 609.092055] [] handle_fasteoi_irq+0xc8/0x1b8
[ 609.092061] [] generic_handle_irq+0x34/0x50
[ 609.092066] [] __handle_domain_irq+0x68/0xc0
[ 609.092073] [] gic_handle_irq+0x5c/0xb0
[ 609.092082] [] el1_irq+0xe8/0x194
[ 609.092092] [] _raw_spin_unlock_irq+0x28/0x58
[ 609.092100] [] finish_task_switch+0x7c/0x1a8
[ 609.092106] [] schedule_tail+0x20/0x170
[ 609.092112] [] ret_from_fork+0x4/0x30
[ 615.356438] ########- [5][0] [0]

You may have to repeat the vmalloc_to_pfn() call for all the pages. Is that being done here?

Thanks for following up, I will try it later.

Hi vidyas, after modifying the kernel and tegra194-soc-pcie.dtsi, I can replace /boot/Image, but how should I replace the device tree?

Hi vidyas, I replaced the dtb, /boot/dts/tegra194-p3668-all-p3509-0000.dtb, with kernel/kernel-4.9/arch/arm64/boot/dts/tegra194-p3668-all-p3509-0000.dtb, and replaced /boot/Image with kernel/kernel-4.9/arch/arm64/boot/Image. Did I miss anything?

The dtb is read from a partition, not from /boot, so placing the dtb under /boot/dts will not work.

Please use an Ubuntu host machine and flash.sh to flash it to the device.

Is there a way to just flash dtb?

You can use

sudo ./flash.sh -r -k kernel-dtb board-name mmcblk0p1

Thank you very much.