PCIe SMMU issue

Hi NVpeople

I can run the Xilinx driver on TX2,
but it doesn't work on Xavier.

The error messages are below:

[  210.958192] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000000, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.959095] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000200, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.959606] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x660003c0, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.960019] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x660003c0, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.960589] mc-err: vpr base=0:0, size=0, ctrl=1, override:(a01a8340, fcee10c1, 1, 0)
[  210.960623] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000600, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.961012] mc-err: (255) csw_pcie5w: MC request violates VPR requirements
[  210.961160] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000700, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.961611] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000800, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.961881] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000900, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.962283] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000a00, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  210.962744] t19x-arm-smmu 12000000.iommu: Unhandled context fault: smmu0, iova=0x66000b00, fsynr=0x250011, cb=3, sid=91(0x5b - PCIE5), pgd=0, pud=0, pmd=0, pte=0
[  211.027111] mc-err:   status = 0x0ff740e3; addr = 0xffffffff00; hi_adr_reg=008
[  211.034457] mc-err:   secure: yes, access-type: write
[  211.039527] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000200, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000

I read a topic saying it works perfectly once the SMMU is disabled for PCIe.

Could you tell me how to disable the SMMU for PCIe5?


Hi NVpeople

Could you give the steps to disable the SMMU?

Hi NVpeople

How do I know the IOVA range?

You can disable the SMMU for PCIe controller-5 by removing the following two lines from its node in the device-tree file:

iommus = <&smmu TEGRA_SID_PCIE5>;
dma-coherent;

Having said that, I really think you should check why your code can't work with the SMMU enabled for PCIe. It looks to me like you might not be using the dma_alloc_* / dma_map_* APIs, which isn't a good thing from a system-security point of view: with the SMMU disabled, the PCIe controller (acting on accesses from the endpoint) can access any memory location in the system.
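For reference, a minimal sketch of what using these APIs looks like in a PCI driver, assuming a typical probe/remove structure. All `demo_*` names, the device ID, and the buffer sizes are hypothetical placeholders, not taken from the Xilinx driver. The point is that the endpoint is only ever given the `dma_addr_t` values returned by `dmam_alloc_coherent()` / `dma_map_single()`; with the SMMU enabled, those are IOVAs the SMMU has mappings for.

```c
/* Hypothetical sketch: every buffer the endpoint touches comes from the
 * kernel DMA API, so the SMMU has a mapping for it. */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/slab.h>

#define DEMO_DEVICE_ID  0x7024  /* hypothetical placeholder: your endpoint's device ID */
#define DEMO_RING_BYTES 4096    /* hypothetical descriptor-ring size */
#define DEMO_BUF_BYTES  2048    /* hypothetical receive-buffer size */

struct demo_priv {
	void *ring;          /* CPU address of the coherent ring */
	dma_addr_t ring_dma; /* bus address (IOVA) the endpoint must use */
	void *buf;           /* CPU address of the streaming buffer */
	dma_addr_t buf_dma;  /* bus address (IOVA) of the streaming buffer */
};

static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct demo_priv *priv;
	int ret;

	ret = pcim_enable_device(pdev);
	if (ret)
		return ret;
	pci_set_master(pdev);

	/* Declare how many address bits the endpoint's DMA engine can drive. */
	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (ret)
		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
	if (ret)
		return ret;

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	/* Long-lived shared memory: coherent allocation, freed automatically. */
	priv->ring = dmam_alloc_coherent(&pdev->dev, DEMO_RING_BYTES,
					 &priv->ring_dma, GFP_KERNEL);
	if (!priv->ring)
		return -ENOMEM;

	/* Per-transfer memory: ordinary kmalloc buffer plus a streaming mapping. */
	priv->buf = kzalloc(DEMO_BUF_BYTES, GFP_KERNEL);
	if (!priv->buf)
		return -ENOMEM;
	priv->buf_dma = dma_map_single(&pdev->dev, priv->buf, DEMO_BUF_BYTES,
				       DMA_FROM_DEVICE);
	if (dma_mapping_error(&pdev->dev, priv->buf_dma)) {
		kfree(priv->buf);
		return -ENOMEM;
	}

	/* Program ring_dma / buf_dma (never virt_to_phys() or a hard-coded
	 * physical address) into the endpoint's DMA registers here; that part
	 * is device-specific and omitted. */

	pci_set_drvdata(pdev, priv);
	return 0;
}

static void demo_remove(struct pci_dev *pdev)
{
	struct demo_priv *priv = pci_get_drvdata(pdev);

	/* Stop the endpoint's DMA before unmapping (device-specific, omitted). */
	dma_unmap_single(&pdev->dev, priv->buf_dma, DEMO_BUF_BYTES,
			 DMA_FROM_DEVICE);
	kfree(priv->buf);
	/* ring and priv are released by the managed (dmam_/devm_) helpers. */
}

static const struct pci_device_id demo_ids[] = {
	{ PCI_DEVICE(PCI_VENDOR_ID_XILINX, DEMO_DEVICE_ID) },
	{ }
};
MODULE_DEVICE_TABLE(pci, demo_ids);

static struct pci_driver demo_driver = {
	.name     = "demo_dma",
	.id_table = demo_ids,
	.probe    = demo_probe,
	.remove   = demo_remove,
};
module_pci_driver(demo_driver);
MODULE_LICENSE("GPL");
```

Coherent memory suits long-lived shared structures such as descriptor rings. Streaming mappings suit per-transfer buffers; in real code they are usually mapped and unmapped around each transfer rather than once at probe.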

Hi vidyas

Thanks for your reply.
I found the node and disabled it, and that fixed the SMMU issue.
May I ask whether the dma_map_* functions are necessary?

Also, what is mc-err?

How can I find the root cause and fix it, or work around it?

[  148.471226] mc-err: (255) csw_pcie5w: EMEM address decode error
[  148.471406] mc-err:   status = 0x200100e3; addr = 0x66000000; hi_adr_reg=008
[  148.471557] mc-err:   secure: no, access-type: write
[  148.471676] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  148.471963] mc-err: (255) csw_pcie5w: EMEM address decode error
[  148.472083] mc-err:   status = 0x200100e3; addr = 0x66000200; hi_adr_reg=008
[  148.472228] mc-err:   secure: no, access-type: write
[  148.472345] mc-err: unknown mcerr fault, int_status=0x00000000, ch_int_status=0x00000000, hubc_int_status=0x00000000 sbs_int_status=0x00000000, hub_int_status=0x00000000
[  148.472635] mc-err: Too many MC errors; throttling prints

Could I ask whether the dma_map_* functions are necessary?
It is better if you (re-)write your driver using those APIs.

And what is mc-err?
It is the error flagged by the memory controller (MC) when an IP block (in this case the PCIe controller) tries to access memory that was not allocated for it (or at least not allocated in a way the SMMU knows about). If the DMA APIs are used, the SMMU learns of all the allocations made by the different PCIe device drivers.

How can I find the root cause and fix it, or work around it?
One way is to find out which part of your code is reserving memory at 0x66000000, 0x66000200 (these addresses appear in the error log), see which APIs are used to allocate that memory, and replace them with the DMA APIs.
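To see which addresses the endpoint actually faulted on (the IOVA range asked about earlier), the range can be pulled straight out of the SMMU fault lines. Below is a small userspace helper, assuming the `Unhandled context fault ... iova=0x...` format shown in the log above; `struct iova_range` and `iova_range_add_line()` are hypothetical names. It only tells you what the device tried to access, not what the driver intended to map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Lowest/highest faulting IOVA seen so far, and how many faults were counted. */
struct iova_range {
    uint64_t lo;
    uint64_t hi;
    size_t count;
};

/* Feed one dmesg line. Returns true if it was an SMMU context fault
 * carrying an iova= field, in which case the range is updated. */
static bool iova_range_add_line(struct iova_range *r, const char *line)
{
    if (strstr(line, "Unhandled context fault") == NULL)
        return false;

    const char *p = strstr(line, "iova=");
    if (p == NULL)
        return false;

    char *end;
    /* Base 16 accepts the optional "0x" prefix; parsing stops at ','. */
    uint64_t iova = strtoull(p + 5, &end, 16);
    if (end == p + 5)
        return false; /* no hex digits after "iova=" */

    if (r->count == 0 || iova < r->lo)
        r->lo = iova;
    if (r->count == 0 || iova > r->hi)
        r->hi = iova;
    r->count++;
    return true;
}
```

Feed it each line of `dmesg` output (for example, read with `fgets()`). For the first log in this thread it reports 0x66000000 to 0x66000b00. The mc-err lines are ignored because they carry `addr =` rather than `iova=`.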

For Xavier and Xavier NX, you also have to set the corresponding McSidStreamidOverrideConfigPcie* entries to 0x0000007f when disabling the SMMU in the DT.

-----
diff --git a/bootloader/t186ref/BCT/tegra194-memcfg-sw-override.cfg b/bootloader/t186ref/BCT/tegra194-memcfg-sw-override.cfg
index fbb107e..1c0de1b 100644
--- a/bootloader/t186ref/BCT/tegra194-memcfg-sw-override.cfg
+++ b/bootloader/t186ref/BCT/tegra194-memcfg-sw-override.cfg
@@ -224,13 +224,13 @@
McSidStreamidOverrideConfigNvenc1srd = 0x00000000;
McSidStreamidSecurityConfigNvenc1srd = 0x00010000;
McSidStreamidOverrideConfigNvenc1swr = 0x00000000;
McSidStreamidSecurityConfigNvenc1swr = 0x00010000;
-McSidStreamidOverrideConfigPcie0r = 0x00000056;
+McSidStreamidOverrideConfigPcie0r = 0x0000007f;
McSidStreamidSecurityConfigPcie0r = 0x00010101;
-McSidStreamidOverrideConfigPcie0w = 0x00000056;
+McSidStreamidOverrideConfigPcie0w = 0x0000007f;
McSidStreamidSecurityConfigPcie0w = 0x00010101;
-McSidStreamidOverrideConfigPcie1r = 0x00000057;
+McSidStreamidOverrideConfigPcie1r = 0x0000007f;
McSidStreamidSecurityConfigPcie1r = 0x00010101;
-McSidStreamidOverrideConfigPcie1w = 0x00000057;
+McSidStreamidOverrideConfigPcie1w = 0x0000007f;
McSidStreamidSecurityConfigPcie1w = 0x00010101;
McSidStreamidOverrideConfigPcie2ar = 0x00000058;
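Applied to the controller in this thread (PCIe5, stream ID 0x5b according to the SMMU log above), the same change would presumably look like the fragment below. This assumes the controller-5 entries in tegra194-memcfg-sw-override.cfg are named McSidStreamidOverrideConfigPcie5r / McSidStreamidOverrideConfigPcie5w (matching the csw_pcie5w client in the mc-err log) and currently hold 0x0000005b. Check the actual entry names and values in your copy of the file before editing.

```diff
-McSidStreamidOverrideConfigPcie5r = 0x0000005b;
+McSidStreamidOverrideConfigPcie5r = 0x0000007f;
-McSidStreamidOverrideConfigPcie5w = 0x0000005b;
+McSidStreamidOverrideConfigPcie5w = 0x0000007f;
```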