IOMMU error after flashing with jetpack-4.4

Hi,

I am using the kernel and device tree from jetpack-4.4DP (l4t-32.4.2).
I have removed the following properties from the pcie@14140000 node in the device tree, which disables the IOMMU for that controller:

iommus = <&smmu TEGRA_SID_PCIE3>;
dma-coherent;

This worked when the board was flashed with jetpack-4.2.2.
But now, after flashing jetpack-4.4 (using flash.sh), I get an IOMMU error on PCIE3 (pcie@14140000).

[    6.991206] mwifiex_pcie 0003:01:00.0: enabling device (0000 -> 0002)
[    6.991265] mwifiex_pcie: try set_consistent_dma_mask(32)
[    6.991503] mwifiex_pcie: PCI memory map Virt0: ffffff801a500000 PCI memory map Virt2: ffffff801a700000
[    6.991785] mwifiex: rx work enabled, cpus 8
[    7.108380] mc-err: (255) csr_pcie3r: EMEM address decode error
[    7.108528] mc-err:   status = 0x200640de; addr = 0xffffffff00; hi_adr_reg=ff08
[    7.108661] mc-err:   secure: yes, access-type: read
[    7.108764] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[    7.108900] t19x-arm-smmu 12000000.iommu: SMMU1: Unexpected {global,context} fault, this could be serious
[    7.108912] t19x-arm-smmu 12000000.iommu: 	GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000459, GFSYNR2 0x00000000, fault_addr=0x4623bf580, sid=89(0x59 - PCIE3)
[    7.109386] mwifiex_pcie 0003:01:00.0: FW CRC error indicated by the	helper: len = 0x0011, txlen = 17
[    7.109440] mc-err: (255) csr_pcie3r: EMEM address decode error
[    7.109560] mc-err:   status = 0x200640de; addr = 0xffffffff00; hi_adr_reg=ff08
[    7.109695] mc-err:   secure: yes, access-type: read
[    7.109805] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[    7.109958] t19x-arm-smmu 12000000.iommu: SMMU1: Unexpected {global,context} fault, this could be serious
[    7.109967] t19x-arm-smmu 12000000.iommu: 	GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000459, GFSYNR2 0x00000000, fault_addr=0x4623bf580, sid=89(0x59 - PCIE3)
[    7.142227] mwifiex_pcie 0003:01:00.0: FW CRC error indicated by the	helper: len = 0x0011, txlen = 17
[    7.142285] mc-err: Too many MC errors; throttling prints
[    7.147793] t19x-arm-smmu 12000000.iommu: SMMU1: Unexpected {global,context} fault, this could be serious
[    7.147840] mwifiex_pcie 0003:01:00.0: FW download failure @ 16, over max	retry count
[    7.147848] mwifiex_pcie 0003:01:00.0: prog_fw failed ret=0xffffffff
[    7.147852] mwifiex_pcie 0003:01:00.0: info: mwifiex_fw_dpc: unregister device
[    7.157401] t19x-arm-smmu 12000000.iommu: 	GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000459, GFSYNR2 0x00000000, fault_addr=0x4623bf580, sid=89(0x59 - PCIE3)

Again, the kernel source and device tree are the same in both cases (jetpack-4.4DP, i.e. l4t-32.4.2). With jetpack-4.2.2 flashed it works, but after flashing jetpack-4.4 using flash.sh I get this IOMMU error.

Thanks,
Shuo

The fault address 0xffffffff00 suggests that the SMMU is still enabled. Check “/sys/kernel/iommu_groups/”: if pcie@14140000 is present there, the SMMU is enabled for it. Also check whether the device tree property /proc/device-tree/pcie@14140000/iommus exists.
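
If it helps, both checks can be scripted. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the sysfs entry is named 14140000.pcie, following the pattern of the other PCIe controllers listed under iommu_groups.

```python
from pathlib import Path


def iommu_groups_for(device, groups_root="/sys/kernel/iommu_groups"):
    """Return the names of the IOMMU groups whose devices/ directory lists `device`.

    An empty list means the device is not attached to any IOMMU group.
    """
    root = Path(groups_root)
    if not root.is_dir():
        return []
    found = []
    for group in root.iterdir():
        entry = group / "devices" / device
        # sysfs entries are symlinks, so accept a symlink or a plain path.
        if entry.is_symlink() or entry.exists():
            found.append(group.name)
    return sorted(found)


def has_iommus_property(node="pcie@14140000", dt_root="/proc/device-tree"):
    """Return True if the device tree node still carries an `iommus` property."""
    return (Path(dt_root) / node / "iommus").exists()
```

On the target, iommu_groups_for("14140000.pcie") returning an empty list together with has_iommus_property() returning False would mean the kernel side no longer places PCIe3 behind the SMMU.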

Hi,

I checked both.
It doesn’t have /proc/device-tree/pcie@14140000/iommus.
Also, /sys/kernel/iommu_groups/ only shows (no 14140000.pcie):

./1/devices/14100000.pcie
./2/devices/141a0000.pcie
./0/devices/14180000.pcie

Thanks,
Shuo

Try the attached patch along with the DT change.

iommu.txt (611 Bytes)

Hi,

After applying your patch, the IOMMU errors no longer appear. Also, my Wi-Fi chip on PCIe3 works now; previously it was not detected.
dmesg:

[    6.753811] mwifiex_pcie 0003:01:00.0: enabling device (0000 -> 0002)
[    6.753873] mwifiex_pcie: try set_consistent_dma_mask(32)
[    6.754142] mwifiex_pcie: PCI memory map Virt0: ffffff801a600000 PCI memory map Virt2: ffffff801a800000
[    7.871151] mwifiex_pcie 0003:01:00.0: info: FW download over, size 843828 bytes
[    8.630659] mwifiex_pcie 0003:01:00.0: WLAN FW is active
[    8.696940] mwifiex_pcie 0003:01:00.0: Unknown api_id: 4
[    8.729255] mwifiex_pcie 0003:01:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p154) 
[    8.729267] mwifiex_pcie 0003:01:00.0: driver_version = mwifiex 1.0 (15.68.7.p154) 

Although the issue seems to be resolved, I am still confused about two things:

  1. Is iommu still enabled at PCIe3?
  2. I was using the exact same kernel and device tree (my customized 4.4DP kernel and dtb). Why did it work before with jetpack-4.2.2, but produce this IOMMU error with jetpack-4.4?

Thanks,
Shuo

Is iommu still enabled at PCIe3?

No

I was using the exact same kernel and device tree (my customized 4.4DP kernel and dtb). Why did it work before with jetpack-4.2.2, but produce this IOMMU error with jetpack-4.4?

The arm-smmu driver settings make the HW treat an unknown SID as an error. Since we disabled the SMMU in the DT, we are seeing this error. Jetpack-4.4 has this kernel patch integrated, so the error is observed starting from this build.
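
To confirm which stream ID the SMMU is rejecting, the fault lines from dmesg can be parsed. This is a rough sketch that reads only the fault_addr and sid fields exactly as the t19x-arm-smmu driver prints them in the logs above; the function name and the returned dictionary layout are made up for illustration.

```python
import re

# Matches the tail of a fault line such as:
#   ... fault_addr=0x4623bf580, sid=89(0x59 - PCIE3)
FAULT_RE = re.compile(
    r"fault_addr=(0x[0-9a-fA-F]+), sid=(\d+)\((0x[0-9a-fA-F]+) - (\w+)\)"
)


def parse_smmu_fault(line):
    """Extract fault address, stream ID and client name from one dmesg line.

    Returns None if the line is not an SMMU fault line in the expected format.
    """
    match = FAULT_RE.search(line)
    if match is None:
        return None
    return {
        "fault_addr": int(match.group(1), 16),
        "sid": int(match.group(2)),
        "sid_hex": int(match.group(3), 16),
        "client": match.group(4),
    }
```

Feeding it the fault lines from the first log in this thread should give sid 89 (0x59) with client PCIE3, which is the controller whose iommus property was removed.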

Hi,
Thanks for your reply. I am still a bit confused.
I am using my customized 4.4DP kernel on both 4.2.2 and 4.4. On 4.2.2 it works fine, but on 4.4 I get the SMMU error.
To be more clear:

  1. my customized 4.4DP kernel+dtb in jetpack 4.2.2: no error.
  2. the exact same customized 4.4DP kernel+dtb in jetpack 4.4: smmu error.

So I think it is probably caused by some other jetpack-4.4 component rather than the kernel, for example cboot, the BPMP software, or the dtb.

Thanks.

Yes, the memory settings are done by the bootloader.

Hi,

Can you please give me a hint about where and how the memory settings are done by the bootloader?
Do you have a document for it?

I forgot to mention that I am also using the same customized cboot on both 4.2.2 and 4.4.
So it is certainly not caused by cboot.

Is it then caused by the memory settings in mb1 or mb2?
If you can point me to the NVIDIA documentation for it, that would be very helpful.

Thanks

Hi,

You can use the method below to disable the MC override.
Open Linux_for_Tegra/bootloader/t186ref/BCT/tegra194-memcfg-sw-override.cfg

and change the corresponding PCIe OverrideConfig entry to 0x7f.

For example,

-McSidStreamidOverrideConfigPcie0r = 0x00000056;
+McSidStreamidOverrideConfigPcie0r = 0x0000007f;
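
The same change can be applied for a given controller with a small script. This is only a sketch: the function name is hypothetical, and it assumes the entries are written exactly like the line in the diff above, with an r (read) and a w (write) entry per controller. Entries with any other suffix are left untouched.

```python
import re


def set_pcie_sid_override(cfg_text, controller, value=0x7F):
    """Set the read and write SID override entries of one PCIe controller.

    Returns the modified text and the number of entries that were changed.
    Only names ending in 'r' or 'w' are handled (an assumption of this sketch).
    """
    pattern = re.compile(
        rf"^(\s*McSidStreamidOverrideConfigPcie{controller}[rw]\s*=\s*)"
        r"0x[0-9a-fA-F]+(\s*;)",
        re.MULTILINE,
    )
    return pattern.subn(
        lambda m: f"{m.group(1)}0x{value:08x}{m.group(2)}", cfg_text
    )
```

For example, set_pcie_sid_override(text, 3) rewrites both the Pcie3r and Pcie3w lines to 0x0000007f. A returned count other than 2 would suggest the file does not look the way this sketch assumes.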

Hi,

Thanks for your reply.
Changing McSidStreamidOverrideConfigPcie0r to 0x0000007f doesn’t work.
Also, since the SMMU error comes from PCIE3, shouldn’t I be changing McSidStreamidOverrideConfigPcie3r to 0x0000007f instead?

However, I tried setting McSidStreamidOverrideConfigPcie3r = 0x0000007f; as well, and still got the SMMU error.

I tried all 3:

  1. change pcie0 only:
McSidStreamidOverrideConfigPcie0r = 0x0000007f;
McSidStreamidOverrideConfigPcie0w = 0x0000007f;
  2. change pcie3 only:
McSidStreamidOverrideConfigPcie3r = 0x0000007f;
McSidStreamidOverrideConfigPcie3w = 0x0000007f;
  3. change both pcie0 and pcie3 (1 and 2 combined).

None of them works.

dmesg:

[    6.829894] mwifiex_pcie 0003:01:00.0: enabling device (0000 -> 0002)
[    6.829956] mwifiex_pcie: try set_consistent_dma_mask(32)
[    6.830269] mwifiex_pcie: PCI memory map Virt0: ffffff8013100000 PCI memory map Virt2: ffffff8013300000
[    6.830558] mwifiex: rx work enabled, cpus 8
[    8.752773] mwifiex_pcie 0003:01:00.0: WLAN FW is active
[    8.758240] t19x-arm-smmu 12000000.iommu: SMMU0: Unexpected {global,context} fault, this could be serious
[    8.758449] t19x-arm-smmu 12000000.iommu: 	GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00001059, GFSYNR2 0x00000000, fault_addr=0x462162980, sid=89(0x59 - PCIE3)
[    8.758951] mc-err: vpr base=0:c6000000, size=20, ctrl=3, override:(a01a8340, fcee10c1, 1, 0)
[    8.759115] mc-err: (255) csw_pcie3w: MC request violates VPR requirements
[    8.759239] mc-err:   status = 0x0ff740df; addr = 0xffffffff00; hi_adr_reg=008
[    8.759368] mc-err:   secure: yes, access-type: write
[    8.759462] mc-err: mcerr: unknown intr source intstatus = 0x00000000, intstatus_1 = 0x00000000
[    8.759524] mwifiex_pcie 0003:01:00.0: CMD_RESP: invalid cmd resp
[    8.759554] mwifiex_pcie 0003:01:00.0: info: mwifiex_fw_dpc: unregister device

Thanks,
Shuo

Hi NVIDIA,

Any update?
I changed tegra194-memcfg-sw-override.cfg to have:

McSidStreamidOverrideConfigPcie3r = 0x0000007f;
McSidStreamidOverrideConfigPcie3w = 0x0000007f;

This didn’t work.

I would prefer to fix the SMMU error in tegra194-memcfg-sw-override.cfg rather than with kernel patches.

Thanks.

Please dump the result of:

/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE3*

Hi,

# cat PCIE3*
0000007f
00000059
# cat PCIE3R
0000007f
# cat PCIE3W
00000059

I am 100% sure that I have

McSidStreamidOverrideConfigPcie3r = 0x0000007f;
McSidStreamidOverrideConfigPcie3w = 0x0000007f;

So it seems that either setting Pcie3w needs some extra configuration, or something else is resetting it?
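
For repeated checks after each flash, the debugfs values can be compared against the expected override in a few lines. This is a minimal sketch with hypothetical function names; it assumes each file under /sys/kernel/debug/tegra_mc_sid/ord holds a single hex value, as in the output above, and it needs root to read debugfs.

```python
from pathlib import Path


def read_sid_overrides(prefix, ord_dir="/sys/kernel/debug/tegra_mc_sid/ord"):
    """Read every debugfs entry whose name starts with `prefix` (e.g. 'PCIE3')."""
    values = {}
    for entry in sorted(Path(ord_dir).glob(prefix + "*")):
        # Each file is expected to contain one hex number such as '0000007f'.
        values[entry.name] = int(entry.read_text().strip(), 16)
    return values


def mismatches(overrides, expected=0x7F):
    """Return the entries whose value differs from the expected override."""
    return {name: value for name, value in overrides.items() if value != expected}
```

With the values shown above, mismatches(read_sid_overrides("PCIE3")) would report only PCIE3W, still at 0x59.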

Thanks.

Could you also share the values in the file below with us?

Linux_for_Tegra/bootloader/tegra194-memcfg-sw-override.cfg

Hi,

It has the correct values:

cat bootloader/tegra194-memcfg-sw-override.cfg | grep -i McSidStreamidOverrideConfigPcie3
McSidStreamidOverrideConfigPcie3r = 0x0000007f;
McSidStreamidOverrideConfigPcie3w = 0x0000007f;

I have attached the full file:
tegra194-memcfg-sw-override.cfg.txt (14.8 KB)

Thanks.

Hi,

I tested by flashing the tegra194-memcfg-sw-override.cfg from Jetpack-4.2.2, but the same SMMU error still occurs.
This is how I did it:

  1. copy the tegra194-memcfg-sw-override.cfg from Jetpack-4.2.2 to bootloader/t186ref/BCT/ folder of Jetpack-4.4. I renamed it as tegra194-memcfg-sw-override-422.cfg.
  2. In the jetson-xavier.conf file, add a line of EMMC_BCT1="tegra194-memcfg-sw-override-422.cfg";
  3. Put the Xavier into recovery mode.
  4. ./flash.sh -k MB1_BCT jetson-xavier mmcblk0p1

After flashing, tegra194-memcfg-sw-override-422.cfg has indeed been copied from bootloader/t186ref/BCT/ to bootloader/. Also, from the ./flash.sh output on the terminal, I can see that tegra194-memcfg-sw-override-422.cfg is actually used.

This is the result with the original jetpack-4.4 tegra194-memcfg-sw-override.cfg:

# cat PCIE0*
00000056
00000056
00000056
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE1*
00000057
00000057
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE2*
00000058
00000058
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE3*
0000007f
00000059
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE4*
0000005a
0000005a
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE5*
0000005b
0000005b
0000005b

This is the result with the modified jetpack-4.4 tegra194-memcfg-sw-override.cfg (Pcie3 changed to 0x7f):

# cat PCIE0*
00000056
00000056
00000056
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE1*
00000057
00000057
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE2*
00000058
00000058
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE3*
0000007f
00000059
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE4*
0000005a
0000005a
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE5*
0000005b
0000005b
0000005b

This is the result with the 4.2.2 file, tegra194-memcfg-sw-override-422.cfg:

# cat PCIE0*
00000056
00000056
00000056
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE1*
00000057
00000057
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE2*
00000058
00000058
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE3*
0000007f
00000059
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE4*
0000005a
0000005a
root@eagle-proto:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE5*
0000005b
0000005b
0000005b

ALL THREE RESULTS ARE THE SAME!
I flashed with ./flash.sh -k MB1_BCT jetson-xavier mmcblk0p1. Do I need a full flash? From the output of that command, I can tell that tegra194-memcfg-sw-override.cfg was flashed to the Xavier, so I don’t think I need a full flash. Does this mean the memory config is being overwritten by something else at a later boot stage?

Thanks.

Hi,

The issue is resolved. It seems that I did need a full flash on the Xavier.
After the full flash, the SMMU errors are gone, and both PCIE3 entries now read 0x7f:

# cat PCIE0*
00000056
00000056
00000056
root@eagle-recovery:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE1*
00000057
00000057
root@eagle-recovery:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE2*
00000058
00000058
root@eagle-recovery:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE3*
0000007f
0000007f
root@eagle-recovery:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE4*
0000005a
0000005a
root@eagle-recovery:/sys/kernel/debug/tegra_mc_sid/ord# cat PCIE5*
0000005b
0000005b
0000005b

I just have one final question. Where can I find the documentation for this? Is it part of the memory controller settings? I couldn’t find it in the TRM or in the online L4T documentation.

Thanks.
