PCIe iommu fault with bcm5719

Hello

We have run into a problem.

We connected a bcm5719 chip to the PCIe bus.

Sometimes the system fails to reach user space during boot.

The error log is below:

[    8.233937] pcieport 0000:00:01.0: enabling device (0000 -> 0002)
[    8.234040] pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
[    8.234042] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[    8.234044] pci 0000:01:00.1: Signaling PME through PCIe PME interrupt
[    8.234045] pci 0000:01:00.2: Signaling PME through PCIe PME interrupt
[    8.234047] pci 0000:01:00.3: Signaling PME through PCIe PME interrupt
[    8.234053] pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded
[    8.234131] aer 0000:00:01.0:pcie02: service driver aer loaded
[    8.234230] tg3.c:v3.137 (May 11, 2014)
[    8.234237] tg3 0000:01:00.0: enabling device (0000 -> 0002)
[    8.234297] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234335] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234366] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234397] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234427] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234457] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234487] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234517] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234548] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.234578] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[    8.237625] (255) csw_afiw: MC request violates VPR requirements
[    8.237653]   status = 0x00337031; addr = 0x3ffffffc0
[    8.237657]   secure: yes, access-type: write
[    8.246049] (255) csw_afiw: MC request violates VPR requirements
[    8.246052]   status = 0x00337031; addr = 0x3ffffffc0
[    8.246055]   secure: yes, access-type: write
[    8.253501] (255) csw_afiw: MC request violates VPR requirements
[    8.253505]   status = 0x00337031; addr = 0x3ffffffc0
[    8.253507]   secure: yes, access-type: write
[    8.255882] tg3 0000:01:00.0 eth1: Tigon3 [partno(647592-001) rev 5719001] (PCI Express) MAC address 00:10:18:57:19:00
[    8.255888] tg3 0000:01:00.0 eth1: attached PHY is 5719C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    8.255891] tg3 0000:01:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    8.255894] tg3 0000:01:00.0 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[    8.256287] tg3 0000:01:00.1: enabling device (0000 -> 0002)
[    8.261890] (255) csw_afiw: MC request violates VPR requirements
[    8.261893]   status = 0x00337031; addr = 0x3ffffffc0
[    8.261896]   secure: yes, access-type: write
[    8.267980] mc-err: Too many MC errors; throttling prints
[    8.279478] tg3 0000:01:00.1 eth2: Tigon3 [partno(647592-001) rev 5719001] (PCI Express) MAC address 00:10:18:57:19:02
[    8.279482] tg3 0000:01:00.1 eth2: attached PHY is 5718S (1000Base-SX Ethernet) (WireSpeed[0], EEE[0])
[    8.279485] tg3 0000:01:00.1 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    8.279488] tg3 0000:01:00.1 eth2: dma_rwctrl[00000001] dma_mask[64-bit]
[    8.279863] tg3 0000:01:00.2: enabling device (0000 -> 0002)
[    8.299465] tg3 0000:01:00.2 eth3: Tigon3 [partno(647592-001) rev 5719001] (PCI Express) MAC address 00:10:18:57:19:04
[    8.299469] tg3 0000:01:00.2 eth3: attached PHY is 5718S (1000Base-SX Ethernet) (WireSpeed[0], EEE[0])
[    8.299472] tg3 0000:01:00.2 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    8.299475] tg3 0000:01:00.2 eth3: dma_rwctrl[00000001] dma_mask[64-bit]
[    8.299858] tg3 0000:01:00.3: enabling device (0000 -> 0002)
[    8.317147] tg3 0000:01:00.3 eth4: Tigon3 [partno(647592-001) rev 5719001] (PCI Express) MAC address 00:10:18:57:19:06
[    8.317151] tg3 0000:01:00.3 eth4: attached PHY is 5718S (1000Base-SX Ethernet) (WireSpeed[0], EEE[0])
[    8.317154] tg3 0000:01:00.3 eth4: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    8.317157] tg3 0000:01:00.3 eth4: dma_rwctrl[00000001] dma_mask[64-bit]
[    8.317480] tegra-pcie 10003000.pcie-controller: speed change : Gen-1 -> Gen-2
[   13.236703] __arm_smmu_context_fault: 162132 callbacks suppressed
[   13.243992] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.259332] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.274679] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.289987] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.305389] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.320758] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.336123] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.351482] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.366891] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   13.382314] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.260730] __arm_smmu_context_fault: 153335 callbacks suppressed
[   18.268052] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.283490] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.298967] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.314445] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.330024] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.345585] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.361180] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.376773] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.392329] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   18.408026] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   22.700701] l4tbr0: port 2(usb1) entered forwarding state
[   23.284711] __arm_smmu_context_fault: 152618 callbacks suppressed
[   23.292083] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.307602] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.323124] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.338639] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.354168] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.369702] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.385235] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.400801] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.416370] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   23.431902] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
[   28.308717] __arm_smmu_context_fault: 152737 callbacks suppressed
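
Because the log is dominated by repeated SMMU context faults, it can help to summarize it before reading it line by line. The following is a minimal sketch, assuming the dmesg output has been saved as plain text; the function name `summarize_faults` and the sample string are only illustrative, and the regular expressions are written against the exact message format shown in the log above.

```python
import re
from collections import Counter

# Matches the "Unhandled context fault" lines printed by arm-smmu in the log above.
FAULT_RE = re.compile(
    r"arm-smmu \S+: Unhandled context fault: "
    r"iova=(?P<iova>0x[0-9a-fA-F]+), fsynr=(?P<fsynr>0x[0-9a-fA-F]+), "
    r"cb=(?P<cb>\d+), sid=(?P<sid>\d+)"
)
# Matches the rate-limit notices, e.g. "... 162132 callbacks suppressed".
SUPPRESSED_RE = re.compile(r"__arm_smmu_context_fault: (\d+) callbacks suppressed")


def summarize_faults(log_text):
    """Return (Counter keyed by (iova, cb, sid), total suppressed callbacks)."""
    faults = Counter()
    suppressed = 0
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            key = (int(m["iova"], 16), int(m["cb"]), int(m["sid"]))
            faults[key] += 1
            continue
        s = SUPPRESSED_RE.search(line)
        if s:
            suppressed += int(s.group(1))
    return faults, suppressed


if __name__ == "__main__":
    # Two lines copied from the log above, used here only as sample input.
    sample = (
        "[    8.234297] arm-smmu 12000000.iommu: Unhandled context fault: "
        "iova=0x00000000, fsynr=0x11, cb=22, sid=17(0x11 - AFI), "
        "pgd=0, pud=0, pmd=0, pte=0\n"
        "[   13.236703] __arm_smmu_context_fault: 162132 callbacks suppressed\n"
    )
    faults, suppressed = summarize_faults(sample)
    for (iova, cb, sid), count in faults.items():
        print(f"iova={iova:#x} cb={cb} sid={sid}: {count} printed fault(s)")
    print(f"suppressed callbacks: {suppressed}")
```

Run against the full log, a summary like this should make it easy to confirm whether every fault shares the same iova, context bank and stream ID, as appears to be the case here (iova=0x0, cb=22, sid=17).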

If we don’t load the tg3 driver, the problem does not occur.

We also tried the original tg3 driver, but the problem still happened.

How can we resolve this problem?

The driver package we use is version 28.2.1 from the Jetson Download Center.

The kernel source is also version 28.2.1 from the Jetson Download Center.

tx2_errorlog.txt (128 KB)

Hello

I tested the following items:

  1. Disabled the IOMMU for PCIe -> the problem didn’t happen.

  2. Checked which function triggers the error -> pci_set_master (once this function is executed, the problem may happen).

  3. Tried modifying the code according to the answer below -> the problem still happened.
    https://devtalk.nvidia.com/default/topic/1002486/iommu-unhandled-context-fault-on-pci-device-dma/

However, we must keep the PCIe IOMMU enabled to control the 5719, so disabling it is not an option for us.

Hi, we heard that there is no such problem with the TX1 module, is that right? Was the same carrier board used for the tests with the TX1 and TX2 modules?

Hello Trumany,

Yes, there is no such problem with the TX1 module.

We used the same carrier board for both tests.

Hello Trumany,

We found that the iova is 0x0. Is this correct?

root@tegra-ubuntu:/sys/kernel/debug/12000000.iommu/cb022# cat iova_to_phys
iova=0x0000000000000000 pa=0x0000000000000000

Hi Chen.NeilZX,

We tried with another Ethernet card and it works on both TX1 and TX2.
Is this issue reproducible on Jetson TX2 DevKit (not your carrier board)?
Would removing the tg3 driver be an acceptable solution in your situation?

Also, which PCIe card with the bcm5719 chip are you using?

Hi Vickyy,

We have solved this problem.

We modified the reset sequence of the bcm5719.

Thanks for your reply.