Network Issues Related to Page Faults on Intel 82574L PCIe NIC

We have multiple carrier boards with Intel 82574L PCIe NICs that no longer work as of L4T 24.2. Whenever the network interface is opened, page faults are thrown and then the adapter resets. This was not an issue in 24.1.

I have disabled NetworkManager and determined that the issue occurs when the port is opened. I am using the same e1000e driver as in 24.1. The page fault errors I get are:

[ 2792.206979] smmu_dump_pagetable(): fault_address=0x00000000e7104000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 2792.218636] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[ 2792.225434] mc-err: status = 0x6000000e; addr = 0xe7104000
[ 2792.231117] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 2796.098536] smmu_dump_pagetable(): fault_address=0x00000000e7104200 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 2796.110358] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[ 2796.117013] mc-err: status = 0x6000000e; addr = 0xe7104200
[ 2796.122690] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 2796.366948] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 2796.378674] mc-err: (0) csw_afiw: EMEM decode error on PDE or PTE entry
[ 2796.385647] mc-err: status = 0x60010031; addr = 0x00000000
[ 2796.391367] mc-err: secure: no, access-type: write, SMMU fault: nr-nw-s
[ 2797.006443] mc-err: Too many MC errors; throttling prints

This will happen repeatedly as long as the interface is up.

Has anyone seen these errors before? Have there been memory allocation changes between the two versions?

Any help would be great. Thanks,
Parker

It appears to only affect devices that are connected via a PCIe switch.

I can’t help much with this, but you might post the output of “sudo lspci -vvv” for the switch in question. Though it may be difficult, it would be useful to see that output on the Jetson when it fails, plus the same command’s output on any machine where the switch works (this could be a Jetson running the earlier R24.1, or a desktop computer).

Yeah, I am having similar issues.

It seems they have really messed up the PCIe drivers.
I am also having issues with a Marvell SATA controller that throws almost identical errors when I have it hooked up.
Others are seeing these errors with their PCIe devices as well: network controllers, HBAs, FPGAs, you name it.

I am currently looking into it. Any detailed information you can provide here would be helpful. I will post when I find a solution.

Can you please try with the following patch?

--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -2112,7 +2112,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
 	 * compound page then there's probably a bug somewhere.
 	 */
 	if (page_offset > 0)
-		BUG_ON(page_count(page) == 1);
+		BUG_ON(page_offset > (1 << compound_order(compound_head(page)))
+			- ((page - compound_head(page)) << PAGE_SHIFT));
 
 	dma_addr = __alloc_iova(mapping, len, attrs);
 	if (dma_addr == DMA_ERROR_CODE)

Just wanted to add that I was hitting the same issue as the OP on L4T 24.2 (Ubuntu 16.04), and downgrading to L4T 24.1 (Ubuntu 14.04) did resolve it.

Has anyone tried the patch that Vidyas recommended for L4T 24.2?

Hi Saket,
I tried that patch and it made no difference. What did make a difference was removing the “iommus = <0x46 0x0>;” property from the pcie-controller node of the device tree. It was a new addition in 24.2.0 and it was breaking the memory mapping in some way. However, I never got the chance to determine exactly why.

Hi pnewmanCTI – thank you for the info! We will give your suggested fix a try and report back later.

Hi,
The patch mentioned in #7 alone should fix the issue.
@pnewmanCTI, can you please attach the log from when you tried #7 and it failed?
The suggestion in #9 also works, as it basically disables the IOMMU/SMMU for PCIe and removes the need for the patch mentioned in #7.