Network Issues Related to Page Faults Intel 82574L PCIe NIC

pnewmanCTI · October 27, 2016, 4:26pm

We have multiple carriers with Intel 82574L PCIe NICs that are no longer working as of L4T 24.2. Whenever the network interface is opened, page faults are thrown and then the adapter resets. This was not an issue in 24.1.

I have disabled Network-Manager and determined that the issue occurs when the port is opened. I am using the same e1000e driver as in 24.1. The page fault errors I get are:

[ 2792.206979] smmu_dump_pagetable(): fault_address=0x00000000e7104000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 2792.218636] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[ 2792.225434] mc-err: status = 0x6000000e; addr = 0xe7104000
[ 2792.231117] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 2796.098536] smmu_dump_pagetable(): fault_address=0x00000000e7104200 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 2796.110358] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[ 2796.117013] mc-err: status = 0x6000000e; addr = 0xe7104200
[ 2796.122690] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 2796.366948] smmu_dump_pagetable(): fault_address=0x0000000000000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 2796.378674] mc-err: (0) csw_afiw: EMEM decode error on PDE or PTE entry
[ 2796.385647] mc-err: status = 0x60010031; addr = 0x00000000
[ 2796.391367] mc-err: secure: no, access-type: write, SMMU fault: nr-nw-s
[ 2797.006443] mc-err: Too many MC errors; throttling prints

This will happen repeatedly as long as the interface is up.

Has anyone seen these errors before? Have there been any memory allocation changes between the two versions?

Any help would be great. Thanks,
Parker

pnewmanCTI · October 27, 2016, 5:33pm

It appears to only affect devices that are connected via a PCIe switch.

linuxdev · October 27, 2016, 7:39pm

I can’t help much with this, but you might include the output of “sudo lspci -vvv” for the particular switch. Though it may be difficult, it would also be useful to see the same output on the Jetson when it fails, plus the same command on any machine where the switch works (that could be the earlier R24.1 Jetson, or a desktop computer).

raak · October 29, 2016, 12:19pm

Yeah, I am having similar issues.

pushkar90 · October 31, 2016, 11:19pm

It seems they have really messed up the PCIe drivers.
I am also having issues with a Marvell SATA controller that throws almost identical errors when I have it hooked up.
Others are seeing these errors with their PCIe devices as well: network controllers, HBAs, FPGAs, you name it.

pnewmanCTI · November 1, 2016, 12:39pm

I am currently looking into it. Any detailed information you can provide here would be helpful. I will post when I find a solution.

vidyas · November 4, 2016, 5:11am

Can you please try with the following patch?

--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -2112,7 +2112,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
          * compound page then there's probably a bug somewhere.
          */
         if (page_offset > 0)
-                BUG_ON(page_count(page) == 1);
+                BUG_ON(page_offset > (1 << compound_order(compound_head(page)))
+                        - ((page - compound_head(page)) << PAGE_SHIFT));
 
         dma_addr = __alloc_iova(mapping, len, attrs);
         if (dma_addr == DMA_ERROR_CODE)

Saket · March 1, 2017, 4:33am

Just wanted to add that I was hitting the same issue as the OP on L4T 24.2 (Ubuntu 16.04), and downgrading to L4T 24.1 (Ubuntu 14.04) did resolve it.

Has anyone tried the patch that Vidyas recommended for L4T 24.2?

pnewmanCTI · March 1, 2017, 1:38pm

Hi Saket,
I tried that patch and it made no difference. What did make a difference was removing “iommus = <0x46 0x0>;” from the pcie-controller node of the device tree. It was a new addition in 24.2.0, and it was breaking the memory mapping in some way. I never got the chance to determine the exact reason why, however.

Saket · March 1, 2017, 5:02pm

Hi pnewmanCTI – thank you for the info! We will give your suggested fix a try and report back later.

vidyas · March 2, 2017, 12:25pm

Hi,
The patch mentioned in #7 alone should fix the issue.
@pnewmanCTI, can you please attach the log from when you tried #7 and it failed?
The suggestion in #9 also works, as it basically disables the IOMMU/SMMU for PCIe and removes the need for the patch mentioned in #7.
