2x Ethernets with Jetson Nano and Custom Board // IOMMU Problem

Hello everyone,

We have developed a custom board that is compatible with the Jetson Nano, Xavier NX and TX2 NX. There is a PCIe switch on the board and a 1 Gbit PHY (RTL8111E) behind it.

This is identical to the internal PHY of the Jetson Nano, so both load the same driver (r8168). We adjusted the file r8168_n.c so that the same kernel object is not created twice, which had led to a kernel panic. Our PCIe 1 Gbit PHY also has no EEPROM, so we use the random_mac() function (r8168_n_custom.c.txt attached). We did not change anything in the device tree regarding the Ethernet PHY.
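Side note for readers reproducing the missing-EEPROM fallback: the body of random_mac() is only in the attachment, so the following is a minimal user-space sketch of what such a fallback usually has to guarantee, not the driver's actual code. The function name `make_random_mac` is hypothetical. A randomly generated address should be unicast (lowest bit of the first octet cleared) and locally administered (second-lowest bit set) so it cannot collide with a vendor-assigned address.

```python
import os


def make_random_mac() -> str:
    """Return a random, locally administered, unicast MAC address.

    Hypothetical helper for illustration only; it is not taken from
    r8168_n_custom.c.
    """
    octets = bytearray(os.urandom(6))
    octets[0] &= 0xFE  # clear bit 0: unicast, not multicast
    octets[0] |= 0x02  # set bit 1: locally administered
    return ":".join(f"{b:02x}" for b in octets)


if __name__ == "__main__":
    print(make_random_mac())
```

Because the address changes on every call, a board that needs a stable address across boots would have to store the generated value somewhere persistent.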

We currently have the following problem:

When we boot with the Ethernet cable plugged into Ethernet PHY 1 (Jetson internal), we get no kernel panics and Ethernet PHY 1 and 2 both have a valid MAC address. When we then plug the Ethernet cable into Ethernet PHY 2 (external PHY), the interface gets an IP address and can communicate on the network. → Works as designed (dmesg_works.txt)

When we boot with the Ethernet cable plugged into Ethernet PHY 2 (external PHY), we get a kernel panic after the link comes up. Most of the time Linux boots and then gets slower and slower (because of the kernel panic output being flushed) until it crashes (syslog_failed.txt attached). If Ubuntu does boot and you then unplug the LAN cable, it also goes straight into a kernel panic and the system freezes.

I would be grateful for any tips!

Thank you and have a nice evening,
Saber

r8168_n_custom.c.txt (1.0 MB)
dmesg_works.txt (72.4 KB)
syslog_failed.txt (744.4 KB)

Hi, can you share the system diagram of your design first? And are you testing with the Nano module?

Sure, attached is the block diagram and also the schematics.


Yes, we’re testing on Nano. On Xavier NX or TX2 it’s working.

Hi,
This is possible since Jetson Nano module is not 100% pin-to-pin compatible with Xavier NX and TX2 NX.

Please check Table 6-2, 6-3 in product design guide:
https://developer.nvidia.com/embedded/dlc/jetson-nano-product-design-guide

Also, please share which PCIe lanes you are using. If they differ from the Jetson Nano developer kit, you would need to modify the device tree.

Hi, we have checked the PCIe pin mapping; we are using PCIE0 with lane 0. At this point everything should work, since we are also able to get IP packets through the external Realtek chip.

But this happens after replugging an Ethernet cable on the external chip, as described in syslog_failed.txt:

Dec 14 20:54:00 edgekit-desktop kernel: [ 8.644184] WARNING: CPU: 0 PID: 3961 at /data/Linux_for_Tegra/source/public/kernel/kernel-4.9/drivers/iommu/tegra-smmu.c:901 __smmu_iommu_map_pfn_default+0x21c/0x228

Please capture the log from the UART instead of sharing the syslog.

Hi @WayneWWW, attached is the UART output from booting with the cable connected to Ethernet 2 (external Ethernet PHY).
uart_ethernet_2…log (148.0 KB)

Does the ethernet driver have this code?

/* disable ASPM completely as that cause random device stop working
	 * problems as well as full system hangs for some PCIe devices users */
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
				     PCIE_LINK_STATE_CLKPM);

It is the original code from JetPack 4.6.1; we have only added MAC address and kernel object handling. The driver contains the following at lines 3519 - 3527:

                        /*
                         * FIFO overrun errors are observed when ASPM is enabled
                         * and flow control is disabled. This is causing perf
                         * drop. So disable ASPM if flow control is disabled.
                         */
                        if (aspm && ((RTL_R8(PHYstatus) & (TxFlowCtrl | RxFlowCtrl)) !=
                            (TxFlowCtrl | RxFlowCtrl)))
                                pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_L0S |
                                                       PCIE_LINK_STATE_L1 | PCIE_LINK_STATE_CLKPM);

I could rebuild the Ethernet driver with debug prints to check whether this function is called. Or do I need to add something so that it gets called?
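To make it easier to see when the quoted branch would actually call pci_disable_link_state(), here is a small user-space sketch of the same condition. It is only a model of the boolean logic, not driver code. The bit values for TxFlowCtrl and RxFlowCtrl (0x40 and 0x20) are an assumption and should be checked against the header of the driver version in use; the function name `should_disable_aspm` is hypothetical.

```python
# Assumed bit positions in the PHYstatus register; verify against the
# driver header before relying on them.
TX_FLOW_CTRL = 0x40
RX_FLOW_CTRL = 0x20


def should_disable_aspm(aspm_enabled: bool, phy_status: int) -> bool:
    """Model of the quoted condition.

    ASPM is disabled only if it was requested AND at least one of the
    two flow-control bits is clear in the PHY status value.
    """
    both = TX_FLOW_CTRL | RX_FLOW_CTRL
    return bool(aspm_enabled) and (phy_status & both) != both


if __name__ == "__main__":
    for status in (0x00, TX_FLOW_CTRL, RX_FLOW_CTRL, TX_FLOW_CTRL | RX_FLOW_CTRL):
        print(hex(status), should_disable_aspm(True, status))
```

In other words, with flow control negotiated in both directions the branch is skipped and ASPM stays as configured, so a debug print inside the branch may never fire on a link with full flow control.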

Could you remove this part and try again?

Attached is the log with all pci_disable_link_state() calls commented out:

SN12_Eth_2_boot_aspm_removed.log (31.5 KB)

Unfortunately we have the same problem.

Is it possible to validate this issue without the PCIe switch?

Hi @WayneWWW, we have built a new board without the PCIe switch. Same error; UART log attached.
SN_Eth2_boot.txt (37.8 KB)

I think the problem may be that both Ethernet PHYs load the same driver, and the driver perhaps has a fixed IOMMU definition and cannot distinguish between two separate hardware PHYs?

You can try to disable one pcie device to check if what you think is correct or not.

I have disabled the internal Ethernet PHY:

                pci@2,0 {
#if TEGRA_XUSB_PADCONTROL_VERSION >= DT_VERSION_2
                        phys = <&{/xusb_padctl@7009f000/pads/pcie/lanes/pcie-0}>;
                        phy-names = "pcie-0";
#endif
                        nvidia,num-lanes = <1>;
                        nvidia,plat-gpios = <&gpio TEGRA_GPIO(X, 3) GPIO_ACTIVE_HIGH>;
                        status = "disabled";

                        ethernet@0,0 {
                                reg = <0x000000 0 0 0 0>;
                        };
                };

But I still get the same errors… Normally it shouldn't be a problem that two Ethernet PHYs of the same type are connected, right?

In the syslog I see that the kernel reports an error here:

[    8.568370] error:map req on already mapped pte, asid=10 iova=0x00000000feddf000 pa=0x00000001706ab000 prot=3 *pte=e008a583
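When a log contains many of these SMMU messages, it can help to pull the fields out and see whether the same IOVA keeps being remapped. The following is a minimal sketch that assumes the message format shown in the line above stays the same throughout the log; the function names are hypothetical. The asid and prot fields are kept as raw strings because the number base used by the kernel message is not stated in this thread.

```python
import re
from collections import Counter

# Pattern built from the single error line quoted above.
SMMU_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+error:map req on already mapped pte, "
    r"asid=(?P<asid>\S+) iova=(?P<iova>0x[0-9a-fA-F]+) "
    r"pa=(?P<pa>0x[0-9a-fA-F]+) prot=(?P<prot>\S+) "
    r"\*pte=(?P<pte>[0-9a-fA-F]+)"
)


def parse_smmu_error(line):
    """Return the fields of one 'already mapped pte' line, or None."""
    m = SMMU_ERR.search(line)
    if m is None:
        return None
    return {
        "timestamp": float(m.group("ts")),
        "asid": m.group("asid"),   # base unknown, kept as text
        "iova": int(m.group("iova"), 16),
        "pa": int(m.group("pa"), 16),
        "prot": m.group("prot"),   # base unknown, kept as text
        "pte": int(m.group("pte"), 16),
    }


def count_iovas(lines):
    """Count how often each IOVA appears in 'already mapped' errors."""
    hits = (parse_smmu_error(line) for line in lines)
    return Counter(h["iova"] for h in hits if h is not None)


if __name__ == "__main__":
    sample = ("[    8.568370] error:map req on already mapped pte, asid=10 "
              "iova=0x00000000feddf000 pa=0x00000001706ab000 prot=3 "
              "*pte=e008a583")
    print(parse_smmu_error(sample))
```

Feeding the lines of a UART or syslog capture to `count_iovas` gives a quick view of whether the errors cluster on a few addresses or are spread across many.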

Really strange new information. When I try the same board (and driver) with the Xavier NX, the Realtek 8111F Ethernet PHY only starts working after 2-3 minutes, but there are no kernel panics. After those 2-3 minutes networking works, and unplugging the Ethernet cable also causes no problems.

So the r8168 driver shouldn't be the cause of our issue. Any new ideas what it could be?

I have some updates on this topic. After days of analyzing the problem and trying everything, we saw that the r8168 driver with the Realtek 8111F chip is highly unstable on Ubuntu 18.04. So on the Xavier NX we no longer build the r8168 driver and let the r8169 driver handle the chip instead. Getting an IP address via DHCP works, and networking stays up over longer periods. The only remaining issue is that it works only in reboot scenarios, not in shutdown scenarios (reboot = IP address and everything works; shutdown = only a MAC address, no IP address possible).

Attached the log files:
reboot_dmesg.txt (71.5 KB)
shutdown_dmesg.txt (71.2 KB)
syslog_reboot.txt (307.6 KB)
syslog_shutdown.txt (308.1 KB)
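For anyone switching between r8168 and r8169 as described above, it is worth confirming which driver is actually bound to the interface after boot. On Linux the `device/driver` symlink under `/sys/class/net/<interface>` points at the bound driver. The sketch below reads that link; the function name `bound_driver` and the interface name `eth1` are hypothetical, and the `sysfs_root` parameter exists only so the helper can be tried against a test directory.

```python
import os


def bound_driver(iface, sysfs_root="/sys"):
    """Return the name of the driver bound to a network interface.

    Returns None if the interface does not exist or has no bound
    driver (for example a purely virtual interface).
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


if __name__ == "__main__":
    # "eth1" is a placeholder; use the name shown by `ip link`.
    print(bound_driver("eth1"))
```

If this prints r8168 although the module was supposedly removed from the build, an older copy of the module is probably still installed on the root filesystem.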

Final update for everyone with similar problems: everything works when using the r8169 driver with a custom /etc/network/interfaces configuration.
