2x Ethernets with Jetson Nano and Custom Board // IOMMU Problem

saber.kaygusuz · December 14, 2022, 8:57pm

Hello everyone,

We have developed a custom board that is compatible with the Jetson Nano, Xavier NX and TX2 NX. There is a PCIe switch on the board with a 1 Gbit PHY (RTL8111E) behind it.

This is identical to the internal PHY of the Jetson Nano, so both load the same driver (r8168). We adjusted the file r8168_n.c so that the same kernel object is not created twice, which had led to a kernel panic. Our PCIe 1 Gbit PHY also has no EEPROM, so we use the random_mac() function (r8168_n_custom.c.txt attached). We didn’t change anything in the device tree regarding the Ethernet PHY.
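For readers without the attachment, here is a minimal user-space sketch of what generating a random MAC address for an EEPROM-less controller usually involves. It is not the code from r8168_n_custom.c; the name make_random_mac() is hypothetical, and rand() is only a stand-in for a proper random source. In kernel code, the helper eth_hw_addr_random() from <linux/etherdevice.h> covers the same ground.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Fill addr[6] with a random MAC address that is safe to use on a LAN:
 * unicast (bit 0 of the first octet cleared) and locally administered
 * (bit 1 of the first octet set), so it cannot collide with a
 * vendor-assigned address.
 * Hypothetical helper for illustration; rand() is a placeholder RNG.
 */
void make_random_mac(uint8_t addr[6])
{
    for (int i = 0; i < 6; i++)
        addr[i] = (uint8_t)(rand() & 0xff);

    addr[0] &= 0xfe; /* clear the multicast bit: unicast address */
    addr[0] |= 0x02; /* set the locally administered bit */
}
```

Note that an address generated this way changes on every boot unless it is stored somewhere, which matters for DHCP leases.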

We currently have the following problem:

When we boot with the Ethernet cable plugged into Ethernet PHY 1 (Jetson internal), we get no kernel panics, Ethernet PHY 1 and 2 both have a valid MAC address, and when we then plug the cable into Ethernet PHY 2 (external PHY), the interface gets an IP address and can communicate on the network. → Works as designed (dmesg_works.txt)

When we boot with the Ethernet cable plugged into Ethernet PHY 2 (external PHY), we get a kernel panic after the link comes up. Most of the time Linux boots and then gets slower and slower (because of the kernel panic flushing) until it crashes (syslog_failed.txt attached). If Ubuntu does boot and you then unplug the LAN cable, it also goes straight into a kernel panic and the system freezes.

I would be grateful for any tips!

Thank you and have a nice evening,
Saber

r8168_n_custom.c.txt (1.0 MB)
dmesg_works.txt (72.4 KB)
syslog_failed.txt (744.4 KB)

Trumany · December 15, 2022, 2:18am

Hi, can you share the system diagram of your design first? And are you testing with the Nano module?

saber.kaygusuz · December 15, 2022, 1:39pm

Sure, attached is the block diagram and also the schematics.

saber.kaygusuz · December 15, 2022, 7:13pm

Yes, we’re testing on Nano. On Xavier NX or TX2 it’s working.

DaneLLL · December 16, 2022, 5:15am

Hi,
This is possible since the Jetson Nano module is not 100% pin-to-pin compatible with Xavier NX and TX2 NX.

Please check Table 6-2, 6-3 in product design guide:
https://developer.nvidia.com/embedded/dlc/jetson-nano-product-design-guide

And share which PCIe lanes you are using. If they differ from the Jetson Nano developer kit, you would need to modify the device tree.

saber.kaygusuz · December 16, 2022, 9:04am

Hi, we’ve checked the PCIe pin mapping; we’re using PCIE0 with lane 0. But at this point everything should work, since we were also able to get IP packets through the external Realtek chip.

But this happens after replugging an Ethernet cable on the external chip (as described in syslog_failed.txt):

Dec 14 20:54:00 edgekit-desktop kernel: [ 8.644184] WARNING: CPU: 0 PID: 3961 at /data/Linux_for_Tegra/source/public/kernel/kernel-4.9/drivers/iommu/tegra-smmu.c:901 __smmu_iommu_map_pfn_default+0x21c/0x228

WayneWWW · December 19, 2022, 3:18am

Please share a log captured from the UART instead of the syslog.

saber.kaygusuz · December 19, 2022, 8:20am

Hi @WayneWWW, attached is the UART output from booting with the cable connected to Ethernet 2 (external Ethernet PHY).
uart_ethernet_2…log (148.0 KB)

WayneWWW · December 19, 2022, 8:44am

Does the ethernet driver have this code?

/* disable ASPM completely as that cause random device stop working
	 * problems as well as full system hangs for some PCIe devices users */
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
				     PCIE_LINK_STATE_CLKPM);

saber.kaygusuz · December 19, 2022, 8:52am

It’s the original code from JetPack 4.6.1; we’ve only added MAC address and kernel-object handling. So the code has this at lines 3519-3527:

                        /*
                         * FIFO overrun errors are observed when ASPM is enabled
                         * and flow control is disabled. This is causing perf
                         * drop. So disable ASPM if flow control is disabled.
                         */
                        if (aspm && ((RTL_R8(PHYstatus) & (TxFlowCtrl | RxFlowCtrl)) !=
                            (TxFlowCtrl | RxFlowCtrl)))
                                pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_L0S |
                                                       PCIE_LINK_STATE_L1 | PCIE_LINK_STATE_CLKPM);

I could rebuild the Ethernet driver with debug prints to check whether this function is called. Or do I need to add something so that it gets called?
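The difference between the two snippets is that the JetPack driver disables ASPM only conditionally. A minimal sketch of that condition as a standalone predicate may make it easier to see when the call is reached. The function name should_disable_aspm() is hypothetical, and the bit values for TxFlowCtrl and RxFlowCtrl (0x40 and 0x20) are assumptions that should be checked against the driver header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PHYstatus bit values; verify against the r8168 header. */
#define TX_FLOW_CTRL 0x40
#define RX_FLOW_CTRL 0x20

/*
 * Mirrors the condition quoted above: ASPM is disabled only when the
 * aspm option is on AND at least one of TX/RX flow control is missing
 * from the PHY status register value.
 */
bool should_disable_aspm(bool aspm, uint8_t phy_status)
{
    const uint8_t both = TX_FLOW_CTRL | RX_FLOW_CTRL;

    return aspm && ((phy_status & both) != both);
}
```

Under these assumptions, a link that negotiated flow control in both directions never reaches pci_disable_link_state(), so a debug print inside the if body would stay silent on such a link.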

WayneWWW · December 19, 2022, 9:01am

Could you remove this part and try again?

saber.kaygusuz · December 19, 2022, 10:15am

Attached is the log with all pci_disable_link_state() calls commented out:

SN12_Eth_2_boot_aspm_removed.log (31.5 KB)

Unfortunately we have the same problem.

WayneWWW · December 19, 2022, 11:03am

Is it possible to validate this issue without that pcie switch?

saber.kaygusuz · December 19, 2022, 3:42pm

Hi @WayneWWW, we’ve built a new board without the PCIe switch. Same error; UART log attached.
SN_Eth2_boot.txt (37.8 KB)

I think the problem may be that both Ethernet PHYs load the same driver, and the driver perhaps has a fixed IOMMU definition and cannot distinguish between two different hardware PHYs?

WayneWWW · December 19, 2022, 3:45pm

You can try disabling one PCIe device to check whether your assumption is correct.

saber.kaygusuz · December 19, 2022, 4:17pm

I’ve disabled the internal Ethernet PHY:

                pci@2,0 {
#if TEGRA_XUSB_PADCONTROL_VERSION >= DT_VERSION_2
                        phys = <&{/xusb_padctl@7009f000/pads/pcie/lanes/pcie-0}>;
                        phy-names = "pcie-0";
#endif
                        nvidia,num-lanes = <1>;
                        nvidia,plat-gpios = <&gpio TEGRA_GPIO(X, 3) GPIO_ACTIVE_HIGH>;
                        status = "disabled";

                        ethernet@0,0 {
                                reg = <0x000000 0 0 0 0>;
                        };
                };

But I still get the same errors… Normally it shouldn’t be a problem that two Ethernet PHYs of the same type are connected, right?

In the syslog I see that the kernel reports an error here:

[    8.568370] error:map req on already mapped pte, asid=10 iova=0x00000000feddf000 pa=0x00000001706ab000 prot=3 *pte=e008a583
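When a log contains many of these messages, it can help to pull out the addresses and check whether the same IOVA keeps recurring. Below is a minimal sketch, assuming the message format is exactly as in the line above; the struct and function names are hypothetical. It extracts only the fields that are unambiguously hexadecimal (iova, pa and *pte) and skips asid and prot, whose number base is not evident from the log.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields taken from one "map req on already mapped pte" message. */
struct smmu_map_err {
    uint64_t iova; /* I/O virtual address the driver tried to map */
    uint64_t pa;   /* physical address requested for that IOVA */
    uint32_t pte;  /* page table entry already present at that IOVA */
};

/*
 * Parse one log line. Returns 1 if all three fields were found,
 * 0 otherwise. The prot value is skipped with %*s on purpose.
 */
int parse_smmu_map_err(const char *line, struct smmu_map_err *out)
{
    const char *p = strstr(line, "iova=");

    if (p == NULL)
        return 0;

    return sscanf(p,
                  "iova=0x%" SCNx64 " pa=0x%" SCNx64
                  " prot=%*s *pte=%" SCNx32,
                  &out->iova, &out->pa, &out->pte) == 3;
}
```

Feeding every matching line through parse_smmu_map_err() and comparing the iova values shows whether the driver keeps remapping one address or a whole range.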

saber.kaygusuz · December 20, 2022, 8:57pm

Some really strange new information: when I try the same board (and driver) with the Xavier NX, the Realtek 8111F Ethernet PHY only starts working after 2-3 minutes, but there are no kernel panics. After those 2-3 minutes networking works, and unplugging the Ethernet cable also causes no problems.

So the r8168 driver shouldn’t be the cause of our issue. Any new ideas what it could be?

saber.kaygusuz · December 21, 2022, 6:31pm

I have some updates on this topic. After days with the same problem, analyzing and trying everything, we saw that the r8168 driver with the Realtek 8111F chip is highly unstable on Ubuntu 18.04. So on the Xavier NX we no longer build the r8168 driver and let the r8169 driver do its thing. Getting an IP address via DHCP works, and so does networking over longer periods. The only issue we have right now: it works only in reboot scenarios, not in shutdown scenarios (reboot = IP address and everything works; shutdown = only a MAC address and no IP address possible).

Attached the log files:
reboot_dmesg.txt (71.5 KB)
shutdown_dmesg.txt (71.2 KB)
syslog_reboot.txt (307.6 KB)
syslog_shutdown.txt (308.1 KB)

saber.kaygusuz · January 5, 2023, 12:11pm

Final update for everyone with similar problems: everything works when using the r8169 driver with a custom /etc/network/interfaces configuration.

system · January 19, 2023, 12:11pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
