Jetson AGX Orin: Low speeds and high packet loss on MGBE 5G Mode

Hi

I’m currently integrating Orin with a 10G Broadcom PHY, but I’m facing issues specifically with the 5G speed mode. iperf3 reports low throughput when receiving data (about 743 Mbits/sec averaged over 30 seconds), along with a high retransmit count.

iperf3 -c 10.167.61.19 -p 10001 -t 30 
Connecting to host 10.167.61.19, port 10001
[  5] local 10.167.61.45 port 36286 connected to 10.167.61.19 port 10001
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   143 MBytes  1.20 Gbits/sec   71    218 KBytes       
[  5]   1.00-2.00   sec   139 MBytes  1.16 Gbits/sec   67   95.8 KBytes       
[  5]   2.00-3.00   sec   140 MBytes  1.18 Gbits/sec   66    122 KBytes       
[  5]   3.00-4.00   sec   134 MBytes  1.13 Gbits/sec   55    209 KBytes       
[  5]   4.00-5.00   sec  84.0 MBytes   705 Mbits/sec   83   17.4 KBytes       
[  5]   5.00-6.00   sec   975 KBytes  7.99 Mbits/sec   28   17.4 KBytes       
[  5]   6.00-7.00   sec   975 KBytes  7.99 Mbits/sec   28   17.4 KBytes       
[  5]   7.00-8.00   sec   975 KBytes  7.99 Mbits/sec   28   17.4 KBytes       
[  5]   8.00-9.00   sec  88.5 MBytes   743 Mbits/sec   51    331 KBytes       
[  5]   9.00-10.00  sec   144 MBytes  1.21 Gbits/sec   75    296 KBytes       
[  5]  10.00-11.00  sec   131 MBytes  1.10 Gbits/sec   62    122 KBytes       
[  5]  11.00-12.00  sec   111 MBytes   929 Mbits/sec   74   17.4 KBytes       
[  5]  12.00-13.00  sec  1.19 MBytes  9.98 Mbits/sec   72   17.4 KBytes       
[  5]  13.00-14.00  sec  1.19 MBytes  9.99 Mbits/sec   28   17.4 KBytes       
[  5]  14.00-15.00  sec   121 MBytes  1.01 Gbits/sec   56    148 KBytes       
[  5]  15.00-16.00  sec   149 MBytes  1.25 Gbits/sec   71    148 KBytes       
[  5]  16.00-17.00  sec  38.3 MBytes   321 Mbits/sec  103   17.4 KBytes       
[  5]  17.00-18.00  sec  1.19 MBytes  9.99 Mbits/sec   28   17.4 KBytes       
[  5]  18.00-19.00  sec  59.3 MBytes   497 Mbits/sec   41    131 KBytes       
[  5]  19.00-20.00  sec   149 MBytes  1.25 Gbits/sec   70    131 KBytes       
[  5]  20.00-21.00  sec   141 MBytes  1.19 Gbits/sec   64    183 KBytes       
[  5]  21.00-22.00  sec   139 MBytes  1.16 Gbits/sec   66    174 KBytes       
[  5]  22.00-23.00  sec   135 MBytes  1.14 Gbits/sec   70    174 KBytes       
[  5]  23.00-24.00  sec   144 MBytes  1.20 Gbits/sec   61    165 KBytes       
[  5]  24.00-25.00  sec   136 MBytes  1.14 Gbits/sec   76   26.1 KBytes       
[  5]  25.00-26.00  sec  0.00 Bytes  0.00 bits/sec   63   17.4 KBytes       
[  5]  26.00-27.00  sec  1.19 MBytes  9.98 Mbits/sec   28   17.4 KBytes       
[  5]  27.00-28.00  sec  22.1 MBytes   186 Mbits/sec   31    200 KBytes       
[  5]  28.00-29.00  sec   145 MBytes  1.22 Gbits/sec   64    165 KBytes       
[  5]  29.00-30.00  sec   155 MBytes  1.30 Gbits/sec   68    165 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  2.59 GBytes   743 Mbits/sec  1748             sender
[  5]   0.00-30.00  sec  2.59 GBytes   742 Mbits/sec                  receiver
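
The stalls in the run above (intervals where the bitrate drops to roughly 0-10 Mbits/sec while Cwnd sits at 17.4 KBytes) can be picked out programmatically. Below is a minimal Python sketch, assuming the iperf3 text output format shown above. The function name stalled_intervals and the 100 Mbits/sec threshold are arbitrary choices for illustration, not anything iperf3 defines.

```python
import re

# A subset of the interval lines from the iperf3 run above.
SAMPLE = """\
[  5]   3.00-4.00   sec   134 MBytes  1.13 Gbits/sec   55    209 KBytes
[  5]   4.00-5.00   sec  84.0 MBytes   705 Mbits/sec   83   17.4 KBytes
[  5]   5.00-6.00   sec   975 KBytes  7.99 Mbits/sec   28   17.4 KBytes
[  5]   6.00-7.00   sec   975 KBytes  7.99 Mbits/sec   28   17.4 KBytes
[  5]   7.00-8.00   sec   975 KBytes  7.99 Mbits/sec   28   17.4 KBytes
[  5]   8.00-9.00   sec  88.5 MBytes   743 Mbits/sec   51    331 KBytes
[  5]  25.00-26.00  sec  0.00 Bytes  0.00 bits/sec   63   17.4 KBytes
"""

# Multipliers to convert the printed bitrate unit to bits per second.
UNIT = {"bits": 1.0, "Kbits": 1e3, "Mbits": 1e6, "Gbits": 1e9}

# Matches per-interval sender lines (those that end with Retr and Cwnd columns).
LINE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"[\d.]+\s+\w+\s+"
    r"(?P<rate>[\d.]+)\s+(?P<unit>[KMG]?bits)/sec\s+"
    r"(?P<retr>\d+)\s+[\d.]+\s+\w+"
)


def stalled_intervals(text, threshold_bps=100e6):
    """Return (start, end, bits_per_sec, retransmits) for intervals below the threshold."""
    stalls = []
    for line in text.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # header, separator, or summary line
        bps = float(m["rate"]) * UNIT[m["unit"]]
        if bps < threshold_bps:
            stalls.append((float(m["start"]), float(m["end"]), bps, int(m["retr"])))
    return stalls


if __name__ == "__main__":
    for start, end, bps, retr in stalled_intervals(SAMPLE):
        print(f"{start:6.2f}-{end:6.2f} s  {bps / 1e6:8.2f} Mbit/s  retr={retr}")
```

Feeding it the full output instead of SAMPLE gives the list of stall windows, which can then be lined up against dmesg timestamps.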

I also occasionally see the following kernel errors while running iperf3:

[  251.550361] nvethernet 6810000.ethernet: [xpcs_lane_bring_up][477][type:0x4][loga-0x0] PCS block lock SUCCESS
[  251.550853] nvethernet 6810000.ethernet eth1: Link is Up - 5Gbps/Full - flow control rx/tx
[  299.228680] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1001, return: -19
[  299.228687] nvethernet 6810000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1003, return: -19
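
The "return: -19" in these messages looks like a negative Linux errno value. Assuming the driver follows the usual kernel convention of returning -errno (an assumption, not something confirmed here), it can be decoded with a short Python sketch. The helper name decode_kernel_ret is made up for illustration.

```python
import errno
import os


def decode_kernel_ret(ret):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = -ret
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    # -19 is the value from the "Failed to report error" log lines above.
    name, message = decode_kernel_ret(-19)
    print(name, "-", message)
```

On Linux, errno 19 is ENODEV. That by itself does not say which device the error-reporting path failed to find, nor what error codes 0x1001 and 0x1003 mean.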

I’m only seeing the issue in 5G mode; 10G and 2.5G work without any problems. Both PHY and MAC are configured in XFI mode. There are no issues when sending data, only when receiving.

I’m running the tests by connecting our custom carrier board (with Broadcom PHY) to an Orin dev kit.

This is our current device tree configuration:

	ethernet@6810000 {
		status = "okay";
		nvidia,mac-addr-idx = <1>;

		/* 0:XFI 10G, 1:XFI 5G, 2:USXGMII 10G, 3:USXGMII 5G */
		nvidia,phy-iface-mode = <0>;
		/* 0:5G, 1:10G */
		nvidia,uphy-gbe-mode = <1>;
		nvidia,phy-reset-gpio = <&tegra_main_gpio TEGRA234_MAIN_GPIO(Y, 1) 0>;
		phy-handle = <&bcm_phy>;
		phy-mode = "10gbase-r";

		mdio {
			/delete-node/ ethernet_phy@0;

			bcm_phy: ethernet_phy@1 {
				compatible = "ethernet-phy-ieee802.3-c45";
				reg = <0x1>;
				nvidia,phy-rst-pdelay-msec = <1000>;
				nvidia,phy-rst-duration-usec = <1000>;
				interrupt-parent = <&tegra_main_gpio>;
				interrupts = <TEGRA234_MAIN_GPIO(Y, 3) IRQ_TYPE_LEVEL_LOW>;
			};
		};
	};

Please share full dmesg.

Hi @WayneWWW

Here are the dmesg logs:
dmesg.log (91.0 KB)

A couple of remarks:

  • You will notice in the logs that I switched the MTU from 1500 to 8966, but the problem also happens with the lower MTU.
  • I’m using ethtool to force the 5G speed: ethtool -s eth1 autoneg on speed 5000 duplex full
  • While trying to extract the logs, on one occasion I didn’t get the issue at all: iperf3 measured 4.95 Gbits/sec on both TX and RX with no retransmissions.
  • I’m configuring the PHY to use the 5G/2.5G over XFI mode. I’m also enabling the PHY pause frame mode. Before enabling the pause frame mode, I was seeing issues with the TX speed as well, but after enabling it TX is always around 4.95 Gbits/sec, and only RX is low.
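
Since the problem is RX-only and comes and goes, one way to narrow it down is to compare the interface counters before and after an iperf3 run. Below is a minimal Python sketch, assuming the standard Linux sysfs layout /sys/class/net/<iface>/statistics. The function names read_counters and changed_counters are hypothetical, and which of these counters the nvethernet driver actually updates is not known here.

```python
from pathlib import Path


def read_counters(iface, root="/sys/class/net"):
    """Read every counter file under <root>/<iface>/statistics into a dict."""
    stats_dir = Path(root) / iface / "statistics"
    return {p.name: int(p.read_text()) for p in stats_dir.iterdir() if p.is_file()}


def changed_counters(before, after):
    """Return {counter: delta} for counters whose value changed between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Intended use on the target (not executed here):
#   before = read_counters("eth1")
#   ... run the iperf3 test ...
#   after = read_counters("eth1")
#   for name, delta in sorted(changed_counters(before, after).items()):
#       print(f"{name}: {delta:+d}")
```

Growth in rx_errors, rx_crc_errors, rx_dropped, or rx_missed_errors during a stall would point at different places in the receive path. Driver-specific counters, if any, would show up in ethtool -S eth1 rather than in sysfs.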

Here is the ethtool output as well:

root@skye-rev3-sb-cot:~# ethtool eth1
Settings for eth1:
        Supported ports: [  ]
        Supported link modes:   100baseT/Half 100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
                                2500baseT/Full
                                5000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  5000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
                                             10000baseT/Full
                                             2500baseT/Full
                                             5000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 5000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: external
        MDI-X: Unknown (auto)
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)
                              
        Link detected: yes

Hi @WayneWWW

Any updates on this?

Sorry for the late reply.

Just to clarify, does your problem occur when you change the link mode from 10G to 5G by running ethtool?

Or is the speed set through the device tree, by changing nvidia,uphy-gbe-mode or nvidia,phy-iface-mode?

Hi @WayneWWW

It happens when I set the link mode from 10G to 5G using ethtool.

The device tree is configured for 10G XFI.

So if I use the NVIDIA devkit, will I see the same behavior when I change the speed from 10G to 5G?

Nope, the devkit doesn’t seem to show this issue. It seems to be specific to our PHY, or to how we configure it with the driver. I was wondering if you guys have any insight into the errors reported by the nvethernet driver that could help us debug this issue further.