RGMII issue with Jetpack 5.1.1 (KSZ9131)

Hi,

we have a custom carrier board with an AGX Orin on it. The hardware includes an ethernet interface, which uses a KSZ9131 as PHY (connected via RGMII to the Orin).

The baseboard was developed for AGX Xavier and was already used for some years without any problems. We recently upgraded to the AGX Orin and successfully tested our hardware with Jetpack 5.1.

However, we now want to update to the recently released Jetpack 5.1.1 and are facing issues with the ethernet connection mentioned above (Ethernet->KSZ9131->RGMII->Orin).

The interface is initialized correctly and detects the link state, but no traffic passes in either direction. This suggests that the MDIO interface is OK and that the problem lies in the RGMII part.

I’ve already diffed the kernel and devicetree between Jetpack 5.1 and Jetpack 5.1.1. The devicetree barely differs for that interface, but a lot has changed in the kernel, as far as I can see.

Some facts:

  • Interface was working with AGX Xavier and Jetpack 4.6
  • Interface was working with AGX Orin and Jetpack 5.1
  • Interface is not working with AGX Orin and Jetpack 5.1.1
  • Pinmux and devicetree are identical for Jetpack 5.1 and Jetpack 5.1.1:
  ethernet@2310000 {
    status = "okay";
    nvidia,mac-addr-idx = <0>;
    nvidia,phy-reset-gpio = <&tegra_main_gpio TEGRA234_MAIN_GPIO(G, 5) 0>;
    phy-mode = "rgmii-id";
    phy-handle = <&phy>;
    /delete-node/ fixed-link;

    mdio {
      compatible = "nvidia,eqos-mdio";
      #address-cells = <1>;
      #size-cells = <0>;

      phy: phy@2 {
        reg = <2>;
        nvidia,phy-rst-pdelay-msec = <224>;
        nvidia,phy-rst-duration-usec = <10000>;
        interrupt-parent = <&tegra_main_gpio>;
        interrupts = <TEGRA234_MAIN_GPIO(G, 4) IRQ_TYPE_LEVEL_LOW>;
      };
    };
  };

pinmux.dtsi (65.2 KB)

Does anyone have an idea what could cause this problem?
Are there any known issues regarding this interface and the new Jetpack release?

Thanks in advance!
Greetings

Hi,

Are you saying that Jetpack 5.1.1 introduces a regression on your RGMII interface, which was working on Jetpack 5.1?

Hi,

I managed to solve it myself. There was a minor change in my setup when moving from Jetpack-5.1 to Jetpack-5.1.1. This caused apply_binaries.sh to run without the flag --target-overlay, which somehow led to the problems I was facing.

However, I identified another issue regarding ethernet in Jetpack 5.1.1:
I changed the MTU of the eth-interface to 9000 and saw the following message in the log:
Macsec: Reduced MTU: 8966 Max: 9000
However, I’ve not enabled Macsec in the devicetree, so this should theoretically not happen.

Having a closer look at the kernel, I found the following lines in function ether_change_mtu() in file kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c

static int ether_change_mtu(struct net_device *ndev, int new_mtu) {
// ...
	/* Macsec is not supported or not enabled in DT */
	if (!pdata->macsec_pdata) {
		netdev_info(pdata->ndev, "Macsec not supported or not enabled in DT\n");
	} else if ((osi_core->mac == OSI_MAC_HW_EQOS && osi_core->mac_ver == OSI_EQOS_MAC_5_30) ||
	    (osi_core->mac == OSI_MAC_HW_MGBE && osi_core->mac_ver == OSI_MGBE_MAC_3_10)) {
		/* Macsec is supported, reduce MTU */
		ndev->mtu -= MACSEC_TAG_ICV_LEN;
		netdev_info(pdata->ndev, "Macsec: Reduced MTU: %d Max: %d\n",
			    ndev->mtu, ndev->max_mtu);
	}
// ...
}

The MTU should not be reduced if Macsec is not enabled in the DT. To detect the disabled case, the driver checks whether pdata->macsec_pdata is NULL.

However, having a look at the function macsec_probe() in file kernel/nvidia/drivers/net/ethernet/nvidia/nvethernet/macsec.c I identified an issue with this check.

int macsec_probe(struct ether_priv_data *pdata) {
// ...
	/* Alloc macsec priv data structure */
	macsec_pdata = devm_kzalloc(dev, sizeof(struct macsec_priv_data),
				    GFP_KERNEL);
	if (macsec_pdata == NULL) {
		dev_err(dev, "failed to alloc macsec_priv_data\n");
		ret = -ENOMEM;
		goto exit;
	}
	macsec_pdata->ether_pdata = pdata;
	pdata->macsec_pdata = macsec_pdata;

	/* Read if macsec is enabled in DT */
	ret = of_property_read_u32(np, "nvidia,macsec-enable",
				   &macsec_pdata->is_macsec_enabled_in_dt);
	if ((ret != 0) || (macsec_pdata->is_macsec_enabled_in_dt == 0U)) {
		dev_info(dev,
			 "macsec param in DT is missing or disabled\n");
		ret = 1;
		goto init_err;
	}
// ...
init_err:
	devm_kfree(dev, pdata->macsec_pdata);
// ...
}

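What basically happens is that macsec_probe() allocates pdata->macsec_pdata, checks whether macsec is enabled in the devicetree, sees that it is not, and frees the allocation, but never sets pdata->macsec_pdata back to NULL. The pointer is left dangling and non-NULL, so the check in ether_change_mtu() falls through to the else branch.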

This leads to the error that the MTU is reduced even though macsec is not enabled in the devicetree.
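
The fix suggested by this analysis is to reset the pointer after freeing it. Below is a minimal user-space sketch of that idea, not the actual driver patch. The mock_* functions and the cut-down structs are hypothetical stand-ins for the driver code, calloc()/free() replace devm_kzalloc()/devm_kfree(), the MAC version check is omitted, and the value 34 for MACSEC_TAG_ICV_LEN is only inferred from the log line above (9000 - 8966).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: inferred from "Reduced MTU: 8966 Max: 9000" in the log. */
#define MACSEC_TAG_ICV_LEN 34

/* Cut-down stand-ins for the driver structures (hypothetical). */
struct macsec_priv_data {
	unsigned int is_macsec_enabled_in_dt;
};

struct ether_priv_data {
	struct macsec_priv_data *macsec_pdata;
};

/*
 * Mock of the macsec_probe() flow. dt_enable stands in for the value of
 * "nvidia,macsec-enable" read from the devicetree.
 * Returns 0 if macsec is enabled, 1 if disabled, -1 on allocation failure.
 */
int mock_macsec_probe(struct ether_priv_data *pdata, unsigned int dt_enable)
{
	struct macsec_priv_data *macsec_pdata = calloc(1, sizeof(*macsec_pdata));

	if (macsec_pdata == NULL)
		return -1;

	pdata->macsec_pdata = macsec_pdata;
	macsec_pdata->is_macsec_enabled_in_dt = dt_enable;

	if (macsec_pdata->is_macsec_enabled_in_dt == 0U) {
		free(pdata->macsec_pdata);
		/* Proposed fix: do not leave a dangling pointer behind. */
		pdata->macsec_pdata = NULL;
		return 1;
	}
	return 0;
}

/* Mock of the MTU decision in ether_change_mtu() (MAC version check omitted). */
int mock_change_mtu(const struct ether_priv_data *pdata, int new_mtu)
{
	if (pdata->macsec_pdata == NULL)
		return new_mtu;                  /* macsec disabled: keep MTU */
	return new_mtu - MACSEC_TAG_ICV_LEN;     /* macsec enabled: reduce MTU */
}

/* Releases the mock private data, if any. */
void mock_macsec_remove(struct ether_priv_data *pdata)
{
	free(pdata->macsec_pdata);
	pdata->macsec_pdata = NULL;
}
```

With the pointer reset in the disabled path, mock_change_mtu() leaves a requested MTU of 9000 untouched. Without that one line, the same call would take the reduction branch based on a freed pointer.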

Thanks and BR.

Thanks for pointing this out. We will take a look.