RGMII Ethernet not working

Please be aware that we have the following criterion in the first line of the document:

Ensure that PHY comes out of reset mode and is in a ready state when the PHY reset GPIO is toggled.

Does that really happen on your side?

It comes out of reset at the end of the initialization sequence, right before the reset is pulled low permanently. It exchanges one MDIO message, and then the reset is pulled low by the Jetson, putting the PHY in reset and leaving it there, never to wake up again.

That quick MDIO communication seems to fail, though. The Jetson seems to try to read something from the PHY twice, the reads fail, and then the Jetson pulls the reset low.
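Two back-to-back reads would be consistent with the kernel reading the two PHY ID registers (register 2, then register 3). To find them in the logic-analyzer trace, here is a minimal sketch, assuming standard IEEE 802.3 Clause 22 framing, that computes the 14 bits sent right after the 32-bit all-ones preamble. c22_header(), the C22_* macros and REG_PHYSID1/REG_PHYSID2 are hypothetical names; the register numbers have the same values as MII_PHYSID1/MII_PHYSID2 in <linux/mii.h>.

```c
#include <stdint.h>

/* Clause 22 frame fields (IEEE 802.3 clause 22). */
#define C22_ST       0x1u /* start of frame: 01 */
#define C22_OP_WRITE 0x1u /* opcode 01 = write */
#define C22_OP_READ  0x2u /* opcode 10 = read */

/* Same values as MII_PHYSID1/MII_PHYSID2 in <linux/mii.h>. */
#define REG_PHYSID1  0x02u
#define REG_PHYSID2  0x03u

/* Hypothetical helper: the 14 bits that follow the preamble,
 * ST(2) OP(2) PHYAD(5) REGAD(5), most significant bit first on the wire.
 * For a read, the MAC then releases MDIO and the PHY drives the second
 * turnaround bit (0) followed by the 16 data bits. */
uint16_t c22_header(unsigned op, unsigned phyad, unsigned regad)
{
	return (uint16_t)((C22_ST << 12) | ((op & 0x3u) << 10) |
			  ((phyad & 0x1fu) << 5) | (regad & 0x1fu));
}
```

With reg = <0> in the overlay, the two ID reads should appear as 0x1802 and 0x1803 after the preamble. If nothing drives MDIO during the data phase, the bus pull-up typically makes the data read back as 0xFFFF.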

Is the DP83867 driver also ready when the nvethernet driver runs its init sequence?

We had issues getting the DP83867 working, and based on the errors I see, it was probably something similar. I didn't read the entire discussion, but over the weekend I shall go through it and see if I can provide any suggestion that could help, or at least share back what worked for us.
Thanks
-maheshG


That would be amazing, thank you!

The module is loaded on boot based on the entry in /etc/modules. That should be enough, right?
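One way to double-check at runtime that the module really is loaded is to look for it in /proc/modules, which lists one loaded module per line with the name as the first field. A minimal sketch, assuming the module is named dp83867; module_listed() is a hypothetical helper. Note that a driver built into the kernel does not appear in /proc/modules at all, and being listed says nothing about whether it was loaded before nvethernet probed.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` is the first
 * space-delimited field of any line read from `f`. */
bool module_listed(FILE *f, const char *name)
{
	char line[512];
	size_t len = strlen(name);

	while (fgets(line, sizeof line, f)) {
		/* Require a space after the name so "dp8386" does not match "dp83867". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return true;
	}
	return false;
}
```

For example, open the file with fopen("/proc/modules", "r") and call module_listed(f, "dp83867").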

We also observed that G5 does not go low after the sequence if it is physically disconnected from the PHY.

My best guess: in the absence of a “misconfigured” reset signal, the PHY starts, initialization with the PHY works (70% of the time), and the Jetson is happy to keep the signal high.

When the reset is connected to the PHY, something about the signal makes the PHY fail to respond correctly, which makes the Jetson decide to keep it in reset forever.
This is based on the observation that, in this case, the Jetson tries (and fails) to exchange information with the PHY right before the reset is pulled low.

This does not happen when the reset is physically disconnected though…

Hi,

Just to clarify the diagram you shared previously.

From this one, I saw the reset go from low to high and then back to low. That means the reset should be set to active high.

However, one problem here is that the duration seems too long. It looks like it took about 90 seconds?

Could you disable the 6800000 MGBE node in your device tree if it is not in use?

Also, one more question here, as I asked this one a while ago:

What is the exact software error in /3rdparty/canonical/linux-jammy/kernel-source/drivers/net/mdio/of_mdio.c that makes the “MDIO device at address %d is missing” error happen?

Have you added some prints to the driver to see what is going on?

Also, regarding your previous questions about the delays:

reset-delay-us:
    description:
      RESET pulse width in microseconds. It applies to all MDIO devices
      and must therefore be appropriately determined based on all devices
      requirements (maximum value of all per-device RESET pulse widths).
reset-post-delay-us:
    description:
      Delay after reset deassert in microseconds. It applies to all MDIO
      devices and it's determined by how fast all devices are ready for
      communication. This delay happens just before e.g. Ethernet PHY
      type ID auto detection.
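Because both properties apply to the whole MDIO bus, the binding text boils down to taking a maximum over all devices on that bus. A minimal sketch; struct mdio_dev_reset_req and both helpers are hypothetical (not kernel structures), and the per-device numbers would have to come from each device's datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device reset requirements, in microseconds. */
struct mdio_dev_reset_req {
	uint32_t pulse_width_us; /* minimum time RESET must stay asserted */
	uint32_t ready_us;       /* time after deassert until MDIO access is allowed */
};

/* reset-delay-us: the longest pulse width required by any device. */
uint32_t bus_reset_delay_us(const struct mdio_dev_reset_req *devs, size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (devs[i].pulse_width_us > max)
			max = devs[i].pulse_width_us;
	return max;
}

/* reset-post-delay-us: long enough for the slowest device to be ready. */
uint32_t bus_reset_post_delay_us(const struct mdio_dev_reset_req *devs, size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (devs[i].ready_us > max)
			max = devs[i].ready_us;
	return max;
}
```

For a bus with a single DP83867, both values simply reduce to that PHY's own requirements.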

We thought we had done that, but it turned out we had an overlay in ODMDATA that set the status back to okay. This is fixed now, but apart from some dmesg error logging related to 6800000 disappearing, it did not change anything.

The interesting part of the reset sequence is at the very end. Whether the line is high or low during the 90 s before that is not relevant, I think.


At the end of the reset sequence we can see some MDIO communication that tries to read the ID from the PHY, which fails and just returns a bunch of F’s:

We also found the offending line in the kernel code (kernel-jammy-source/drivers/net/phy/phy_device.c) and confirmed it with a print statement. See the dev_err() at the very bottom of the following code. It has WARNING@231000 prepended so we can easily grep for these messages:

/**
 * get_phy_c22_id - reads the specified addr for its clause 22 ID.
 * @bus: the target MII bus
 * @addr: PHY address on the MII bus
 * @phy_id: where to store the ID retrieved.
 *
 * Read the 802.3 clause 22 PHY ID from the PHY at @addr on the @bus,
 * placing it in @phy_id. Return zero on successful read and the ID is
 * valid, %-EIO on bus access error, or %-ENODEV if no device responds
 * or invalid ID.
 */
static int get_phy_c22_id(struct mii_bus *bus, int addr, u32 *phy_id)
{
	int phy_reg;

	/* Grab the bits from PHYIR1, and put them in the upper half */
	phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
	if (phy_reg < 0) {
		/* returning -ENODEV doesn't stop bus scanning */
		return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
	}

	*phy_id = phy_reg << 16;

	/* Grab the bits from PHYIR2, and put them in the lower half */
	phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
	if (phy_reg < 0) {
		/* returning -ENODEV doesn't stop bus scanning */
		return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
	}

	*phy_id |= phy_reg;

	/* If the phy_id is mostly Fs, there is no device there */
	if ((*phy_id & 0x1fffffff) == 0x1fffffff) {
		dev_err(&bus->dev, "WARNING@231000: get_phy_c22_id: addr %d, phy_id 0x%08x\n", addr, *phy_id);
		return -ENODEV;
	}

	return 0;
}

It handles exactly the case we also see in the logic analyzer: reading the PHY ID returns mostly F’s.
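The check can be reproduced outside the kernel to see which register values trip it. A minimal sketch: c22_phy_id() and c22_id_means_no_device() are hypothetical helpers that mirror the packing and the mask in get_phy_c22_id() above. A PHY that does not drive MDIO (for example because it is held in reset) typically reads back as 0xFFFF for both registers, which the kernel rejects, while a responding DP83867 should report 0x2000a231, the ID the upstream dp83867 driver matches on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: PHYSID1 in the upper half, PHYSID2 in the
 * lower half, the same packing as get_phy_c22_id(). */
uint32_t c22_phy_id(uint16_t physid1, uint16_t physid2)
{
	return ((uint32_t)physid1 << 16) | physid2;
}

/* Same test as get_phy_c22_id(): if the low 29 bits are all ones,
 * the kernel treats the address as empty and returns -ENODEV. */
bool c22_id_means_no_device(uint32_t phy_id)
{
	return (phy_id & 0x1fffffffu) == 0x1fffffffu;
}
```

For example, c22_id_means_no_device(c22_phy_id(0xffff, 0xffff)) is true, which matches the all-F reads in the capture.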


The documentation you provided for the properties is helpful, but these seem to be different properties from the ones in the Bring-Up guide. Are they the same?

reset-post-delay-us is probably closely related to nvidia,phy-rst-pdelay-msec, which we are using.

Do you think reset-delay-us is the same as nvidia,phy-rst-duration-usec, then? Its description does not mention a duration, but maybe that is the same as the pulse width.

Increasing nvidia,phy-rst-pdelay-msec increases the time the reset line stays high in the two high pulses in the image above.

Hi,

You could check the driver code directly to see where those delays happen.

For example,

kernel/nvidia-oot/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c

	/* Reset the PHY */
	if (gpio_is_valid(pdata->phy_reset)) {
		gpio_set_value(pdata->phy_reset, 0);
		usleep_range(pdata->phy_reset_duration,
			     pdata->phy_reset_duration + 1);
		gpio_set_value(pdata->phy_reset, 1);
		msleep(pdata->phy_reset_post_delay);
	}

phy_reset_duration comes from nvidia,phy-rst-duration-usec; it is how long the reset GPIO is held at 0 before it is set to 1.
phy_reset_post_delay is how long to wait after this reset pin toggle before moving on to the next steps.
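To reason about the timing without hardware, the order of operations in the snippet above can be replayed against a mock GPIO and a virtual clock. A minimal sketch: struct reset_trace, mock_gpio_set(), mock_sleep_us() and phy_reset_sequence() are hypothetical, and usleep_range() is modeled as sleeping exactly its minimum, which the real function only guarantees as a lower bound.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_MAX 8

/* Hypothetical mock of the reset line: every GPIO write is recorded
 * together with the virtual time at which it happened. */
struct reset_trace {
	uint64_t now_us;          /* virtual clock */
	size_t n;                 /* number of recorded writes */
	int level[TRACE_MAX];
	uint64_t t_us[TRACE_MAX];
};

static void mock_gpio_set(struct reset_trace *tr, int value)
{
	if (tr->n < TRACE_MAX) {
		tr->level[tr->n] = value;
		tr->t_us[tr->n] = tr->now_us;
		tr->n++;
	}
}

/* Advances the virtual clock instead of actually sleeping. */
static void mock_sleep_us(struct reset_trace *tr, uint64_t us)
{
	tr->now_us += us;
}

/* Same order as the ether_linux.c snippet: drive low, hold for
 * duration_us, drive high, then wait post_delay_ms before MDIO access. */
void phy_reset_sequence(struct reset_trace *tr, uint32_t duration_us,
			uint32_t post_delay_ms)
{
	mock_gpio_set(tr, 0);
	mock_sleep_us(tr, duration_us);                     /* usleep_range() lower bound */
	mock_gpio_set(tr, 1);
	mock_sleep_us(tr, (uint64_t)post_delay_ms * 1000u); /* msleep() */
}
```

With the overlay values (10000 µs and 224 ms), one call writes 0 and then 1, and the first MDIO access can happen no earlier than about 234 ms after the line was driven low.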

I wonder whether we also need the PHY vendor to check which requirement is missing here, as our MAC driver is not able to read out the PHY ID.

The reset configuration works; I have verified it with print statements.

The closest I could get to the source of the error by looking through the code and adding prints is here:

/**
 * fwnode_mdio_find_device - Given a fwnode, find the mdio_device
 * @fwnode: pointer to the mdio_device's fwnode
 *
 * If successful, returns a pointer to the mdio_device with the embedded
 * struct device refcount incremented by one, or NULL on failure.
 * The caller should call put_device() on the mdio_device after its use.
 */
struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode)
{
	struct device *d;

	if (!fwnode)
		return NULL;

	d = bus_find_device_by_fwnode(&mdio_bus_type, fwnode);
	if (!d) {
		pr_warn("DEBUG@231000: fwnode_mdio_find_device: bus_find_device_by_fwnode failed\n");
		return NULL;
	}

	return to_mdio_device(d);
}
EXPORT_SYMBOL(fwnode_mdio_find_device);

It’s hard to follow the call graph from there, but I think bus_find_device_by_fwnode fails because of

	/* If the phy_id is mostly Fs, there is no device there */
	if ((*phy_id & 0x1fffffff) == 0x1fffffff)
		return -ENODEV;

in get_phy_c22_id() in drivers/net/phy/phy_device.c.

I still don’t understand why the Ethernet works (80% of the time) if we disconnect the reset line between the PHY and the Jetson. In that case it is apparently able to read the ID. Setting phy_reset_post_delay to a whopping 30 s does not help either.

I also could not find the place in nvethernet/ether_linux.c that sets the reset to 0. Based on some printk()s, none of the gpio_set_value(pdata->phy_reset, 0) calls seem to be executed (the same goes for gpio_set_value(pdata->phy_reset, OSI_DISABLE)).

Something else that is weird: the reset sequence you mentioned should just go low and then back high, but the sequence we are seeing seems to do this twice. According to some prints I added, however, the reset sequence you posted is executed only once.
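To compare the capture with the driver code, it may help to count edges in the exported reset-line samples instead of reading them off the screen. A minimal sketch, assuming the analyzer can export one 0/1 value per sample; count_edges() is a hypothetical helper. One run of the ether_linux.c sequence (drive 0, then 1) adds at most one rising edge, so two rising edges with only one printed execution would suggest that something else also drives G5, such as another call site or an earlier boot stage. That is an unverified hypothesis, not a confirmed cause.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts transitions in a sampled trace of the
 * reset line. to_high != 0 counts rising edges, to_high == 0 counts
 * falling edges. Any nonzero sample is treated as high. */
size_t count_edges(const uint8_t *samples, size_t n, int to_high)
{
	size_t edges = 0;

	for (size_t i = 1; i < n; i++) {
		int prev = samples[i - 1] != 0;
		int cur = samples[i] != 0;

		if (prev != cur && cur == (to_high != 0))
			edges++;
	}
	return edges;
}
```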

@maheshramu.gaikwad Do you happen to have the device tree that worked for you in the end? In my mind, this still seems like the most probable place where we have to make adjustments.
Here is the one we currently put the most hope in (it can probably be shortened a bit):

// SPDX-License-Identifier: GPL-2.0-only
/*
 * Device Tree Overlay for RGMII Gigabit Ethernet PHY
 * This overlay configures the EQOS Ethernet controller to use an external PHY
 * connected via RGMII interface.
 * Based on https://docs.nvidia.com/jetson/archives/r36.4/DeveloperGuide/HR/JetsonModuleAdaptationAndBringUp/JetsonAgxOrinSeries.html#for-rgmii
 */

/dts-v1/;
/plugin/;

#include <dt-bindings/gpio/tegra234-gpio.h>
#include <dt-bindings/interrupt-controller/irq.h>
#include <dt-bindings/net/ti-dp83867.h>

/ {
	overlay-name = "RGMII Gigabit Ethernet PHY";

	compatible = "nvidia,tegra234";

	fragment@0 {
		target-path = "/bus@0/ethernet@2310000";
		__overlay__ {
			status = "okay";
			nvidia,mac-addr-idx = <0>;
			nvidia,max-platform-mtu = <8000>;
			nvidia,pause_frames = <0>;
			local-mac-address = [1a 2b 3c 4d 5e 6f];
			phy-mode = "rgmii-id";
			phy-handle = <&phy>;
			nvidia,phy-reset-gpio = <&gpio TEGRA234_MAIN_GPIO(G, 5) 0>;

			mdio {
				compatible = "nvidia,eqos-mdio";
				#address-cells = <1>;
				#size-cells = <0>;

				phy: phy@0 {
					reg = <0>;
					nvidia,phy-rst-pdelay-msec = <224>; /* msec */
					nvidia,phy-rst-duration-usec = <10000>; /* usec */
					interrupt-parent = <&gpio>;
					interrupts = <TEGRA234_MAIN_GPIO(G, 4) IRQ_TYPE_LEVEL_LOW>;
					// Extra settings based on https://e2e.ti.com/support/processors-group/processors/f/processors-forum/563409/linux-66ak2l06-how-to-configure-dp83867-ethernet-phy-in-the-device-tree/2073476#2073476
					compatible = "ethernet-phy-ieee802.3-c22";
					tx-fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>;
					rx-fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>;
					ti,max-output-impedance;
					// ti,clk-output-sel = <DP83867_CLK_O_SEL_CHN_A_RCLK>;
					ti,rx-internal-delay = <DP83867_RGMIIDCTL_2_25_NS>;
					ti,tx-internal-delay = <DP83867_RGMIIDCTL_2_75_NS>;
				};
			};
		};
	};

	fragment@1 {
		target-path = "/bus@0/ethernet@6800000";
		__overlay__ {
			status = "disabled";
			mdio {
				/delete-node/ phy@0;
			};
		};
	};
};