After changing PHY 88E1512PB2 to RTL8211FI-CG, eqos_open failed

Hello Team,

  1. What did we do:
    On a Jetson AGX Xavier board, we replaced the original Ethernet PHY (88E1512PB2) with Realtek’s RTL8211FI-CG, the same PHY used in the Jetson Xavier NX reference design.
    The HW connections are kept the same as on AGX Xavier, while on the SW side we added the following device tree configuration (from Xavier NX):
    ether_qos@2490000 {
        nvidia,phy-reset-post-delay = <224>;
        nvidia,phy-reset-duration = <10000>;
        mdio {
            compatible = "nvidia,eqos-mdio";
            #address-cells = <1>;
            #size-cells = <0>;
            phy0: ethernet-phy@0 {
                reg = <1>;
            };
        };
    };

  2. What is the issue
    The function eqos_open fails on this board.
    We added comments (some in bold) to the original code to show what happened:
    static int eqos_open(struct net_device *dev)
    {

    pr_info("-->eqos_open\n"); // This message appears in the log

     ......
     /* Reset the PHY */
     if (gpio_is_valid(pdata->phy_reset_gpio)) {
             gpio_set_value(pdata->phy_reset_gpio, 0);
             usleep_range(pdata->phy_reset_duration,
                          pdata->phy_reset_duration + 1);
             gpio_set_value(pdata->phy_reset_gpio, 1);
             msleep(pdata->phy_reset_post_delay);
     } **// On the oscilloscope, the GPIO goes low and then high, with a 10 ms low duration**
    
     ret = eqos_clock_enable(pdata);
     if (ret) {
             dev_err(&dev->dev, "failed to enable clocks\n"); // This error message does not appear
             return ret;
     }
    
     /* issue CAR reset to device */
     ret = hw_if->car_reset(pdata);
     if (ret < 0) {
             ret = -ENODEV;
             dev_err(&dev->dev, "Failed to reset MAC\n"); **// This error message appears**
             goto err_mac_rst;
     }
    
     ......
    

}

On the other hand, eqos_car_reset works fine when called from eqos_probe: it returns 0, and the driver can even read back the RTL8211 device ID 0x1cc916 through MDIO.
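For reference, the ID 0x1cc916 corresponds to the two MDIO reads in the kernel log further down (phyreg 2 returns 0x1c, phyreg 3 returns 0xc916). A minimal sketch of how those two 16-bit registers combine into one 32-bit PHY ID, following the standard convention of register 2 as the upper half and register 3 as the lower half; the function and macro names are hypothetical and are not taken from the eqos driver:

```c
#include <assert.h>
#include <stdint.h>

/* ID reported for the RTL8211F in this thread (regs 2 and 3 combined). */
#define EXPECTED_PHY_ID 0x001cc916u

/* Hypothetical helper: PHYID1 (reg 2) is the upper 16 bits,
 * PHYID2 (reg 3) is the lower 16 bits. */
static uint32_t phy_id_from_regs(uint16_t id1, uint16_t id2)
{
    return ((uint32_t)id1 << 16) | id2;
}

/* Self-check using the values printed by eqos_mdio_read in the log. */
static void phy_id_self_check(void)
{
    assert(phy_id_from_regs(0x001c, 0xc916) == EXPECTED_PHY_ID);
}
```

A successful read of this ID indicates that the MDIO bus and the PHY address (1) are working at probe time.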

  3. What are the questions
    From the code of eqos_car_reset, the function reset_control_reset first sends BPMP reset message ID 17 (<bpmp_resets 17U>), waits for 10 usec, and then checks bit 0 of the register at "eqos_base_addr + 0x1000" (0x2491000).
    So the questions are:
    1. What does the BPMP processor do after receiving reset message ID 17?
    2. What does bit 0 of the register at "eqos_base_addr + 0x1000" (0x2491000) mean?
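The check described above can be sketched as a bounded poll on one register bit. In the Synopsys DesignWare Ethernet QoS layout used by the mainline Linux stmmac driver, offset 0x1000 is the DMA bus mode register and bit 0 is its self-clearing software reset bit; that the Tegra EQOS block follows the same layout is an assumption here, not something confirmed in this thread. All names below are hypothetical, and the fake register exists only so the sketch can run in user space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MODE_OFFSET 0x1000u     /* eqos_base_addr + 0x1000 */
#define DMA_MODE_SWR    (1u << 0)   /* assumed: software reset, self-clearing */

typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t offset);

/* Fake MAC: bit 0 stays set for a given number of reads, then clears. */
struct fake_mac {
    int reads_until_clear;
};

static uint32_t fake_read(void *ctx, uint32_t offset)
{
    struct fake_mac *mac = ctx;

    if (offset != DMA_MODE_OFFSET)
        return 0;
    if (mac->reads_until_clear > 0) {
        mac->reads_until_clear--;
        return DMA_MODE_SWR;
    }
    return 0;
}

/* Poll bit 0 up to max_polls times; 0 on success, -1 if it never clears. */
static int poll_reset_done(reg_read_fn rd, void *ctx, int max_polls)
{
    assert(rd != NULL);
    for (int i = 0; i < max_polls; i++) {
        if ((rd(ctx, DMA_MODE_OFFSET) & DMA_MODE_SWR) == 0)
            return 0;
    }
    return -1;
}
```

With a fake register that clears after 3 reads, poll_reset_done(fake_read, &mac, 10) returns 0; with one that never clears within the budget, it returns -1, which mirrors the "Failed to reset MAC" path in eqos_open.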

Thanks.

Here is the kernel log:

[ 4.579353] -->eqos_init_module
[ 4.579763] -->eqos_probe()
[ 4.580050] res->start = 0x2490000
[ 4.580053] res->end = 0x249ffff
[ 4.580055] irq = 41
[ 4.580057] power_irq = 42
[ 4.580060] rx_irq[0]=43, tx_irq[0]=44
[ 4.580063] rx_irq[1]=45, tx_irq[1]=46
[ 4.580066] rx_irq[2]=47, tx_irq[2]=48
[ 4.580069] rx_irq[3]=49, tx_irq[3]=50
[ 4.580072] ============================================================
[ 4.580074] Sizeof tx context desc 16
[ 4.580076] Sizeof rx normal desc 16
[ 4.580078] Sizeof tx normal desc 16

[ 4.580081] ============================================================
[ 4.580142] -->eqos_init_function_ptrs_dev
[ 4.580145] <--eqos_init_function_ptrs_dev
[ 4.580171] eqos_regulator_init
[ 4.824366] <--eqos_car_reset()
[ 4.824482] phyirq = -1
[ 4.824508] eqos 2490000.ether_qos: failed to read eqos_auto_cal_config_0_reg
[ 4.824534] -->eqos_pad_calibrate()
[ 4.824637] <--eqos_pad_calibrate()
[ 4.824651] mac - user ID: 0x10, Synopsys ID: 0x50
[ 4.824653] -->eqos_get_all_hw_features
[ 4.824661] <--eqos_get_all_hw_features
[ 4.824663] -->eqos_print_all_hw_features

[ 4.824668] =====================================================/

[ 4.824672] 10/100 Mbps Support : YES
[ 4.824675] 1000 Mbps Support : YES
[ 4.824678] Half-duplex Support : YES
[ 4.824682] PCS Registers(TBI/SGMII/RTBI PHY interface) : NO
[ 4.824707] VLAN Hash Filter Selected : NO
[ 4.824709] SMA (MDIO) Interface : YES
[ 4.824712] PMT Remote Wake-up Packet Enable : YES
[ 4.824714] PMT Magic Packet Enable : YES
[ 4.824716] RMON/MMC Module Enable : YES
[ 4.824719] ARP Offload Enabled : YES
[ 4.824722] IEEE 1588-2008 Timestamp Enabled : YES
[ 4.824724] Energy Efficient Ethernet Enabled : YES
[ 4.824726] Transmit Checksum Offload Enabled : YES
[ 4.824729] Receive Checksum Offload Enabled : YES
[ 4.824731] MAC Addresses 16–31 Selected : YES
[ 4.824733] MAC Addresses 32–63 Selected : YES
[ 4.824736] MAC Addresses 64–127 Selected : YES
[ 4.824739] Timestamp System Time Source : INTERNAL
[ 4.824741] Source Address or VLAN Insertion Enable : YES
[ 4.824744] Active PHY Selected : RGMII
[ 4.824747] MTL Receive FIFO Size : 36 KBytes
[ 4.824750] MTL Transmit FIFO Size : 36 KBytes
[ 4.824752] IEEE 1588 High Word Register Enable : YES
[ 4.824754] DCB Feature Enable : NO
[ 4.824757] Split Header Feature Enable : YES
[ 4.824759] TCP Segmentation Offload Enable : YES
[ 4.824762] DMA Debug Registers Enabled : YES
[ 4.824764] AV Feature Enabled : YES
[ 4.824767] Low Power Mode Enabled : NO
[ 4.824770] Hash Table Size : No hash table selected
[ 4.824780] Total number of L3 or L4 Filters : 8 L3/L4 Filter
[ 4.824783] Number of MTL Receive Queues : 4
[ 4.824785] Number of MTL Transmit Queues : 4
[ 4.824788] Number of DMA Receive Channels : 4
[ 4.824790] Number of DMA Transmit Channels : 4
[ 4.824813] Number of PPS Outputs : No PPS output
[ 4.824818] Number of Auxiliary Snapshot Inputs : 1 auxillary input

[ 4.824821] =====================================================/
[ 4.824823] <--eqos_print_all_hw_features
[ 4.824886] setting MAC_1US_TIC to 204 MHz
[ 4.825294] eqos 2490000.ether_qos: Setting local MAC: 48 b0 2d 63 c2 69
[ 4.825334] Using phyrst_lpmode = 1 from DT
[ 4.825337] -->eqos_mdio_register
[ 4.825424] libphy: dwc_phy: probed
[ 4.825444] --> eqos_mdio_read: phyaddr = 1, phyreg = 2
[ 4.825515] <-- eqos_mdio_read: phydata = 0x1c
[ 4.825525] --> eqos_mdio_read: phyaddr = 1, phyreg = 3
[ 4.825569] <-- eqos_mdio_read: phydata = 0xc916
[ 4.826057] <--eqos_mdio_register
[ 4.826065] -->eqos_init_rx_coalesce
[ 4.826067] <--eqos_init_rx_coalesce
[ 4.826402] <-- eqos_probe
[ 4.826497] tegra_eqos_max_state state=0
[ 4.826520] tegra_eqos_max_state state=0
[ 4.826536] tegra_eqos_max_state state=0
[ 4.826552] tegra_eqos_max_state state=0
[ 4.826567] tegra_eqos_max_state state=0
[ 4.826590] EQOS cooling dev registered
[ 4.827233] eqos_probe
[ 4.827413] eqos:driver registration sucessful
[ 4.827415] <--eqos_init_module

[ 9.987953] -->eqos_ioctl
[ 9.987961] <--eqos_ioctl - error
[ 10.568146] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 10.568301] -->eqos_open
[ 10.831218] <--eqos_car_reset()
[ 16.634854] net eth0: Failed to reset MAC
[ 16.637056] <--eqos_open()

Which JetPack version are you using now?

AGX Xavier, with 32.7.1

Can we first check whether the vendor driver is loaded correctly?

Yes, I can see the device and driver registered:

hehe@ubuntu:~$ ls -l /sys/bus/mdio_bus/devices/dwc_phy-1:01/*
lrwxrwxrwx 1 root root 0 Jul 27 13:35 /sys/bus/mdio_bus/devices/dwc_phy-1:01/driver -> '../../../../../bus/mdio_bus/drivers/RTL8211F Gigabit Ethernet'
lrwxrwxrwx 1 root root 0 Jul 27 13:36 /sys/bus/mdio_bus/devices/dwc_phy-1:01/of_node -> ../../../../../firmware/devicetree/base/ether_qos@2490000/mdio/ethernet-phy@0
-r--r--r-- 1 root root 4096 Jul 27 13:36 /sys/bus/mdio_bus/devices/dwc_phy-1:01/phy_has_fixups
-r--r--r-- 1 root root 4096 Jul 27 13:36 /sys/bus/mdio_bus/devices/dwc_phy-1:01/phy_id
-r--r--r-- 1 root root 4096 Jul 27 13:36 /sys/bus/mdio_bus/devices/dwc_phy-1:01/phy_interface
lrwxrwxrwx 1 root root 0 Jul 27 13:36 /sys/bus/mdio_bus/devices/dwc_phy-1:01/subsystem -> ../../../../../bus/mdio_bus
-rw-r--r-- 1 root root 4096 Jul 27 13:36 /sys/bus/mdio_bus/devices/dwc_phy-1:01/uevent

The interesting thing is that eqos_probe works well, while eqos_open always fails.

eqos_open stops at the function eqos_car_reset, which sends a message to the BPMP and then waits for one bit, as explained in the first message. However, it seems that neither the BPMP code nor the EQOS manual has been released.

Hi,

Please also share the full dts in use now.

tegra194-p2888-0001-p2822-0000.dtb.dts.tmp (452.2 KB)

Could you directly convert your dtb back to dts and share?

The shared file (tegra194-p2888-0001-p2822-0000.dtb.dts.tmp) is exactly the dts file.

In short, it is exactly the same as the original dts (tegra194-p2888-0001-p2822-0000), with only the following added, as mentioned in the first post.
ether_qos@2490000 {
    nvidia,phy-reset-post-delay = <224>;
    nvidia,phy-reset-duration = <10000>;
    mdio {
        compatible = "nvidia,eqos-mdio";
        #address-cells = <1>;
        #size-cells = <0>;
        phy0: ethernet-phy@0 {
            reg = <1>;
        };
    };
};

BTW, is there anything else you need? It is not very efficient to ask such questions one by one.

How about a quick check on what the BPMP does after receiving <bpmp_resets 17U> in the function eqos_car_reset?
We are blocked because eqos_car_reset returns -1, but we did not change any code in it, nor did we change the device tree entries related to it.

Hello,

Any update? Thanks.

Hello,

This needs some time to check. Will reply later.

Sorry, but it seems you didn’t get my point. Could you use the dtc tool to convert your dtb back to dts and just attach that dts here?

I shouldn’t see any duplicated files or includes of dtsi A/B/C/D… If you still don’t understand what I am talking about, please tell me.

Sorry that it seems you didn’t get my point.
My dts file is exactly the same as the original 32.7.2 one; I guess you have access to Nvidia’s own dts file? Thanks.

If you still don’t understand what I am talking about, please tell.

Hi,

I don’t want to argue about that. Can you just use the dtc tool to convert your dtb back to dts and attach that dts?

This is to confirm the dts content on your side. I don’t know whether you made any minor change to it. This prevents any difference between what I see on my side and what you are doing on your side.

Let me explain why we do not feel so comfortable: if you had really checked our questions, you would find that they do not rely on the device tree at all.

Two questions are listed in the first post.
The 1st question is about the user manual, which has nothing to do with the device tree: what does bit 0 of the register at "eqos_base_addr + 0x1000" (0x2491000) mean?

The 2nd question is about the BPMP’s behaviour: what does the BPMP processor do after receiving reset message ID 17? We have already confirmed that the message ID is 17.

How about just spending some time and answering them directly?

Hi,

This is just a debug procedure, and we are following it first to collect the necessary info.

I am not able to answer your questions directly right now, but I will pass this info to the internal team, and they will decide whether what you are asking is the right direction for debugging this issue.

For now, we just need you to provide the device tree and the full dmesg. Thank you in advance if you are willing to share them.

Hello,

Any good news? Thanks.