XavierNX EQOS LAN port sometimes doesn't link up

Hi, any update? We purchased a Xavier NX Developer Kit for $550 to investigate this issue.

Still checking this internally.

Hi shinichiro.adachi,

We followed your steps to reproduce the issue, but after reboot the system does not automatically continue the test.
For step 8 (insert a USB disk), do I need to copy any files to the USB disk, or just plug it into the Xavier NX board?

Thanks!

Hi carolyuu,

For step 8 (insert a USB disk), do I need to copy any files to the USB disk, or just plug it into the Xavier NX board?

No need to copy any files. Just plug in the USB disk.

However, the transfer speed of the USB disk may have something to do with it.

Currently, the USB2.0 check is enabled in pe_test.sh as shown below.

test_and_check_init "usb storage" "lsusb -t" "Mass Storage, Driver=usb-storage, 480M"

If you want to use a USB 3.0 memory stick, please enable the following line instead
and comment out the USB 2.0 check.

test_and_check_init "usb storage" "lsusb -t ; sleep 1" "Mass Storage, Driver=usb-storage, 5000M"
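
The check above looks for a fixed string in the `lsusb -t` output. As a rough illustration of why the speed suffix matters, here is a minimal Python sketch, assuming the check is a plain substring match (the real `test_and_check_init` in pe_test.sh may work differently). The function name and the sample line are hypothetical; only the matched string comes from the script lines quoted above.

```python
def has_usb_storage(lsusb_tree: str, speed: str) -> bool:
    """Return True if any line reports a usb-storage device at the given speed."""
    needle = f"Mass Storage, Driver=usb-storage, {speed}"
    return any(needle in line for line in lsusb_tree.splitlines())


# Hypothetical sample line, written for this sketch (not captured from the board).
SAMPLE = "    |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 480M\n"

print(has_usb_storage(SAMPLE, "480M"))   # USB 2.0 check
print(has_usb_storage(SAMPLE, "5000M"))  # USB 3.0 check
```

A USB 2.0 stick would pass the first check and fail the second, so the enabled check has to match the stick that is actually inserted.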

Hi,

So we just plug in the USB stick and reboot the device? And if the previous setup is correct, we should see the automatic test start?

Will it show anything in the terminal, or will it just keep rebooting?

Hi,

And if the previous setup is correct, we should see the automatic test start?

Yes, if eth0 links up correctly and the USB stick is inserted,
the board will automatically reboot and repeat the test.
If eth0 does not link up, the board will not reboot, which is how a failure can be recognized.

Will it show anything in the terminal, or will it just keep rebooting?

It just keeps rebooting.

If you want to check the test progress, stop the test by unplugging the USB stick and check the size of the log file in /root/pe_log/.

Note: The Xavier NX Developer Kit has no battery-backed RTC, so the date and time in the log file name are incorrect.

Hi shinichiro.adachi,

Could you get the PHY dump using mii-tool for each boot?
We can then diff the PHY values against the non-working case.
Thanks!

Hi carolyuu,

I already did an MII register comparison a month ago, on the custom board rather than the Xavier EVK.

It seems that mii-tool does not have a function to dump MII registers with RAW values.

# mii-tool -v eth0
eth0: negotiated 1000baseT-FD flow-control, link ok
  product info: vendor 00:07:32, model 17 rev 6
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

I tried also ethtool -d, but EQOS driver (made by NVIDIA) does not implement get_regs() in eqos_ethtool_ops, so I cannot dump.

# ethtool -d eth0
Cannot get register dump: Operation not supported

Instead, I set CONFIG_DEBUGFS_OBJ=y and dumped the registers from debugfs.

  • Link Up normally
# cat /sys/kernel/debug/2490000.eqos/BCM_REGS

BroadCom Phy Regs
Control(0x0)                        : 0x1040
Status(0x1)                         : 0x79ad
Id1 (0x2)                           : 0x1c
Id2 (0x3)                           : 0xc916
Auto Neg Advertisement(0x4)         : 0xde1
Auto Neg Link Partner Ability(0x5)  : 0xc5e1
Auto Neg Expansion(0x6)             : 0x6f
Auto Neg Next Page(0x7)             : 0x2001
1000 Base-T Control(0x9)            : 0x200
1000 Base-T Status(0xa)             : 0x800
IEEE Extended Status(0xf)           : 0x2000
Extended Control(0x10)              : 0x0
Extended Status(0x11)               : 0x0
Rx Error Count(0x12)                : 0x0
False Carrier Sense Count(0x13)     : 0x0
Rx Not Ok Count(0x14)               : 0x0
10BASE-T(0x18, Shadow 001)          : 0xffff
Power/Mii Ctrl(0x18, Shadow 010)    : 0x0
Misc Test (0x18, Shadow 100)        : 0xffff
Misc Ctrl(0x18, Shadow 111)         : 0xffff
Int Status(0x1a)                    : 0xffff
Int Mask(0x1b)                      : 0xffff
Pkt Count(expansion reg 0xf00)      : 0xffff
EEE advertisement(C45, Dev7, 0x3C)  : 0xffff
EEE resolution (C45, Dev7, 0x803E)  : 0xffff
LPI Mode Counter(C45, Dev7, 0x803F) : 0xffff
  • PHY_AN not raised and cannot Link Up
# cat /sys/kernel/debug/2490000.eqos/BCM_REGS

BroadCom Phy Regs
Control(0x0)                        : 0x1040
Status(0x1)                         : 0x79ad
Id1 (0x2)                           : 0x1c
Id2 (0x3)                           : 0xc916
Auto Neg Advertisement(0x4)         : 0xde1
Auto Neg Link Partner Ability(0x5)  : 0xc5e1
Auto Neg Expansion(0x6)             : 0x6d
Auto Neg Next Page(0x7)             : 0x2001
1000 Base-T Control(0x9)            : 0x200
1000 Base-T Status(0xa)             : 0x800
IEEE Extended Status(0xf)           : 0x2000
Extended Control(0x10)              : 0x0
Extended Status(0x11)               : 0x0
Rx Error Count(0x12)                : 0x0
False Carrier Sense Count(0x13)     : 0x0
Rx Not Ok Count(0x14)               : 0x0
10BASE-T(0x18, Shadow 001)          : 0xffff
Power/Mii Ctrl(0x18, Shadow 010)    : 0x0
Misc Test (0x18, Shadow 100)        : 0xffff
Misc Ctrl(0x18, Shadow 111)         : 0xffff
Int Status(0x1a)                    : 0xffff
Int Mask(0x1b)                      : 0xffff
Pkt Count(expansion reg 0xf00)      : 0xffff
EEE advertisement(C45, Dev7, 0x3C)  : 0xffff
EEE resolution (C45, Dev7, 0x803E)  : 0xffff
LPI Mode Counter(C45, Dev7, 0x803F) : 0xffff
normal: Auto Neg Expansion(0x6)             : 0x6f
err:    Auto Neg Expansion(0x6)             : 0x6d

0110 1111
0110 1101

According to RTL8211F Datasheet...
8.3.7. ANER (Auto-Negotiation Expansion Register, Address 0x06)

6.1 Page Received (RC/LH)

1: A New Page (new LCW) has been received
0: A New Page has not been received
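
The bit comparison above can be reproduced with a short Python sketch. It XORs the two Auto Neg Expansion values from the dumps and names the bit that differs. The bit names for bits 0 to 4 follow the standard IEEE 802.3 register 6 layout; the helper names are made up for this illustration.

```python
# Standard names for the low bits of the Auto-Negotiation Expansion register (reg 6).
ANER_BITS = {
    0: "Link Partner Auto-Negotiation Able",
    1: "Page Received",
    2: "Local Next Page Able",
    3: "Link Partner Next Page Able",
    4: "Parallel Detection Fault",
}


def changed_bits(a: int, b: int) -> list:
    """Return the bit positions (0..15) at which two 16-bit register values differ."""
    diff = a ^ b
    return [bit for bit in range(16) if diff & (1 << bit)]


normal, err = 0x6F, 0x6D  # values from the two BCM_REGS dumps
for bit in changed_bits(normal, err):
    name = ANER_BITS.get(bit, "not named in this sketch")
    print(f"bit {bit} ({name}): normal={(normal >> bit) & 1} err={(err >> bit) & 1}")
# prints: bit 1 (Page Received): normal=1 err=0
```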

If the custom board results are not considered reliable, we can rebuild the image and collect the same dump on the EVK.

By the way, how is the reproduction going in your environment?
Even without our script, if you keep rebooting,
you should be able to reproduce the issue of no link up and no IP assignment.

What is the output of sudo mii-tool -vvv eth0?

What is the output of sudo mii-tool -vvv eth0?

Thank you for your suggestion.
The following are the values when Link Up is performed normally.

# ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.77  netmask 255.0.0.0  broadcast 10.255.255.255
...

# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-FD flow-control, link ok
  registers for MII PHY 1: 
    1040 79ad 001c c916 0de1 cde1 006f 2801
    6001 0200 7800 0000 0000 4007 0000 2000
    0000 0000 0000 0000 0000 0000 0000 0000
    211e 0862 38ec 0002 0000 0000 0000 0000
  product info: vendor 00:07:32, model 17 rev 6
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

I am now trying to reproduce it again.
If it reproduces, I will post the mii-tool dump for that case as well.

I reproduced it by running the test overnight.

The following are the values when the link does not come up.

# ifconfig 
eth0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 48:b0:2d:3d:78:86  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)

# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-FD flow-control, link ok
  registers for MII PHY 1: 
    1040 79ad 001c c916 0de1 cde1 006f 2801
    6001 0200 7800 0000 0000 0000 0000 2000
    0000 0000 0000 0000 0000 0000 0000 0000
    211e 0862 38ee 0002 0000 0000 0000 0000
  product info: vendor 00:07:32, model 17 rev 6
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 1000baseT-FD 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
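
To make the two `mii-tool -vvv` dumps easier to compare, the following Python sketch diffs them register by register. The hex words are copied from the link-up and no-link-up outputs in this thread; the function names are invented for the example, and no interpretation of the differing registers is implied.

```python
OK_DUMP = """
1040 79ad 001c c916 0de1 cde1 006f 2801
6001 0200 7800 0000 0000 4007 0000 2000
0000 0000 0000 0000 0000 0000 0000 0000
211e 0862 38ec 0002 0000 0000 0000 0000
"""

FAIL_DUMP = """
1040 79ad 001c c916 0de1 cde1 006f 2801
6001 0200 7800 0000 0000 0000 0000 2000
0000 0000 0000 0000 0000 0000 0000 0000
211e 0862 38ee 0002 0000 0000 0000 0000
"""


def parse(dump: str) -> list:
    """Turn whitespace-separated hex words into a list of register values."""
    return [int(word, 16) for word in dump.split()]


def diff_registers(a: str, b: str) -> list:
    """Return (register index, value in a, value in b) for every differing register."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(parse(a), parse(b))) if x != y]


for reg, ok, fail in diff_registers(OK_DUMP, FAIL_DUMP):
    print(f"reg 0x{reg:02x}: ok=0x{ok:04x} fail=0x{fail:04x}")
# prints:
# reg 0x0d: ok=0x4007 fail=0x0000
# reg 0x1a: ok=0x38ec fail=0x38ee
```

Registers 0x00 to 0x0c, including the status register 0x01, are identical in both dumps.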

dmesg.txt (68.6 KB)

mii-tool says [link ok], but ethtool says [Link detected: no].

# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Link detected: no
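
The mismatch can be checked against the raw status register. The Python sketch below decodes register 1 (value 0x79ad in the no-link-up dump) using the standard MII bit positions for link status and auto-negotiation complete. The assumption here, not confirmed in this thread, is that mii-tool reports the PHY register contents while ethtool's "Link detected" reflects the carrier state kept by the driver.

```python
BMSR_LSTATUS = 0x0004       # bit 2: link status
BMSR_ANEGCOMPLETE = 0x0020  # bit 5: auto-negotiation complete


def decode_bmsr(value: int) -> dict:
    """Decode the two status bits of interest from the basic mode status register."""
    return {
        "link_up": bool(value & BMSR_LSTATUS),
        "autoneg_complete": bool(value & BMSR_ANEGCOMPLETE),
    }


print(decode_bmsr(0x79AD))
# prints: {'link_up': True, 'autoneg_complete': True}
```

So at the PHY level the link and auto-negotiation look complete even in the failing case, which is consistent with the mii-tool output.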

Hi, any update?

Could you get the PHY dump using mii-tool for each boot?
We can then diff the PHY values against the non-working case.

As requested, we have obtained the PHY dump by mii-tool.

Hi,

When you hit this issue, can you run the command below and check whether the interrupt count is 2?

root@localhost:/home/nvidia# cat /proc/interrupts | grep phy
243: 2 0 0 0 tegra-gpio 52 Level phy_interrupt

Each ifconfig down/up cycle should increase the interrupt count by 2. Can you check whether you see the same behavior when the error happens?
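
For repeated checks, the count can be pulled out of /proc/interrupts with a small script. This is only a sketch: it sums the per-CPU columns of the line whose last field is `phy_interrupt`, and the function name is made up for the example. The two sample lines are the ones shown in this thread.

```python
def phy_interrupt_count(interrupts_text: str, name: str = "phy_interrupt"):
    """Sum the per-CPU counters on the /proc/interrupts line ending in `name`."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or fields[-1] != name:
            continue
        total = 0
        for field in fields[1:]:      # fields[0] is the IRQ number, e.g. "243:"
            if not field.isdigit():   # counters end at the controller name
                break
            total += int(field)
        return total
    return None                       # no matching line found


EXPECTED_LINE = "243: 2 0 0 0 tegra-gpio 52 Level phy_interrupt"
FAILING_LINE = " 243:          1          0          0          0  tegra-gpio   52 Level     phy_interrupt"

print(phy_interrupt_count(EXPECTED_LINE))  # prints: 2
print(phy_interrupt_count(FAILING_LINE))   # prints: 1
```

On the board, the text would come from reading /proc/interrupts before and after an ifconfig down/up cycle and comparing the two counts.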

When you hit this issue, can you run the command below and check whether the interrupt count is 2?

I reproduced the problem again and got the following output.

root@contec-desktop:~# cat /proc/interrupts | grep phy
 243:          1          0          0          0  tegra-gpio   52 Level     phy_interrupt

The interrupt count is 1. I already know why this happens.
That is because the interrupt associated with NETDEV_CHANGE does not occur when the link fails.

Link OK:

[    8.337451] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   10.922650] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   10.923276] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Link Failed:

[    7.974343] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
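
When many boots have to be classified, the difference between the two logs above can be detected automatically. The Python sketch below simply looks for the "Link is Up" message for eth0; the function name is hypothetical and the sample text is taken from the two dmesg excerpts in this post.

```python
import re

LINK_UP = re.compile(r"eth0: Link is Up")


def link_came_up(dmesg_text: str) -> bool:
    """Return True if the kernel log contains a 'Link is Up' message for eth0."""
    return any(LINK_UP.search(line) for line in dmesg_text.splitlines())


LINK_OK = """\
[    8.337451] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   10.922650] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   10.923276] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
"""

LINK_FAILED = """\
[    7.974343] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
"""

print(link_came_up(LINK_OK))      # prints: True
print(link_came_up(LINK_FAILED))  # prints: False
```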

The origin of “link becomes ready” is phy_state_machine() with state=PHY_AN,
as described in the first post 2 months ago…

All I want to know is why the PHY_AN state is sometimes not reached, and why this happens only on the Xavier NX EVK and custom board.

We suspect that there is a lack of configuration between the XavierNX EQOS Controller and the RTL8211FD LAN chip.

According to the following information, the RTL8211 series may fail to link.
https://linux-sunxi.org/Ethernet#Realtek_RTL8211E

What about the RTL8211FD?
Are settings such as “trace length compensation” appropriate?

We are still checking this issue on our side. Currently, there is still nothing we can share.
Sorry that we do have lots of issues from this forum, and need to debug them one by one. Thanks for your patience.

Hi shinichiro.adachi,

We tested with your script and enabled the USB 3.0 check in pe_test.sh, but it still does not run automatically.
We also tested with our internal script for over 3000 loops and could not reproduce the issue.

Hi @carolyuu,

We also tested with our internal script for over 3000 loops and could not reproduce the issue.

In that case, could you please provide us with your script?
We want to check whether the issue reproduces with your script.

Sorry, it cannot be provided. Can you help us work out why your script does not run?

Can you help us work out why your script does not run?

O.K.
On Jun 17, you wrote:

We followed your steps to reproduce the issue, but after reboot the system does not automatically continue the test.

Does this mean that the script did not run to the end and reboot was not executed?

Is the script started automatically from systemd?

Where is the script stopping?

Could you provide the log file created in /root/pe_log ?

Could you provide the results of lsusb with the USB disk inserted?

Hi shinichiro.adachi,

Yes, after rebooting the device, the script is not executed automatically.

My test steps:

Set IP on host-ubuntu and NX
Turn off WiFi and BT
Insert USB drive 
# cp pe_test.sh to /root/
# cp pe_test.service lib/systemd/system/
# chmod a+x /root/pe_test.sh
# systemctl enable pe_test
reboot device

Below is the log from /root/pe_log/:

# cat petest_log_20180128_155819.txt 
### INIT CHECK ### 
[1] VERSION [NG] date=20180128_155829
arg num=3
ITEM=VERSION
COMMAND=uname -a
EXPECTED=4.9.140
RESULT=Linux localhost.localdomain 4.9.201-tegra #1 SMP PREEMPT Fri Feb 19 08:42:04 PST 2021 aarch64 aarch64 aarch64 GNU/Linux
### INIT CHECK ### 
[1] VERSION [NG] date=20180128_155829
arg num=3
ITEM=VERSION
COMMAND=uname -a
EXPECTED=4.9.140
RESULT=Linux localhost.localdomain 4.9.201-tegra #1 SMP PREEMPT Fri Feb 19 08:42:04 PST 2021 aarch64 aarch64 aarch64 GNU/Linux
  • I will show you the lsusb result when I can enter the office.
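
One thing stands out in the log above: the VERSION item is marked [NG], with EXPECTED=4.9.140 while the `uname -a` result reports 4.9.201-tegra. The Python sketch below reproduces that outcome under the assumption that the init check is a substring match of EXPECTED against the command output. That is a guess about how pe_test.sh works, and the function name is hypothetical.

```python
def version_check(expected: str, uname_output: str) -> str:
    """Return 'OK' if the expected version string appears in the uname output, else 'NG'."""
    return "OK" if expected in uname_output else "NG"


EXPECTED = "4.9.140"
RESULT = ("Linux localhost.localdomain 4.9.201-tegra #1 SMP PREEMPT "
          "Fri Feb 19 08:42:04 PST 2021 aarch64 aarch64 aarch64 GNU/Linux")

print(version_check(EXPECTED, RESULT))  # prints: NG
```

If the script stops at the first failed init check, a kernel version mismatch like this one could explain why it never reaches the reboot step, independent of the USB check.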