TX2 Ethernet interface going up and down at 1Gbps

We are seeing an Ethernet connectivity issue in our carrier board design. We connected the TX2 PHY output through a Pulse HX5084NL 1:1 transformer to a Microchip KSZ9897R Ethernet switch on the same carrier board. Most of the time things are fine, but intermittently the link goes down for about 3 seconds and then comes back up. I need to know whether there is a hardware issue I should be looking for, or whether this is a known issue with a workaround. Our interface operates at 1000 Mb/s, and for performance reasons it must always operate at that speed.

I noticed others on the forum were having similar issues, but not all of those reports were at 1000 Mb/s.

Here are the messages we get in the dmesg log:

[ 0.814972] eqos 2490000.ether_qos: Setting local MAC: 0 4 4b c4 8c a0
[ 10.909683] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 182.329696] eqos 2490000.ether_qos eth0: Link is Down
[ 185.512438] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1167.301654] eqos 2490000.ether_qos eth0: Link is Down
[ 1170.521455] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1238.145605] eqos 2490000.ether_qos eth0: Link is Down
[ 1241.201848] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
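To put numbers on the "about 3 seconds" drops, a small script can pair each "Link is Down" line with the following "Link is Up" line and report how long the link stayed down. This is only a sketch: the function name `link_outages` and the regex are our own assumptions about the dmesg line format shown above, not part of any NVIDIA tool. For the log above it reports roughly 3.06 to 3.22 s per outage.

```python
import re

# Sample copied from the dmesg output above (MAC line omitted).
SAMPLE = """\
[ 10.909683] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 182.329696] eqos 2490000.ether_qos eth0: Link is Down
[ 185.512438] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1167.301654] eqos 2490000.ether_qos eth0: Link is Down
[ 1170.521455] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1238.145605] eqos 2490000.ether_qos eth0: Link is Down
[ 1241.201848] eqos 2490000.ether_qos eth0: Link is Up - 1Gbps/Full - flow control rx/tx
"""


def link_outages(dmesg_text: str, iface: str = "eth0") -> list[tuple[float, float]]:
    """Pair each 'Link is Down' with the next 'Link is Up' for iface.

    Returns (down_timestamp, seconds_down) tuples, using the kernel
    timestamp in the leading [ ... ] field of each dmesg line.
    """
    pattern = re.compile(
        r"^\[\s*(\d+\.\d+)\].*\b" + re.escape(iface) + r": Link is (Up|Down)\b"
    )
    outages = []
    down_at = None
    for line in dmesg_text.splitlines():
        m = pattern.match(line.strip())
        if m is None:
            continue  # not a link-state message for this interface
        ts, state = float(m.group(1)), m.group(2)
        if state == "Down":
            down_at = ts
        elif down_at is not None:  # an Up that follows a Down closes an outage
            outages.append((down_at, round(ts - down_at, 3)))
            down_at = None
    return outages


if __name__ == "__main__":
    for down_at, seconds in link_outages(SAMPLE):
        print(f"link down at {down_at:.3f}s for {seconds:.3f}s")
```

On the board itself you would feed it the live `dmesg` text instead of `SAMPLE`; consistent outage lengths like these tend to point at a full link retrain rather than random packet loss.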

We checked the interface error counters with "ethtool -S eth0". Apart from the link disconnect/connect counts, every error counter reads 0:

NIC statistics:
mmc_tx_octetcount_gb: 259504
mmc_tx_framecount_gb: 1300
mmc_tx_broadcastframe_g: 18
mmc_tx_multicastframe_g: 48
mmc_tx_64_octets_gb: 35
mmc_tx_65_to_127_octets_gb: 351
mmc_tx_128_to_255_octets_gb: 589
mmc_tx_256_to_511_octets_gb: 318
mmc_tx_512_to_1023_octets_gb: 5
mmc_tx_1024_to_max_octets_gb: 2
mmc_tx_unicast_gb: 1234
mmc_tx_multicast_gb: 48
mmc_tx_broadcast_gb: 18
mmc_tx_underflow_error: 0
mmc_tx_singlecol_g: 0
mmc_tx_multicol_g: 0
mmc_tx_deferred: 0
mmc_tx_latecol: 0
mmc_tx_exesscol: 0
mmc_tx_carrier_error: 0
mmc_tx_octetcount_g: 259504
mmc_tx_framecount_g: 1300
mmc_tx_excessdef: 0
mmc_tx_pause_frame: 0
mmc_tx_vlan_frame_g: 0
mmc_rx_framecount_gb: 1115
mmc_rx_octetcount_gb: 94042
mmc_rx_octetcount_g: 94042
mmc_rx_broadcastframe_g: 17
mmc_rx_multicastframe_g: 16
mmc_rx_crc_errror: 0
mmc_rx_align_error: 0
mmc_rx_run_error: 0
mmc_rx_jabber_error: 0
mmc_rx_undersize_g: 0
mmc_rx_oversize_g: 0
mmc_rx_64_octets_gb: 23
mmc_rx_65_to_127_octets_gb: 1078
mmc_rx_128_to_255_octets_gb: 6
mmc_rx_256_to_511_octets_gb: 4
mmc_rx_512_to_1023_octets_gb: 2
mmc_rx_1024_to_max_octets_gb: 2
mmc_rx_unicast_g: 1082
mmc_rx_length_error: 0
mmc_rx_outofrangetype: 0
mmc_rx_pause_frames: 0
mmc_rx_fifo_overflow: 0
mmc_rx_vlan_frames_gb: 0
mmc_rx_watchdog_error: 0
mmc_rx_ipc_intr_mask: 5368463355
mmc_rx_ipc_intr: 0
mmc_rx_ipv4_gd: 1092
mmc_rx_ipv4_hderr: 0
mmc_rx_ipv4_nopay: 0
mmc_rx_ipv4_frag: 0
mmc_rx_ipv4_udsbl: 0
mmc_rx_ipv6_gd_octets: 1024
mmc_rx_ipv6_hderr_octets: 0
mmc_rx_ipv6_nopay_octets: 0
mmc_rx_udp_gd: 32
mmc_rx_udp_err: 0
mmc_rx_tcp_gd: 1076
mmc_rx_tcp_err: 0
mmc_rx_icmp_gd: 0
mmc_rx_icmp_err: 0
mmc_rx_ipv4_gd_octets: 72594
mmc_rx_ipv4_hderr_octets: 0
mmc_rx_ipv4_nopay_octets: 0
mmc_rx_ipv4_frag_octets: 0
mmc_rx_ipv4_udsbl_octets: 0
mmc_rx_ipv6_gd: 16
mmc_rx_ipv6_hderr: 0
mmc_rx_ipv6_nopay: 0
mmc_rx_udp_gd_octets: 768
mmc_rx_udp_err_octets: 0
mmc_rx_tcp_gd_octets: 50370
mmc_rx_tcp_err_octets: 0
mmc_rx_icmp_gd_octets: 0
mmc_rx_icmp_err_octets: 0
q_re_alloc_rx_buf_failed[0]: 0
q_re_alloc_rx_buf_failed[1]: 0
q_re_alloc_rx_buf_failed[2]: 0
q_re_alloc_rx_buf_failed[3]: 0
q_re_alloc_rx_buf_failed[4]: 0
q_re_alloc_rx_buf_failed[5]: 0
q_re_alloc_rx_buf_failed[6]: 0
q_re_alloc_rx_buf_failed[7]: 0
tx_process_stopped_irq_n[0]: 0
tx_process_stopped_irq_n[1]: 0
tx_process_stopped_irq_n[2]: 0
tx_process_stopped_irq_n[3]: 0
tx_process_stopped_irq_n[4]: 0
tx_process_stopped_irq_n[5]: 0
tx_process_stopped_irq_n[6]: 0
tx_process_stopped_irq_n[7]: 0
rx_process_stopped_irq_n[0]: 0
rx_process_stopped_irq_n[1]: 0
rx_process_stopped_irq_n[2]: 0
rx_process_stopped_irq_n[3]: 0
rx_process_stopped_irq_n[4]: 0
rx_process_stopped_irq_n[5]: 0
rx_process_stopped_irq_n[6]: 0
rx_process_stopped_irq_n[7]: 0
tx_buf_unavailable_irq_n[0]: 0
tx_buf_unavailable_irq_n[1]: 0
tx_buf_unavailable_irq_n[2]: 0
tx_buf_unavailable_irq_n[3]: 0
tx_buf_unavailable_irq_n[4]: 0
tx_buf_unavailable_irq_n[5]: 0
tx_buf_unavailable_irq_n[6]: 0
tx_buf_unavailable_irq_n[7]: 0
rx_buf_unavailable_irq_n[0]: 0
rx_buf_unavailable_irq_n[1]: 0
rx_buf_unavailable_irq_n[2]: 0
rx_buf_unavailable_irq_n[3]: 0
rx_buf_unavailable_irq_n[4]: 0
rx_buf_unavailable_irq_n[5]: 0
rx_buf_unavailable_irq_n[6]: 0
rx_buf_unavailable_irq_n[7]: 0
rx_watchdog_irq_n: 0
fatal_bus_error_irq_n: 0
pmt_irq_n: 0
tx_normal_irq_n[0]: 1253
tx_normal_irq_n[1]: 0
tx_normal_irq_n[2]: 0
tx_normal_irq_n[3]: 0
tx_normal_irq_n[4]: 0
tx_normal_irq_n[5]: 0
tx_normal_irq_n[6]: 0
tx_normal_irq_n[7]: 0
rx_normal_irq_n[0]: 1024
rx_normal_irq_n[1]: 0
rx_normal_irq_n[2]: 0
rx_normal_irq_n[3]: 0
rx_normal_irq_n[4]: 0
rx_normal_irq_n[5]: 0
rx_normal_irq_n[6]: 0
rx_normal_irq_n[7]: 0
napi_poll_n: 2121
tx_clean_n[0]: 2121
tx_clean_n[1]: 0
tx_clean_n[2]: 0
tx_clean_n[3]: 0
tx_clean_n[4]: 0
tx_clean_n[5]: 0
tx_clean_n[6]: 0
tx_clean_n[7]: 0
tx_path_in_lpi_mode_irq_n: 69
tx_path_exit_lpi_mode_irq_n: 69
rx_path_in_lpi_mode_irq_n: 767
rx_path_exit_lpi_mode_irq_n: 767
tx_pkt_n: 1300
rx_pkt_n: 1115
tx_vlan_pkt_n: 0
rx_vlan_pkt_n: 0
tx_timestamp_captured_n: 0
rx_timestamp_captured_n: 0
tx_tso_pkt_n: 0
q_tx_pkt_n[0]: 1300
q_tx_pkt_n[1]: 0
q_tx_pkt_n[2]: 0
q_tx_pkt_n[3]: 0
q_tx_pkt_n[4]: 0
q_tx_pkt_n[5]: 0
q_tx_pkt_n[6]: 0
q_tx_pkt_n[7]: 0
q_rx_pkt_n[0]: 1115
q_rx_pkt_n[1]: 0
q_rx_pkt_n[2]: 0
q_rx_pkt_n[3]: 0
q_rx_pkt_n[4]: 0
q_rx_pkt_n[5]: 0
q_rx_pkt_n[6]: 0
q_rx_pkt_n[7]: 0
link_disconnect_count: 3
link_connect_count: 4
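With this many counters it is easy to miss a non-zero one, so a quick filter helps. The sketch below parses `ethtool -S` output into a dict and keeps only non-zero counters whose names look like error counters. The substring list in `ERROR_HINTS` is our own heuristic, not something defined by the eqos driver, and the helper names are hypothetical. Note that `link_connect_count: 4` against `link_disconnect_count: 3` is consistent with one initial link-up plus three reconnects, matching the three "Link is Down" messages in dmesg.

```python
# Heuristic (our assumption): counter names containing any of these
# substrings are treated as error/fault counters.
ERROR_HINTS = ("err", "col", "underflow", "overflow", "failed", "stopped",
               "unavailable", "jabber", "undersize", "oversize",
               "deferred", "excessdef")

# Trimmed sample of the ethtool -S output above.
SAMPLE = """\
NIC statistics:
     mmc_tx_underflow_error: 0
     mmc_tx_latecol: 0
     mmc_rx_crc_errror: 0
     mmc_rx_fifo_overflow: 0
     tx_path_in_lpi_mode_irq_n: 69
     link_disconnect_count: 3
     link_connect_count: 4
"""


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Turn 'name: value' lines from `ethtool -S` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the "NIC statistics:" header
            stats[name.strip()] = int(value)
    return stats


def nonzero_error_counters(stats: dict[str, int]) -> dict[str, int]:
    """Return only the error-looking counters that are not zero."""
    return {name: v for name, v in stats.items()
            if v and any(hint in name for hint in ERROR_HINTS)}


if __name__ == "__main__":
    stats = parse_ethtool_stats(SAMPLE)
    print("non-zero error counters:", nonzero_error_counters(stats) or "none")
    reconnects = stats["link_connect_count"] - 1  # first connect is the initial link-up
    print("link drops:", stats["link_disconnect_count"], "reconnects:", reconnects)
```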

I wonder whether disabling auto-negotiation would help.

sudo ethtool -s eth0 autoneg off

Disabling auto-negotiation seems to take the interface down. I can’t ping or SSH into the TX2/carrier after typing ‘sudo ethtool -s eth0 autoneg off’.

I suppose the next thing to try is using a serial port terminal to see what happens after the above command is typed in, but I’ll need to get our software folks involved to enable the port.

Any other things to try?

Perhaps once auto-negotiation is off you can manually set gigabit and the link will come back. Before doing anything, check “ifconfig eth0” to see its current settings (and any errors). After turning auto-negotiation off, try “sudo ifup eth0”. If that doesn’t help, look at “man ethtool”; the command should be something like the following (I haven’t verified it, which is why I mention the man page; you might find something easy to fix there):

sudo ethtool -s eth0 speed 1000 duplex full

Per the gigabit Ethernet spec, auto-negotiation is not optional on a 1 Gb/s link. We’ve tried this; it just effectively shuts off the interface.

Thanks,

Steve

Sorry for the late reply. May I ask what the purpose of the Pulse HX5084NL transformer is?

As far as I know, the interface may require auto-negotiation, but the driver is allowed not to advertise some modes (the advertised modes can be masked). Manually switching between modes is not the same as not having auto-negotiation. The command in #4 is not meant to remove auto-negotiation; it is meant to pick a mode manually.

The transformer is required to interface the Broadcom PHY in the TX2 to the PHY in the Microchip KSZ9897R on-board switch. There is no cable between the TX2 and the Ethernet switch in our carrier board design; instead, the connection is made through the transformer and PCB traces.

The Broadcom PHY and the Microchip PHY use different driver schemes. The Broadcom PHY in the TX2 is a current mode driver, and the Microchip PHY is a voltage mode driver.

For voltage mode drivers, the transformer center taps are kept isolated from one another, each connected to ground through its own capacitor. For current mode drivers, all the center taps are tied together and grounded through a single capacitor. Because of this difference, a 1:1 isolation transformer is needed to connect the two PHYs back-to-back.

Additionally, for gigabit Ethernet, capacitor-coupled PHY-to-PHY connections don’t work the way they do with 10/100 PHYs, so the 1:1 transformer is required in any case.

Steve

@Linuxdev: I will try the command ‘ethtool -s eth0 advertise 0x20’ and see whether it stops the link disconnects.
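For reference, the `advertise N` argument to `ethtool -s` is a bitmask of link modes listed in `man ethtool`, and 0x20 is 1000baseT Full. Advertising only that bit keeps auto-negotiation running (which 1000BASE-T needs) while offering the link partner just one mode, which is the distinction made in the reply above. The sketch below decodes and encodes only the six 10/100/1000BASE-T bits; the man page defines many more bits that it deliberately does not cover, and the helper names are hypothetical.

```python
# Subset of the link-mode bits documented in "man ethtool" for "advertise N".
# Only the 10/100/1000BASE-T bits are listed here; other bits exist.
ADVERTISE_BITS = {
    0x001: "10baseT Half",
    0x002: "10baseT Full",
    0x004: "100baseT Half",
    0x008: "100baseT Full",
    0x010: "1000baseT Half",
    0x020: "1000baseT Full",
}


def decode_advertise(mask: int) -> list[str]:
    """List the known modes set in mask, plus any bits this table does not cover."""
    modes = [name for bit, name in sorted(ADVERTISE_BITS.items()) if mask & bit]
    unknown = mask & ~sum(ADVERTISE_BITS)  # sum of the keys: 0x3f
    if unknown:
        modes.append(f"unknown bits {unknown:#x}")
    return modes


def encode_advertise(modes: list[str]) -> int:
    """Build a mask from mode names; raises KeyError for names not in the table."""
    lookup = {name: bit for bit, name in ADVERTISE_BITS.items()}
    mask = 0
    for mode in modes:
        mask |= lookup[mode]
    return mask


if __name__ == "__main__":
    print(decode_advertise(0x20))
    print("sudo ethtool -s eth0 advertise", hex(encode_advertise(["1000baseT Full"])))
```

Running `ethtool eth0` before and after the change should show the "Advertised link modes" list shrinking to 1000baseT/Full.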

It still doesn’t answer the question of why this is happening in the first place.

Steve

Hi Rodgers,
I came across this thread while searching for solutions to the same problem. 100 Mb/s works well for me with the KSZ9897, but when I switch to gigabit I get link dropouts between the TX2 and the switch on our custom board.

Any luck with finding the root cause?

Regards
Bade

I can’t say much due to NDAs.

I had Microchip review the KSZ9897 part of the schematic using their LANCHECK service, available through the support section of their website. I made the recommended changes to my design and am having a new board built. The issue doesn’t affect every assembly, so testing across a population of boards is a good idea.

@rodgers,

Why do you think the PHY on the Jetson side is current-mode driven?
The BCM54610 is a voltage-mode PHY, and they claim the BCM89610 currently being used is a “drop-in replacement”.

I am hoping you have more info than I do.
Unfortunately, folks here want a capacitively coupled connection to another PHY on our carrier board. I just want to be sure it won’t work before we move to magnetics.

Cheers
Eric