MCX512A-ACAT problems with link status and ethtool on Centos 7

Hello,

I have been trying to configure my Mellanox NIC (MCX512A-ACAT) to bring up a 10GBASE-SR connection over SFP modules and fiber optic cables to another board (a custom-made board that supports the same type of connection). I have the following problems:

  1. I cannot set speed, duplex, or autoneg via ethtool, even as root (via sudo).

The following outputs are produced:

ethtool -s enp6s0f1 autoneg off

Cannot set new settings: Operation not supported

not setting autoneg

ethtool -s enp6s0f1 duplex full

Cannot advertise duplex full

ethtool -s enp6s0f1 speed 10000

Cannot advertise speed 10000

  2. The link stays down, and I cannot change that in any way. Running ip link set enp6s0f1 up returns nothing and does nothing.
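For completeness, here is a minimal sketch of the checks involved (the link_state helper below is hypothetical, just a convenience for grepping ethtool output; note also that on mlx5 devices speed and autoneg generally have to be set in a single ethtool invocation rather than separately):

```shell
#!/bin/sh
# Hypothetical helper: extract the "Link detected" value from
# ethtool output supplied on stdin (prints "yes" or "no").
link_state() {
    awk -F': ' '/Link detected/ { print $2 }'
}

# On mlx5 NICs, speed and autoneg typically must be changed together
# in one command (assumption based on driver behavior), e.g.:
#   ethtool -s enp6s0f1 speed 10000 autoneg off
#
# Then re-check the link:
#   ethtool enp6s0f1 | link_state
```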

Here are some outputs and some things I have found while trying to resolve this, that might be useful:

A) ifconfig

enp6s0f1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500

inet 192.168.2.50 netmask 255.255.255.248 broadcast 192.168.2.55

inet6 fe80::a75d:a26b:de27:4c78 prefixlen 64 scopeid 0x20

ether 98:03:9b:cc:82:a9 txqueuelen 1000 (Ethernet)

RX packets 0 bytes 0 (0.0 B)

RX errors 0 dropped 0 overruns 0 frame 0

TX packets 0 bytes 0 (0.0 B)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

B) ethtool

Settings for enp6s0f1:

Supported ports: [ FIBRE ]

Supported link modes: 1000baseKX/Full

10000baseKR/Full

25000baseCR/Full

25000baseKR/Full

25000baseSR/Full

Supported pause frame use: Symmetric

Supports auto-negotiation: Yes

Supported FEC modes: Not reported

Advertised link modes: 1000baseKX/Full

10000baseKR/Full

25000baseCR/Full

25000baseKR/Full

25000baseSR/Full

Advertised pause frame use: Symmetric

Advertised auto-negotiation: Yes

Advertised FEC modes: Not reported

Speed: Unknown!

Duplex: Unknown! (255)

Port: FIBRE

PHYAD: 0

Transceiver: internal

Auto-negotiation: on

Supports Wake-on: d

Wake-on: d

Current message level: 0x00000004 (4)

link

Link detected: no

C) ip addr show enp6s0f1

5: enp6s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000

link/ether 98:03:9b:cc:82:a9 brd ff:ff:ff:ff:ff:ff

inet 192.168.2.50/29 brd 192.168.2.55 scope global noprefixroute enp6s0f1

valid_lft forever preferred_lft forever

inet6 fe80::a75d:a26b:de27:4c78/64 scope link tentative noprefixroute

valid_lft forever preferred_lft forever

D) This occurs on both ports of the NIC, with different SFPs and cables (all of which produce light and have been tested as functional and reliable). Currently I am attempting an external loopback on the fibers, which also fails, I presume because of the link status.
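As a sketch of how the transceivers themselves can be checked: where the SFP supports digital diagnostics (DOM), ethtool can read the module EEPROM, including optical power levels. The optical_power helper below is hypothetical, and its grep pattern is an assumption about the exact wording of ethtool -m output:

```shell
#!/bin/sh
# Dump the SFP module EEPROM; on DOM-capable modules this includes
# TX/RX optical power readings:
#   ethtool -m enp6s0f1

# Hypothetical helper: pull only the optical-power lines out of
# "ethtool -m" output supplied on stdin.
optical_power() {
    grep -i 'optical power'
}
# Usage: ethtool -m enp6s0f1 | optical_power
```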

E) I am able to use nmtui to configure and activate my connections.

F) dmesg | grep -e mlx5_core -e enp6s0f1

[ 1.365269] mlx5_core 0000:06:00.0: firmware version: 16.24.1000

[ 1.365306] mlx5_core 0000:06:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)

[ 1.580318] mlx5_core 0000:06:00.0: irq 74 for MSI/MSI-X

[ 1.580331] mlx5_core 0000:06:00.0: irq 75 for MSI/MSI-X

[ 1.580336] mlx5_core 0000:06:00.0: irq 76 for MSI/MSI-X

[ 1.580342] mlx5_core 0000:06:00.0: irq 77 for MSI/MSI-X

[ 1.580347] mlx5_core 0000:06:00.0: irq 78 for MSI/MSI-X

[ 1.580352] mlx5_core 0000:06:00.0: irq 79 for MSI/MSI-X

[ 1.580356] mlx5_core 0000:06:00.0: irq 80 for MSI/MSI-X

[ 1.580360] mlx5_core 0000:06:00.0: irq 81 for MSI/MSI-X

[ 1.580365] mlx5_core 0000:06:00.0: irq 82 for MSI/MSI-X

[ 1.580369] mlx5_core 0000:06:00.0: irq 83 for MSI/MSI-X

[ 1.580374] mlx5_core 0000:06:00.0: irq 84 for MSI/MSI-X

[ 1.580378] mlx5_core 0000:06:00.0: irq 85 for MSI/MSI-X

[ 1.580385] mlx5_core 0000:06:00.0: irq 86 for MSI/MSI-X

[ 1.580389] mlx5_core 0000:06:00.0: irq 87 for MSI/MSI-X

[ 1.580394] mlx5_core 0000:06:00.0: irq 88 for MSI/MSI-X

[ 1.580398] mlx5_core 0000:06:00.0: irq 89 for MSI/MSI-X

[ 1.581401] mlx5_core 0000:06:00.0: Port module event: module 0, Cable unplugged

[ 1.588381] mlx5_core 0000:06:00.1: firmware version: 16.24.1000

[ 1.588437] mlx5_core 0000:06:00.1: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)

[ 1.808030] mlx5_core 0000:06:00.1: irq 91 for MSI/MSI-X

[ 1.808036] mlx5_core 0000:06:00.1: irq 92 for MSI/MSI-X

[ 1.808041] mlx5_core 0000:06:00.1: irq 93 for MSI/MSI-X

[ 1.808046] mlx5_core 0000:06:00.1: irq 94 for MSI/MSI-X

[ 1.808051] mlx5_core 0000:06:00.1: irq 95 for MSI/MSI-X

[ 1.808056] mlx5_core 0000:06:00.1: irq 96 for MSI/MSI-X

[ 1.808061] mlx5_core 0000:06:00.1: irq 97 for MSI/MSI-X

[ 1.808065] mlx5_core 0000:06:00.1: irq 98 for MSI/MSI-X

[ 1.808069] mlx5_core 0000:06:00.1: irq 99 for MSI/MSI-X

[ 1.808074] mlx5_core 0000:06:00.1: irq 100 for MSI/MSI-X

[ 1.808078] mlx5_core 0000:06:00.1: irq 101 for MSI/MSI-X

[ 1.808082] mlx5_core 0000:06:00.1: irq 102 for MSI/MSI-X

[ 1.808087] mlx5_core 0000:06:00.1: irq 103 for MSI/MSI-X

[ 1.808091] mlx5_core 0000:06:00.1: irq 104 for MSI/MSI-X

[ 1.808096] mlx5_core 0000:06:00.1: irq 105 for MSI/MSI-X

[ 1.808100] mlx5_core 0000:06:00.1: irq 106 for MSI/MSI-X

[ 1.809457] mlx5_core 0000:06:00.1: Port module event: module 1, Cable plugged

[ 1.816221] mlx5_core 0000:06:00.0: slow_pci_heuristic:4521:(pid 321): Max link speed = 25000, PCI BW = 63008

[ 1.816289] mlx5_core 0000:06:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 1.945731] mlx5_core 0000:06:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)

[ 6.913876] mlx5_core 0000:06:00.0 enp6s0f0: Link down

[ 6.919262] IPv6: ADDRCONF(NETDEV_UP): enp6s0f1: link is not ready

[ 6.985507] mlx5_core 0000:06:00.1 enp6s0f1: Link down

[ 6.988709] IPv6: ADDRCONF(NETDEV_UP): enp6s0f1: link is not ready

[ 17.441260] IPv6: ADDRCONF(NETDEV_UP): enp6s0f1: link is not ready

[ 17.443918] IPv6: ADDRCONF(NETDEV_UP): enp6s0f1: link is not ready

[69629.698456] mlx5_core 0000:06:00.1: Port module event: module 1, Cable unplugged

[69639.097465] mlx5_core 0000:06:00.1: Port module event: module 1, Cable plugged

[69720.194832] mlx5_core 0000:06:00.0: Port module event: module 0, Cable plugged

[146454.650697] IPv6: ADDRCONF(NETDEV_UP): enp6s0f1: link is not ready

[146529.819726] IPv6: ADDRCONF(NETDEV_UP): enp6s0f1: link is not ready

G) cat /etc/centos-release: CentOS Linux release 7.9.2009 (Core)

I apologize for the long post, and I can provide any additional information. I have checked all the user guides and online resolutions to the best of my ability; any suggestions would be very helpful!

Mary

Hello,

You mentioned that a custom board is used on the peer side; this may be a significant part of the link problem.

Please be advised that if the optical modules, cabling, and peer device on the other end of the link are not within the Firmware Compatible Products listings in the firmware release notes, the setup is considered an untested configuration. To confirm whether the device and the interconnect between them are validated for use with the adapter, please check the Firmware Compatible Products matrix in the latest Firmware Release Notes:

https://docs.mellanox.com/display/ConnectX5Firmwarev16311014/Firmware+Compatible+Products

I do see the mlx5_core module loaded, but to be safe we recommend installing the latest MLNX_OFED version supported for this adapter:

MLNX_OFED Download:

https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed

MLNX_OFED v5.4-3.0.3.0 User Manual:

https://docs.mellanox.com/display/MLNXOFEDv543030/NVIDIA+MLNX_OFED+Documentation+Rev+5.4-3.0.3.0
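A typical install sequence looks roughly like the following sketch. The archive name is an assumption for illustration; use the package matching CentOS 7.9 from the download page above. The driver_name helper is likewise hypothetical, just a way to confirm the driver after installation:

```shell
#!/bin/sh
# Illustrative MLNX_OFED install sequence (filenames are assumptions):
#
#   tar xzf MLNX_OFED_LINUX-5.4-3.0.3.0-rhel7.9-x86_64.tgz
#   cd MLNX_OFED_LINUX-5.4-3.0.3.0-rhel7.9-x86_64
#   sudo ./mlnxofedinstall            # installs driver and userspace tools
#   sudo /etc/init.d/openibd restart  # reload the stack with the new driver

# Hypothetical helper: report the driver name from "ethtool -i" output
# on stdin, to confirm mlx5_core is in use after the install.
driver_name() {
    awk -F': ' '/^driver/ { print $2 }'
}
# Usage: ethtool -i enp6s0f1 | driver_name
```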

Based on the output you provided, we also noticed that you are running firmware version 16.24.1000, which was released in December 2018 and is not compatible with our latest MLNX_OFED driver. If at all possible, please update to the latest firmware for the MCX512A-ACAT, v16.31.1014, which can be found at the following link:

https://www.mellanox.com/support/firmware/connectx5en
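A sketch of the update flow with mstflint (from the MFT / mstflint package) follows; the firmware image filename is a hypothetical placeholder, and the fw_older helper is an illustrative way to confirm the running version is behind the target:

```shell
#!/bin/sh
# Illustrative firmware query and update with mstflint
# (the .bin filename below is a hypothetical placeholder):
#
#   mstflint -d 0000:06:00.0 query                    # show current FW version
#   mstflint -d 0000:06:00.0 -i fw-MCX512A-ACAT.bin burn   # flash new FW

# Hypothetical helper: compare two dotted firmware versions using
# version sort; prints "older" if the first is older than the second.
fw_older() {
    first=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    if [ "$first" = "$1" ] && [ "$1" != "$2" ]; then
        echo older
    else
        echo not-older
    fi
}
# fw_older 16.24.1000 16.31.1014  -> older
```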

To review the MLNX_OFED:Firmware compatibility matrix, please visit the following link:

https://www.mellanox.com/support/mlnx-ofed-matrix

Thank you,

Nvidia Network Support