Dell Z9100-ON Switch + Mellanox/Nvidia MCX455-ECAT 100GbE QSFP28 Question

I have a new Dell Z9100-ON 32-port 100G QSFP28 switch and a rack of servers with Mellanox/Nvidia MCX455-ECAT 100G QSFP28 cards in them, with the servers themselves running under VMware vCenter 7.0.3. When connecting port-to-port on the switch using a genuine Dell 100G direct-attach cable, I get a link light, full 100G, no issue. When connecting server-to-server (NIC to NIC) using that same cable, I get link, full 100G, no issue. But when connecting from switch to NIC, I can’t get link. I’ve gone into the switch via the console and set all the ports to switchport mode, no shutdown, fec enable, autonegotiation, LLDP, etc., but with no change.

Does anyone have experience connecting Mellanox / Nvidia cards to this particular switch? If so, can you give me some advice as to what the issue is, and how I can get these cards to work with this switch?

For example, here is the switch port status for a port that’s got a Dell 100G direct-attach cable in it, connected to one of the NICs:

DellEMC(conf-if-range-hu-1/1-1/32)#do show interfaces hundredGigE 1/2
hundredGigE 1/2 is up, line protocol is down
Hardware is DellEMCEth, address is 54:bf:64:b9:2f:42
    Current address is 54:bf:64:b9:2f:42
Pluggable media present, QSFP28 type is 100GBASE-CR4-3M
    AutoNegotiation is ON
    Forward Error Correction(FEC) configured is cl91 (RS-FEC)
    FEC status is OFF
    Wavelength is 64nm
Interface index is 2097678
Internet address is not set
Mode of IPv4 Address Assignment : NONE
DHCP Client-ID :54bf64b92f42
MTU 1554 bytes, IP MTU 1500 bytes
LineSpeed 100000 Mbit
Flowcontrol rx off tx off
ARP type: ARPA, ARP Timeout 04:00:00
Last clearing of "show interface" counters 00:16:44
Queueing strategy: fifo
Input Statistics:
     0 packets, 0 bytes
     0 64-byte pkts, 0 over 64-byte pkts, 0 over 127-byte pkts
     0 over 255-byte pkts, 0 over 511-byte pkts, 0 over 1023-byte pkts
     0 Multicasts, 0 Broadcasts, 0 Unicasts
     0 runts, 0 giants, 0 throttles
     0 CRC, 0 overrun, 0 discarded
     0 FEC bit errors, 0 FEC uncorrected code words
Output Statistics:
     0 packets, 0 bytes, 0 underruns
     0 64-byte pkts, 0 over 64-byte pkts, 0 over 127-byte pkts
     0 over 255-byte pkts, 0 over 511-byte pkts, 0 over 1023-byte pkts
     0 Multicasts, 0 Broadcasts, 0 Unicasts
     0 throttles, 0 discarded, 0 collisions, 0 wreddrops
Rate info (interval 299 seconds):
     Input 00.00 Mbits/sec,          0 packets/sec, 0.00% of line-rate
     Output 00.00 Mbits/sec,          0 packets/sec, 0.00% of line-rate
Time since last interface status change: 00:16:09

I mean, clearly it is energizing the cable, the port is up, the line speed is 100G… what could the issue be?
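For anyone following along, the link-relevant fields in that output can be pulled out with a short script. This is only a sketch: `summarize_port` is a hypothetical helper, the regular expressions are written against the exact text pasted above, and other firmware versions may word these lines differently.

```python
import re

# Excerpt copied from the "show interfaces hundredGigE 1/2" output above.
SHOW_INTERFACE = """\
hundredGigE 1/2 is up, line protocol is down
Pluggable media present, QSFP28 type is 100GBASE-CR4-3M
    AutoNegotiation is ON
    Forward Error Correction(FEC) configured is cl91 (RS-FEC)
    FEC status is OFF
LineSpeed 100000 Mbit
"""


def summarize_port(text):
    """Extract the link-relevant fields (hypothetical helper, not a Dell tool)."""
    patterns = {
        "interface_state": r"^\S+ \S+ is (\w+), line protocol is \w+",
        "line_protocol": r"line protocol is (\w+)",
        "media": r"QSFP28 type is (\S+)",
        "autoneg": r"AutoNegotiation is (\w+)",
        "fec_configured": r"FEC\) configured is (\S+)",
        "fec_status": r"FEC status is (\w+)",
        "speed_mbit": r"LineSpeed (\d+) Mbit",
    }
    summary = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        # None means the line was not found in the pasted output.
        summary[name] = match.group(1) if match else None
    return summary


port = summarize_port(SHOW_INTERFACE)
for field, value in port.items():
    print(f"{field:16} {value}")
```

The two things this makes easy to spot are that the line protocol is down even though the interface reports up, and that FEC is configured as cl91 while the FEC status is OFF.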

Hello

I would suggest you try manually setting the speed to 100G and enabling FEC on both the switch and the NIC side.

Make sure you have the latest VIB or driver pack on the ESXi side for ConnectX.

You can also install the MFT tools for ESXi to perform more advanced troubleshooting on the NIC side:
https://docs.nvidia.com/networking/display/mftv4280/compilation+and+installation

To check mst device information in ESXi you can check this doc
https://docs.nvidia.com/networking/display/mftv4280/mst+synopsis+-+vmware

Then you can use the mlxlink utility to check the link and negotiation status for a specific NIC, along with any error codes (if detected) when trying to connect to the switch.
https://docs.nvidia.com/networking/display/mftv4280/mlxlink+utility

/opt/mellanox/bin/mlxlink -d <DEV>

DEV can be obtained from the mst status output, for example:

[root@hx3320-s7:~] /opt/mellanox/bin/mst status
MST devices:
------------
mt4125_pciconf6
mt4123_pciconf1
mt4127_pciconf2
mt4123_pciconf3
mt4125_pciconf4
mt4121_pciconf5

[root@hx3320-s7:~] /opt/mellanox/bin/mlxlink -d mt4123_pciconf1
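If a host has several adapters, the device names can be collected from the mst status text and turned into one mlxlink command per device. This is a rough sketch, assuming the output looks like the example above; `list_mst_devices` and `mlxlink_commands` are hypothetical helper names, and the only mlxlink option used is the `-d` shown in this post.

```python
import re

# Sample text copied from the "mst status" example above.
MST_STATUS = """\
MST devices:
------------
mt4125_pciconf6
mt4123_pciconf1
mt4127_pciconf2
mt4123_pciconf3
mt4125_pciconf4
mt4121_pciconf5
"""


def list_mst_devices(text):
    """Return device names such as mt4123_pciconf1, in the order they appear."""
    return re.findall(r"^[ \t]*(mt\d+_pciconf\d+)[ \t]*$", text, re.MULTILINE)


def mlxlink_commands(devices, binary="/opt/mellanox/bin/mlxlink"):
    """Build one 'mlxlink -d <DEV>' command line per device."""
    return [f"{binary} -d {dev}" for dev in devices]


devices = list_mst_devices(MST_STATUS)
commands = mlxlink_commands(devices)
for command in commands:
    print(command)
```

The script only prints the command lines; they would still need to be run on the ESXi host itself.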

If this question requires more in-depth investigation than is possible through a forum, a support request would be the best way to resolve the issue.

If I had an active support contract, I wouldn’t be in the forums, ha ha. 😄

So perhaps, you’d think, the problem is at the card level, and not at the switch?

The cards link to one another at 100G with no issue with these cables…

What settings on them should I explore changing so they would link to the switch?

The switch is set to auto-MDIX, so even if the Server cards were MDI, it should work.

Although I’m not sure QSFP28 100G DACs even support the MDI / MDIX model…

The Mellanox cards obtain link and auto-negotiate with one another to 100G with no issue when plugged card-to-card and server-to-server, but they don’t get link when plugged into the switch itself.

Hi!

Not sure if you have tried hard-coding the Speed and FEC settings. Mellanox/NVIDIA NICs implement auto-negotiation, and we test NIC-to-NIC connections, which explains why it works between them. Dell might implement its own interpretation of auto-negotiation.

Also make sure you are using MFT on your ESXi server and that you have one of the latest drivers, in case this is a known interop issue.

I do currently have FEC (CL91) enabled on all of the Switch’s 100G ports, even though with Direct-Attach Cables within the same rack, IEEE recommends it not be used, as it will introduce pointless latency trying to track and correct forwarding errors that are not present.

For auto-negotiation, I would think it only natural that all Switch and NIC manufacturers (Cisco, Dell, Mellanox, Nvidia) would be following the same protocol, but it’s not like I have any way of changing to a different protocol if they do, and the Mellanox / Nvidia cards are supposed to be compatible with this switch.

I’ll poke deeper, and see if maybe the answer lies in some of the NIC settings.

I logged into one of the servers, which has two Mellanox 100G cards in it, and pulled the configuration of one of them (a ConnectX-4 VPI adapter card with EDR IB (100Gb/s) and 100GbE), the results of which are shown below.

The question is, what settings should I change?

[root@localhost:/opt/mellanox/bin] ./mlxconfig -d mt4115_pciconf0 query

Device #1:
----------

Device type:    ConnectX4
Name:           MCX455A-ECA_Ax
Description:    ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; single-port QSFP28; PCIe3.0 x16; ROHS R6
Device:         mt4115_pciconf0

Configurations:                                      Next Boot
        MEMIC_BAR_SIZE                              0
        MEMIC_SIZE_LIMIT                            _256KB(1)
        FLEX_PARSER_PROFILE_ENABLE                  0
        FLEX_IPV4_OVER_VXLAN_PORT                   0
        ROCE_NEXT_PROTOCOL                          254
        NON_PREFETCHABLE_PF_BAR                     False(0)
        VF_VPD_ENABLE                               False(0)
        STRICT_VF_MSIX_NUM                          False(0)
        VF_NODNIC_ENABLE                            False(0)
        NUM_PF_MSIX_VALID                           True(1)
        NUM_OF_VFS                                  8
        NUM_OF_PF                                   1
        FPP_EN                                      True(1)
        SRIOV_EN                                    True(1)
        PF_LOG_BAR_SIZE                             5
        VF_LOG_BAR_SIZE                             1
        NUM_PF_MSIX                                 63
        NUM_VF_MSIX                                 11
        INT_LOG_MAX_PAYLOAD_SIZE                    AUTOMATIC(0)
        PCIE_CREDIT_TOKEN_TIMEOUT                   0
        PARTIAL_RESET_EN                            False(0)
        SW_RECOVERY_ON_ERRORS                       False(0)
        RESET_WITH_HOST_ON_ERRORS                   False(0)
        PCI_DOWNSTREAM_PORT_OWNER                   Array[0..15]
        CQE_COMPRESSION                             BALANCED(0)
        IP_OVER_VXLAN_EN                            False(0)
        MKEY_BY_NAME                                False(0)
        UCTX_EN                                     True(1)
        PCI_ATOMIC_MODE                             PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
        TUNNEL_ECN_COPY_DISABLE                     False(0)
        LRO_LOG_TIMEOUT0                            6
        LRO_LOG_TIMEOUT1                            7
        LRO_LOG_TIMEOUT2                            8
        LRO_LOG_TIMEOUT3                            13
        TX_SCHEDULER_BURST                          0
        LOG_DCR_HASH_TABLE_SIZE                     14
        MAX_PACKET_LIFETIME                         0
        DCR_LIFO_SIZE                               16384
        LINK_TYPE_P1                                ETH(2)
        ROCE_CC_PRIO_MASK_P1                        255
        CLAMP_TGT_RATE_AFTER_TIME_INC_P1            True(1)
        CLAMP_TGT_RATE_P1                           False(0)
        RPG_TIME_RESET_P1                           300
        RPG_BYTE_RESET_P1                           32767
        RPG_THRESHOLD_P1                            1
        RPG_MAX_RATE_P1                             0
        RPG_AI_RATE_P1                              5
        RPG_HAI_RATE_P1                             50
        RPG_GD_P1                                   11
        RPG_MIN_DEC_FAC_P1                          50
        RPG_MIN_RATE_P1                             1
        RATE_TO_SET_ON_FIRST_CNP_P1                 0
        DCE_TCP_G_P1                                1019
        DCE_TCP_RTT_P1                              1
        RATE_REDUCE_MONITOR_PERIOD_P1               4
        INITIAL_ALPHA_VALUE_P1                      1023
        MIN_TIME_BETWEEN_CNPS_P1                    0
        CNP_802P_PRIO_P1                            6
        CNP_DSCP_P1                                 48
        LLDP_NB_DCBX_P1                             False(0)
        LLDP_NB_RX_MODE_P1                          OFF(0)
        LLDP_NB_TX_MODE_P1                          OFF(0)
        ROCE_RTT_RESP_DSCP_P1                       0
        ROCE_RTT_RESP_DSCP_MODE_P1                  DEVICE_DEFAULT(0)
        DCBX_IEEE_P1                                True(1)
        DCBX_CEE_P1                                 True(1)
        DCBX_WILLING_P1                             True(1)
        KEEP_ETH_LINK_UP_P1                         True(1)
        KEEP_IB_LINK_UP_P1                          False(0)
        KEEP_LINK_UP_ON_BOOT_P1                     False(0)
        KEEP_LINK_UP_ON_STANDBY_P1                  False(0)
        DO_NOT_CLEAR_PORT_STATS_P1                  False(0)
        AUTO_POWER_SAVE_LINK_DOWN_P1                False(0)
        NUM_OF_VL_P1                                _4_VLs(3)
        NUM_OF_TC_P1                                _8_TCs(0)
        NUM_OF_PFC_P1                               8
        VL15_BUFFER_SIZE_P1                         0
        DUP_MAC_ACTION_P1                           LAST_CFG(0)
        SRIOV_IB_ROUTING_MODE_P1                    LID(1)
        IB_ROUTING_MODE_P1                          LID(1)
        PCI_WR_ORDERING                             per_mkey(0)
        MULTI_PORT_VHCA_EN                          False(0)
        PORT_OWNER                                  True(1)
        ALLOW_RD_COUNTERS                           True(1)
        RENEG_ON_CHANGE                             True(1)
        TRACER_ENABLE                               True(1)
        IP_VER                                      IPv4(0)
        BOOT_UNDI_NETWORK_WAIT                      0
        UEFI_HII_EN                                 False(0)
        BOOT_DBG_LOG                                False(0)
        UEFI_LOGS                                   DISABLED(0)
        BOOT_VLAN                                   1
        LEGACY_BOOT_PROTOCOL                        PXE(1)
        BOOT_INTERRUPT_DIS                          False(0)
        BOOT_LACP_DIS                               True(1)
        BOOT_VLAN_EN                                False(0)
        BOOT_PKEY                                   0
        P2P_ORDERING_MODE                           DEVICE_DEFAULT(0)
        DYNAMIC_VF_MSIX_TABLE                       False(0)
        EXP_ROM_UEFI_ARM_ENABLE                     False(0)
        EXP_ROM_UEFI_x86_ENABLE                     False(0)
        EXP_ROM_PXE_ENABLE                          True(1)
        ADVANCED_PCI_SETTINGS                       False(0)
        SAFE_MODE_THRESHOLD                         10
        SAFE_MODE_ENABLE                            True(1)
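To narrow down which of these settings relate to the link at all, the dump can be parsed into name/value pairs and filtered by name. This is a sketch only: `parse_mlxconfig` is a hypothetical helper, the sample text is a short excerpt of the query above rather than the full dump, and matching on the substring "LINK" is just a convenient assumption about how the parameters are named.

```python
import re

# Short excerpt of the "mlxconfig query" output above (not the full dump).
MLXCONFIG_QUERY = """\
Configurations:                                      Next Boot
        NUM_OF_VFS                                  8
        SRIOV_EN                                    True(1)
        LINK_TYPE_P1                                ETH(2)
        LLDP_NB_RX_MODE_P1                          OFF(0)
        LLDP_NB_TX_MODE_P1                          OFF(0)
        KEEP_ETH_LINK_UP_P1                         True(1)
        KEEP_IB_LINK_UP_P1                          False(0)
        KEEP_LINK_UP_ON_BOOT_P1                     False(0)
        KEEP_LINK_UP_ON_STANDBY_P1                  False(0)
        AUTO_POWER_SAVE_LINK_DOWN_P1                False(0)
        RENEG_ON_CHANGE                             True(1)
"""


def parse_mlxconfig(text):
    """Map each indented 'NAME   VALUE' line to a dict entry."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"^\s+([A-Z0-9_]+)\s+(\S+)\s*$", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


settings = parse_mlxconfig(MLXCONFIG_QUERY)

# Parameters whose names mention the link.
link_related = {k: v for k, v in settings.items() if "LINK" in k}

# Parameters whose names mention speed or FEC (none in this excerpt).
speed_or_fec = [k for k in settings if "SPEED" in k or "FEC" in k]

for name, value in link_related.items():
    print(f"{name:32} {value}")
print("speed/FEC parameters:", speed_or_fec or "none found")
```

Going by the parameter names in the dump as posted, none of them mention speed or FEC, so those two would presumably have to be checked somewhere other than this query.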

The card can see the inserted cable, and it shows that it supports 100G. The card has been set to forced-100G speed as well, so it’s not trying to auto-negotiate. It’s polling, but there is no link.
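Putting what has been posted so far side by side may help. The sketch below compares the two ends field by field; the values come from this thread (switch output: AutoNegotiation ON, FEC configured cl91, LineSpeed 100000 Mbit; NIC: forced 100G, not auto-negotiating), and anything the thread does not state is left as None. `compare_ends` is a hypothetical helper, and a mismatch here is only a candidate to look into, not a confirmed cause.

```python
# Values taken from this thread; None marks anything not stated in it.
switch_port = {"speed": "100G", "autoneg": "on", "fec": "cl91 (RS-FEC)"}
nic_port = {"speed": "100G", "autoneg": "off", "fec": None}


def compare_ends(a, b):
    """Label each setting as match, mismatch, or unknown (hypothetical helper)."""
    report = {}
    for field in sorted(set(a) | set(b)):
        left, right = a.get(field), b.get(field)
        if left is None or right is None:
            report[field] = "unknown"
        elif left == right:
            report[field] = "match"
        else:
            report[field] = "mismatch"
    return report


report = compare_ends(switch_port, nic_port)
for field, verdict in report.items():
    print(f"{field:8} switch={switch_port.get(field)!s:14} "
          f"nic={nic_port.get(field)!s:6} -> {verdict}")
```

On these inputs the speed matches, auto-negotiation differs between the two ends, and the NIC-side FEC mode is still unknown.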