How to connect MT25408A0 to an MQM8700 switch?

Hello,

we have a bunch of old hardware that still needs to keep running (common scenario ;)
Last week the HPC cluster was rebuilt: all machines came out of the rack, some old ones went to the garbage, and some old ones were reused in the rack.

Just the main facts in a few words, I’m eager to hear your tips and hints:

Before:

A compute node with an MT25408A0 was connected to an SX6012 IB switch. It worked fine.

After:

We now have a 40-port HDR switch (MQM8700), and the MT25408A0 device remains in state ‘Polling’.

Technical Details:

[root@w6 ~]# lsb_release -d
Description: CentOS Linux release 7.9.2009 (Core)

[root@w6 ~]# rpm -qf /usr/sbin/ibstatus
infiniband-diags-2.1.0-1.el7.x86_64

[root@w6 ~]# lspci -v

06:00.0 InfiniBand: Mellanox Technologies MT25408A0-FCC-QI ConnectX, Dual Port 40Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s In… (rev b0)
Subsystem: Mellanox Technologies MT25408A0-FCC-QI ConnectX, Dual Port 40Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s Interface
Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
Memory at cf300000 (64-bit, non-prefetchable) [size=1M]
Memory at c2800000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 3
Capabilities: [48] Vital Product Data
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [148] Device Serial Number 00-02-c9-03-00-28-86-f6
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core

[root@w6 ~]# ibstatus | egrep 'device|state|rate'; ibstat | egrep 'CA|Firmware|Hardware|State|Rate'
Infiniband device ‘mlx4_0’ port 1 status:
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
CA ‘mlx4_0’
CA type: MT26428
Firmware version: 2.9.1000
Hardware version: b0
State: Down
Rate: 10
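For checking several nodes at once, the ibstatus output above can be condensed to one line per port. This is only a minimal sketch: the function name summarize_ibstatus is made up for illustration, and it assumes the ibstatus layout shown in this post (a "device ... port N" header followed by state, phys state and rate lines).

```shell
# Hypothetical helper: reads ibstatus output on stdin, prints one line per port.
summarize_ibstatus() {
    awk '
        /^Infiniband device/ {
            dev = $3
            gsub(/[^A-Za-z0-9_]/, "", dev)   # strip the quotes around the device name
            port = $5
        }
        /^[ \t]*state:/      { sub(/^[ \t]*state:[ \t]*[0-9]+:[ \t]*/, "");      state = $0 }
        /^[ \t]*phys state:/ { sub(/^[ \t]*phys state:[ \t]*[0-9]+:[ \t]*/, ""); phys  = $0 }
        /^[ \t]*rate:/ {
            sub(/^[ \t]*rate:[ \t]*/, "")
            printf "%s port %s: state=%s phys=%s rate=%s\n", dev, port, state, phys, $0
        }
    '
}

# On a node with infiniband-diags installed:
#   ibstatus | summarize_ibstatus
```

A port that stays at phys=Polling, as here, has not detected a link partner yet.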

If we connect the node to the old SX6012 switch, which is also connected to the new switch, it works. But we want to get rid of the old switch.

So how could we proceed?

Best regards
Joe

The devices have not been tested together, as the ConnectX-2 is very old.

Which cable is used for this?
Can you please run mlxlink on the switch port with -m -e -c flags?

It can be run inband with the switch device addressed by its lid.

E.g.

mlxlink -d lid-23 -p 14 -m -e -c

I ran this from another machine connected to the fabric:

[root@w10 ~]# mst status
MST modules:

MST PCI module is not loaded
MST PCI configuration module is not loaded

PCI Devices:

45:00.0

Inband devices:

/dev/mst/CA_MT26428_taede147_mlx4_0_lid-0x0001
/dev/mst/CA_MT26428_w1_mlx4_0_lid-0x0012
/dev/mst/CA_MT26428_w3_mlx4_0_lid-0x0013
/dev/mst/CA_MT4099_c25-ib0_mlx4_0_lid-0x0043
/dev/mst/CA_MT4099_c26-ib0_mlx4_0_lid-0x0009
/dev/mst/CA_MT4099_c27-ib0_mlx4_0_lid-0x0008
/dev/mst/CA_MT4099_c28-ib0_mlx4_0_lid-0x0002
/dev/mst/CA_MT4099_c29-ib0_mlx4_0_lid-0x0006
/dev/mst/CA_MT4099_c30-ib0_mlx4_0_lid-0x0004
/dev/mst/CA_MT4099_c31-ib0_mlx4_0_lid-0x0007
/dev/mst/CA_MT4099_c32-ib0_mlx4_0_lid-0x0005
/dev/mst/CA_MT4099_w10_mlx4_0_lid-0x0045
/dev/mst/CA_MT4099_w8_mlx4_0_lid-0x0014
/dev/mst/CA_MT4099_w9_mlx4_0_lid-0x0016
/dev/mst/CA_MT4123_c34_mlx5_0_lid-0x0017
/dev/mst/CA_MT4123_MT4123_ConnectX6___Mellanox_Technologies_lid-0x0011
/dev/mst/CA_MT53001_Mellanox_Technologies_Aggregation_Node_lid-0x0010
/dev/mst/SW_MT51000_switch-9c1bc6"_lid-0x0003
/dev/mst/SW_MT54000_switch-742018"_lid-0x0015

Cables:

CA_MT26428_taede147_mlx4_0_lid-0x0001,mlx4_0,1_cable
CA_MT26428_w1_mlx4_0_lid-0x0012,mlx4_0,1_cable
CA_MT4099_c26-ib0_mlx4_0_lid-0x0009,mlx4_0,1_cable
CA_MT4099_c27-ib0_mlx4_0_lid-0x0008,mlx4_0,1_cable
CA_MT4099_c28-ib0_mlx4_0_lid-0x0002,mlx4_0,1_cable
CA_MT4099_c29-ib0_mlx4_0_lid-0x0006,mlx4_0,1_cable
CA_MT4099_c30-ib0_mlx4_0_lid-0x0004,mlx4_0,1_cable
CA_MT4099_c31-ib0_mlx4_0_lid-0x0007,mlx4_0,1_cable
CA_MT4099_c32-ib0_mlx4_0_lid-0x0005,mlx4_0,1_cable
CA_MT4123_MT4123_ConnectX6___Mellanox_Technologies_lid-0x0011,mlx4_0,1_cable
SW_MT51000_switch-9c1bc6"_lid-0x0003,mlx4_0,1_cable_1
SW_MT51000_switch-9c1bc6"_lid-0x0003,mlx4_0,1_cable_11
SW_MT51000_switch-9c1bc6"_lid-0x0003,mlx4_0,1_cable_2
SW_MT51000_switch-9c1bc6"_lid-0x0003,mlx4_0,1_cable_4
SW_MT51000_switch-9c1bc6"_lid-0x0003,mlx4_0,1_cable_7

That’s the MQM8700:

/dev/mst/SW_MT54000_switch-742018"_lid-0x0015

The machine with MT25408A0 which isn’t working is on port 28:

[root@w10 ~]# mlxlink -d lid-0x0015 -p 28 -m -e -c
ibwarn: [179208] _do_madrpc: recv failed: Connection timed out
ibwarn: [179208] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 21)
ibwarn: [179208] _do_madrpc: recv failed: Connection timed out
ibwarn: [179208] mad_rpc: _do_madrpc failed; dport (Lid 21)
ibwarn: [179208] _do_madrpc: recv failed: Connection timed out
ibwarn: [179208] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 21)
-E- ibvsmad : cr access read to Lid 21 failed
FATAL - crspace read (0xf0014) failed: Invalid argument

-E- Failed to read device ID

The LID must be given as a decimal value; in your case it would be 21 (0x15).

Try lid-21.

And if that doesn’t work, please try without any of the flags (no -m, -e, or -c).

Also, if this is a managed switch, you can enter the CLI and run this from the fae menu (en – conf term – fae mlxlink…).
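The hex-to-decimal conversion suggested above can be done in the shell. A minimal sketch follows; the function name lid_hex_to_dec is made up for illustration. One caveat from this very thread: the mst device name is not a reliable source for the LID. The name lid-0x0015 turned out to belong to LID 15, and w10 appears as lid-0x0045 in mst but as lid 45 in ibnetdiscover, so it is worth cross-checking against ibswitches before converting anything.

```shell
# Hypothetical helper: converts a hex LID ("0x0015", "0x15" or "15") to decimal.
lid_hex_to_dec() {
    # Strip an optional 0x prefix, then let printf parse the value as hex.
    printf '%d\n' "0x${1#0x}"
}

lid_hex_to_dec 0x0015   # prints 21
```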

Actually I’ve got the wrong lid.

[root@w10 ~]# ibswitches
Switch : 0x98039b0300ff9a80 ports 12 “MF0;switch-9c1bc6:SX6012/U1” enhanced port 0 lid 3 lmc 0
Switch : 0x1070fd03008a0f9e ports 41 “MF0;switch-742018:MQM8700/U1” enhanced port 0 lid 15 lmc 0
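To avoid copying the LID by hand, it can be pulled out of the ibswitches output. This is a sketch under the assumption that the output has the layout shown above ("... lid N lmc M"); the function name switch_lid is made up for illustration.

```shell
# Hypothetical helper: reads ibswitches output on stdin and prints the LID of
# the first switch whose line contains the given text (e.g. the model name).
switch_lid() {
    awk -v pat="$1" '
        index($0, pat) {
            for (i = 1; i < NF; i++)
                if ($i == "lid") { print $(i + 1); exit }
        }
    '
}

# On a node with infiniband-diags and MFT installed:
#   mlxlink -d "lid-$(ibswitches | switch_lid MQM8700)" -p 28
```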

[root@w10 ~]# mlxlink -d lid-15 -p 28 -m -e -c

Operational Info

State : Polling
Physical state : ETH_AN_FSM_ENABLE
Speed : N/A
Width : N/A
FEC : N/A
Loopback Mode : No Loopback
Auto Negotiation : ON

Supported Info

Enabled Link Speed : 0x00000005 (QDR,SDR)
Supported Cable Speed : 0x00000007 (QDR,DDR,SDR)

Troubleshooting Info

Status Opcode : 2
Group Opcode : PHY FW
Recommendation : Auto-negotiation no partner detected.

Tool Information

Firmware Version : 27.2010.1202
amBER Version : 1.64
MFT Version : mft 4.18.0-106

Module Info

Identifier : QSFP+
Compliance : N/A
Cable Technology : Copper cable unequalized
Cable Type : Passive copper cable
OUI : Mellanox
Vendor Name : Mellanox
Vendor Part Number : MC2206130-003
Vendor Serial Number : MT1529VS05052
Rev : A3
Wavelength [nm] : N/A
Transfer Distance [m] : 3
Attenuation (5g,7g,12g) [dB] : 11,0,0
FW Version : N/A
Digital Diagnostic Monitoring : No
Power Class : N/A
CDR RX : N/A
CDR TX : N/A
LOS Alarm : N/A
Temperature [C] : N/A
Voltage [mV] : N/A
Bias Current [mA] : N/A
Rx Power Current [dBm] : N/A
Tx Power Current [dBm] : N/A
IB Cable Width : 1x,2x,4x
Memory Map Revision : 0
Linear Direct Drive : 0
Cable Breakout : Channels implemented [1,2,3,4]/Far end is unspecified
SMF Length : N/A
MAX Power : 0
Cable Rx AMP : N/A
Cable Rx Emphasis : N/A
Cable Rx Post Emphasis : N/A
Cable Tx Equalization : N/A
Wavelength Tolerance : N/A
Module State : N/A
DataPath state [per lane] : N/A,N/A,N/A,N/A
Rx Output Valid : 0,0,0,0
Rx Input Valid : 0,0,0,0
Nominal bit rate : 0.000Gb/s
Rx Power Type : OMA
Manufacturing Date : 16_07_15
Active Set Host Compliance Code : N/A
Active Set Media Compliance Code: N/A
Error Code Response : N/A
Module FW Fault : N/A
DataPath FW Fault : N/A
Tx Fault [per lane] : N/A
Tx LOS [per lane] : N/A
Tx CDR LOL [per lane] : N/A
Rx LOS [per lane] : N/A
Rx CDR LOL [per lane] : N/A
Tx Adaptive EQ Fault [per lane] : N/A

EYE Opening Info

Physical Grade : 0, 0, 0, 0
Height Eye Opening [mV] : N/A, N/A, N/A, N/A
Phase Eye Opening [psec] : N/A, N/A, N/A, N/A

Physical Counters and BER Info

Time Since Last Clear [Min] : N/A
Symbol Errors : N/A
Symbol BER : N/A
Effective Physical Errors : N/A
Effective Physical BER : N/A
Raw Physical Errors Per Lane : N/A
Raw Physical BER : N/A
Link Down Counter : N/A
Link Error Recovery Counter : N/A
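The "Enabled Link Speed" and "Supported Cable Speed" values in the output above are bitmasks. Going only by the two values printed there (0x5 = QDR,SDR and 0x7 = QDR,DDR,SDR), the low three bits appear to be SDR = 1, DDR = 2, QDR = 4. The sketch below decodes just those three bits; the function name is made up, and the higher bits (FDR and up) are deliberately left out because this output does not show them.

```shell
# Hypothetical helper: decodes the low three bits of an IB speed mask.
# Bit assignment inferred from the mlxlink output in this thread.
decode_ib_speed_mask() (
    mask=$(( $1 ))   # accepts hex (0x00000005) or decimal
    out=""
    [ $(( mask & 4 )) -ne 0 ] && out="${out}QDR,"
    [ $(( mask & 2 )) -ne 0 ] && out="${out}DDR,"
    [ $(( mask & 1 )) -ne 0 ] && out="${out}SDR,"
    printf '%s\n' "${out%,}"
)

decode_ib_speed_mask 0x00000005   # prints QDR,SDR
decode_ib_speed_mask 0x00000007   # prints QDR,DDR,SDR
```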

Are you using the same cable to connect the SX6012 to the QM switch?
Can you please share the same mlxlink output when you connect it to the SX6012?

I can’t vouch for the cable, and at the moment I have no colleague on site at the cluster, but I have another of the old machines with the same IB CA connected to the SX6012:

[root@w10 ~]# ibnetdiscover

vendid=0x2c9
devid=0xc738
sysimgguid=0x98039b0300ff9a80
switchguid=0x98039b0300ff9a80(98039b0300ff9a80)
Switch 12 “S-98039b0300ff9a80” # “MF0;switch-9c1bc6:SX6012/U1” enhanced port 0 lid 3 lmc 0
[2] "H-0002c903000e2c0e"1 # “taede147 mlx4_0” lid 1 4xQDR
[4] "H-0002c90300289e38"1 # “w3 mlx4_0” lid 13 4xQDR
[7] "H-0002c903000cf350"1 # “w1 mlx4_0” lid 12 4xQDR
[11] “S-1070fd03008a0f9e”[40] # “MF0;switch-742018:MQM8700/U1” lid 15 4xSDR

[root@w10 ~]# ibswitches
Switch : 0x98039b0300ff9a80 ports 12 “MF0;switch-9c1bc6:SX6012/U1” enhanced port 0 lid 3 lmc 0
Switch : 0x1070fd03008a0f9e ports 41 “MF0;switch-742018:MQM8700/U1” enhanced port 0 lid 15 lmc 0

Unfortunately I get this:
[root@w10 ~]# mlxlink -d lid-3 -p 4 -m -e -c

-E- Device is not supported

[root@w10 ~]# rpm -qf /usr/bin/mlxlink
mft-4.18.0-106.x86_64

[root@w10 ~]# rpm -qi mft
Name : mft
Version : 4.18.0
Release : 106
Architecture: x86_64
Install Date: Fr 18 Feb 2022 04:41:36 CET
Group : System Environment/Base
Size : 187144723
License : Proprietary
Signature : DSA/SHA1, So 28 Nov 2021 16:54:56 CET, Key ID c5ed83e26224c050
Source RPM : mft-4.18.0-106.src.rpm
Build Date : So 28 Nov 2021 09:24:39 CET
Build Host : appsbuild-03-03.mtl.labs.mlnx
Relocations : /usr /etc
Packager : Omer Dagan omerd@mellanox.com
Vendor : Mellanox Technologies Ltd.
Summary : Mellanox firmware tools
Description :
Mellanox firmware tools

Next Tuesday (Monday is a holiday here) I will have the customer on site to check the cable.

To summarize –

The Quantum switch doesn’t support QDR rates (it was never interop-tested against the EOL ConnectX-2 devices).

  • It supports SDR/FDR/EDR/HDR.
  • If you are using a QDR cable on the link between SX6012 port #11 and Quantum port #40, that explains why the link comes up in SDR.
  • You can run mlxlink on port #40 of the Quantum switch to see the cable details.
  • As the “Device is not supported” error in your post shows, mlxlink is supported from the Switch-IB generation onwards; it isn’t supported on SwitchX.

To continue using those legacy devices, I would keep connecting them to the SX6012 and, in turn, connect the SX6012 to the Quantum switch using FDR cables.
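To see at which rate the link between the two switches actually came up, the switch-to-switch lines can be filtered out of the ibnetdiscover output. This is a rough sketch: the function name is made up, and it assumes the layout pasted in this thread, where a port line starts with "[N]", switch peers are named "S-<guid>", and the negotiated width and speed are the last field.

```shell
# Hypothetical helper: reads ibnetdiscover output on stdin and prints the local
# port and the negotiated width/speed for every link whose peer is a switch.
inter_switch_links() {
    awk '$1 ~ /^\[[0-9]+\]$/ && $2 ~ /S-[0-9a-f]+/ { print $1, $NF }'
}

# On a node with infiniband-diags installed:
#   ibnetdiscover | inter_switch_links
```

For the SX6012 section pasted earlier in the thread, this would report port [11] with 4xSDR.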

Kind regards,

Dan

Hi Dan,
first of all thank you very much.
Some new questions have come up in this context.

Three example machines: c27, c28, w1

############ c27 ###########
3b:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Infiniband device ‘mlx4_0’ port 1 status:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 56 Gb/sec (4X FDR)
CA ‘mlx4_0’
CA type: MT4099
Firmware version: 2.42.5000
Hardware version: 1
State: Active
Rate: 56
############ c28 ###########
3b:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Infiniband device ‘mlx4_0’ port 1 status:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 56 Gb/sec (4X FDR)
CA ‘mlx4_0’
CA type: MT4099
Firmware version: 2.42.5000
Hardware version: 1
State: Active
Rate: 56
############ w1 ###########
01:00.0 InfiniBand: Mellanox Technologies MT25408A0-FCC-QI ConnectX, Dual Port 40Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s In… (rev b0)
Infiniband device ‘mlx4_0’ port 1 status:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
CA ‘mlx4_0’
CA type: MT26428
Firmware version: 2.9.1000
Hardware version: b0
State: Active
Rate: 40

ibswitches

Switch : 0x1070fd03008a0f9e ports 41 “MF0;switch-742018:MQM8700/U1” enhanced port 0 lid 15 lmc 0
Switch : 0x98039b0300ff9a80 ports 12 “MF0;switch-9c1bc6:SX6012/U1” enhanced port 0 lid 3 lmc 0

ibnetdiscover

Switch 41 "S-1070fd03008a0f9e" # "MF0;switch-742018:MQM8700/U1" enhanced port 0 lid 15 lmc 0
[7] "H-f4521403003e36d0"1 # "w9 mlx4_0" lid 16 4xQDR
[10] "H-98039b0300e25ba0"1 # "c29-ib0 mlx4_0" lid 6 4xFDR
[11] "H-98039b0300e263c0"1 # "c31-ib0 mlx4_0" lid 7 4xFDR
[12] "H-98039b0300e26590"1 # "c32-ib0 mlx4_0" lid 5 4xFDR
[13] "H-98039b0300e25f90"1 # "c30-ib0 mlx4_0" lid 4 4xFDR
[14] "H-98039b0300d599b0"1 # "w8 mlx4_0" lid 12 4xFDR
[15] "H-98039b0300d5a350"1 # "w10 mlx4_0" lid 45 4xFDR
[16] "H-98039b0300e26020"1 # "c28-ib0 mlx4_0" lid 2 4xFDR
[17] "H-98039b0300e26580"1 # "c25-ib0 mlx4_0" lid 14 4xFDR
[18] "H-98039b0300e269f0"1 # "c27-ib0 mlx4_0" lid 8 4xFDR
[19] "H-98039b0300e269b0"1 # "c26-ib0 mlx4_0" lid 9 4xFDR
[25] "H-b8cef60300a7fb34"1 # "c34 mlx5_0" lid 17 4x???
[27] "H-b8cef60300a7ebcc"1 # "MT4123 ConnectX6 Mellanox Technologies" lid 11 4x???
[40] "S-98039b0300ff9a80"[11] # "MF0;switch-9c1bc6:SX6012/U1" lid 3 4xQDR
[41] "H-1070fd03008a0fa6"1 # "Mellanox Technologies Aggregation Node" lid 10 4x???

vendid=0x2c9
devid=0xc738
sysimgguid=0x98039b0300ff9a80
switchguid=0x98039b0300ff9a80(98039b0300ff9a80)
Switch 12 “S-98039b0300ff9a80” # “MF0;switch-9c1bc6:SX6012/U1” enhanced port 0 lid 3 lmc 0
[2] "H-0002c903000e2c0e"1 # “taede147 mlx4_0” lid 1 4xQDR
[4] "H-0002c90300289e38"1 # “w3 mlx4_0” lid 13 4xQDR
[7] "H-0002c903000cf350"1 # “w1 mlx4_0” lid 18 4xQDR
[11] “S-1070fd03008a0f9e”[40] # “MF0;switch-742018:MQM8700/U1” lid 15 4xQDR

When we start a StarCCM job from w1 with c27 as the compute node, it’s “fast”.
When we start a StarCCM job from w1 with c28 as the compute node, it’s “fast”.
When we start a StarCCM job from w1 with c27 and c28 as compute nodes, it’s slow, almost stalled.

In the past this was fast.
Without going into more detail about the StarCCM setup, the question is:
Do we need some special configuration on the Quantum switch to account for the different adapters and cables connected?

Do we need to configure a new or an additional SM on the Quantum switch?
We still have an SM running on one of the old nodes:

[root@c27-ib0 ~]# rpm -qf /usr/sbin/opensm
opensm-3.3.21-3.el7_8.x86_64

[root@c27-ib0 ~]# ps auxww | grep opensm
root 4066 0.0 0.0 115404 576 ? S Mai01 0:00 /bin/bash /usr/libexec/opensm-launch
root 4067 0.0 0.0 2101160 1952 ? Sl Mai01 0:08 /usr/sbin/opensm

Best regards
Joe

We were able to fix the StarCCM problem described above by adding these options:

… -mpidriver openmpi -mpiflags "--mca routed direct" -fabric ucx …
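Two details of that option string are easy to get wrong when copying it from a web page: the Open MPI option is spelled with two ASCII hyphens (--mca), and the whole MPI flag string has to reach the launcher as a single argument, which is what the quotes are for. A small sketch of the quoting follows, assuming the options are passed from a POSIX shell. The function name mpi_opts is made up, and the launcher name and the remaining arguments are elided in the post, so they are left out here as well.

```shell
# Hypothetical helper: prints the options from the post, one shell argument per
# line, to show that "--mca routed direct" stays a single argument.
mpi_opts() {
    set -- -mpidriver openmpi -mpiflags "--mca routed direct" -fabric ucx
    printf '%s\n' "$@"
}

mpi_opts
```

This prints six lines; the fourth is "--mca routed direct" as one argument. Unquoted, it would be split into three.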

SOLVED

With some “plug and pray”, using the right cables and moving the old ConnectX-2 cards to the old switch, we got to a running system:

vendid=0x2c9
devid=0xd2f0
sysimgguid=0x1070fd03008a0f9e
switchguid=0x1070fd03008a0f9e(1070fd03008a0f9e)
Switch 41 “S-1070fd03008a0f9e” # “MF0;switch-742018:MQM8700/U1” enhanced port 0 lid 15 lmc 0
[7] "H-f4521403003e36d0"1 # “w9 mlx4_0” lid 16 4xQDR
[10] "H-98039b0300e25ba0"1 # “c29-ib0 mlx4_0” lid 6 4xFDR
[11] "H-98039b0300e263c0"1 # “c31-ib0 mlx4_0” lid 7 4xFDR
[12] "H-98039b0300e26590"1 # “c32-ib0 mlx4_0” lid 5 4xFDR
[13] "H-98039b0300e25f90"1 # “c30-ib0 mlx4_0” lid 4 4xFDR
[14] "H-98039b0300d599b0"1 # “w8 mlx4_0” lid 12 4xFDR
[15] "H-98039b0300d5a350"1 # “w10 mlx4_0” lid 45 4xFDR
[16] "H-98039b0300e26020"1 # “c28-ib0 mlx4_0” lid 2 4xFDR
[17] "H-98039b0300e26580"1 # “c25-ib0 mlx4_0” lid 14 4xFDR
[18] "H-98039b0300e269f0"1 # “c27-ib0 mlx4_0” lid 8 4xFDR
[19] "H-98039b0300e269b0"1 # “c26-ib0 mlx4_0” lid 9 4xFDR
[25] "H-b8cef60300a7fb34"1 # “c34 mlx5_0” lid 17 4xHDR
[27] "H-b8cef60300a7ebcc"1 # “MT4123 ConnectX6 Mellanox Technologies” lid 11 4xHDR
[40] “S-98039b0300ff9a80”[11] # “MF0;switch-9c1bc6:SX6012/U1” lid 3 4xQDR
[41] "H-1070fd03008a0fa6"1 # “Mellanox Technologies Aggregation Node” lid 10 4xHDR

vendid=0x2c9
devid=0xc738
sysimgguid=0x98039b0300ff9a80
switchguid=0x98039b0300ff9a80(98039b0300ff9a80)
Switch 12 “S-98039b0300ff9a80” # “MF0;switch-9c1bc6:SX6012/U1” enhanced port 0 lid 3 lmc 0
[1] "H-0002c903000e298a"1 # “c22 mlx4_0” lid 20 4xQDR
[2] "H-0002c903000e2c0e"1 # “taede147 mlx4_0” lid 1 4xQDR
[3] "H-0002c903000e297e"1 # “c23 mlx4_0” lid 22 4xQDR
[4] "H-0002c90300289e38"1 # “w3 mlx4_0” lid 13 4xQDR
[5] "H-0002c903000e27fe"1 # “c24 mlx4_0” lid 23 4xQDR
[6] "H-0002c903000e2992"1 # “c21 mlx4_0” lid 21 4xQDR
[7] "H-0002c903000cf350"1 # “w1 mlx4_0” lid 18 4xQDR
[8] "H-0002c903000e2bee"1 # “taede146 mlx4_0” lid 19 4xQDR
[9] "H-0002c903002886f6"1 # “w6 mlx4_0” lid 24 4xQDR
[11] “S-1070fd03008a0f9e”[40] # “MF0;switch-742018:MQM8700/U1” lid 15 4xQDR
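A quick way to get an overview of such a mixed fabric is to count the links per negotiated speed. The sketch below assumes the ibnetdiscover layout pasted above (port lines start with "[N]", width and speed are the last field); the function name is made up for illustration.

```shell
# Hypothetical helper: reads ibnetdiscover output on stdin and prints how many
# port lines were seen per width/speed, sorted by the speed label.
tally_link_speeds() {
    awk '
        $1 ~ /^\[[0-9]+\]$/ { count[$NF]++ }
        END { for (s in count) print count[s], s }
    ' | sort -k2
}

# On a node with infiniband-diags installed:
#   ibnetdiscover | tally_link_speeds
```

Note that each inter-switch link shows up once per switch section, so it is counted twice.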

Thanks again Dan!

CU
Joe

Happy to hear all is fixed.

Best regards,

Dan
