We have just racked 16 new Dell compute nodes with InfiniBand cards that identify as MT28908:
e2:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
These are connected to an unmanaged FDR-capable switch identifying as a ‘SwitchX - Mellanox Technologies’.
All of the new nodes link up only at SDR speed:
[root@node098 ~]# ibstatus
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:1c34:da03:0050:6600
base lid: 0x2c4
sm lid: 0xa8
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 10 Gb/sec (4X SDR)
link_layer: InfiniBand
Older nodes attached to the same switch link up at FDR.
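To compare all 16 new nodes against the old ones in one pass, one option is to collect `ibstatus` output from every node and flag the ports that did not negotiate FDR. Below is a minimal Python sketch that assumes the output format shown above. The function names `parse_ibstatus` and `slow_ports` are hypothetical, and gathering the text from each node (for example over ssh) is left out:

```python
import re

# Matches headers such as: Infiniband device 'mlx5_0' port 1 status:
# (\W* tolerates straight or curly quotes around the device name).
HEADER_RE = re.compile(r"device\s+\W*(\w+)\W*\s+port\s+(\d+)")
# Matches rate lines such as: rate: 10 Gb/sec (4X SDR)
RATE_RE = re.compile(r"rate:\s*([\d.]+)\s*Gb/sec\s*\((\d+)X\s+(\w+)\)")


def parse_ibstatus(text):
    """Return {(device, port): (rate_gbps, width, speed)} from ibstatus output."""
    ports = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            continue
        rate = RATE_RE.search(line)
        if rate and current is not None:
            ports[current] = (float(rate.group(1)), int(rate.group(2)), rate.group(3))
    return ports


def slow_ports(text, expected="FDR"):
    """List (device, port) keys whose negotiated speed is not the expected one."""
    return [key for key, (_, _, speed) in parse_ibstatus(text).items()
            if speed != expected]


# Example using the node098 output from this post.
SAMPLE = """Infiniband device 'mlx5_0' port 1 status:
    default gid: fe80:0000:0000:0000:1c34:da03:0050:6600
    base lid: 0x2c4
    sm lid: 0xa8
    state: 4: ACTIVE
    phys state: 5: LinkUp
    rate: 10 Gb/sec (4X SDR)
    link_layer: InfiniBand
"""
print(parse_ibstatus(SAMPLE))  # {('mlx5_0', 1): (10.0, 4, 'SDR')}
print(slow_ports(SAMPLE))      # [('mlx5_0', 1)]
```

Matching on the speed label rather than the Gb/s figure keeps the check independent of link width. A 4X SDR link reports 10 Gb/s, while 4X FDR should report 56 Gb/s.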
I've upgraded the card firmware, but that made no difference. mlxlink reports:
[root@node098 ~]# mlxlink -d e2:00.0
Operational Info
State : Active
Physical state : LinkUp
Speed : IB-SDR
Width : 4x
FEC : Firecode FEC
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
Enabled Link Speed : 0x00000011 (FDR,SDR)
Supported Cable Speed : 0x0000001f (FDR,FDR10,QDR,DDR,SDR)
Troubleshooting Info
Status Opcode : 35
Group Opcode : PHY FW
Recommendation : The active speed was degraded from maximal possible speed due to peer signal integrity issue.
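The two masks under "Supported Info" can be decoded bit by bit. The bit positions used below are inferred only from the labels mlxlink prints beside them: 0x1f lists five speeds, and 0x11 lists FDR and SDR. That gives bit 0 for SDR up through bit 4 for FDR. This mapping is an assumption drawn from this output, not taken from Mellanox documentation. A small Python sketch, with `decode_speeds` as a hypothetical helper:

```python
# Bit positions inferred from the mlxlink output above:
# 0x1f -> FDR,FDR10,QDR,DDR,SDR and 0x11 -> FDR,SDR.
SPEED_BITS = {0: "SDR", 1: "DDR", 2: "QDR", 3: "FDR10", 4: "FDR"}


def decode_speeds(mask):
    """Return the speed names whose bits are set in mask, fastest first."""
    return [SPEED_BITS[bit] for bit in sorted(SPEED_BITS, reverse=True)
            if mask & (1 << bit)]


enabled = decode_speeds(0x00000011)  # "Enabled Link Speed"
cable = decode_speeds(0x0000001F)    # "Supported Cable Speed"
print(enabled)                                 # ['FDR', 'SDR']
print([s for s in cable if s not in enabled])  # ['FDR10', 'QDR', 'DDR']
```

Read this way, only FDR and SDR are enabled on the port. If FDR fails to train, SDR is the only other enabled speed, which would be consistent with the link settling at 10 Gb/s rather than at an intermediate rate such as QDR.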
Any suggestions for how to troubleshoot this?