I have two servers, each with one InfiniBand NIC, connected to an InfiniBand switch (MQM8700-HS2F) that is running the subnet manager.
On the host, the physical port state is LinkUp, but the logical port state is Down.
When I run ib_send_bw, I get:
user:~$ ib_send_bw
WARNING: BW peak won't be measured in this run.
Port number 1 state is Down
Couldn't set the link layer
Couldn't get context for the device
Any help would be appreciated.
I have the following output from ibstat
CA 'mlx5_1'
CA type: MT41692
Number of ports: 1
Firmware version: 32.42.1000
Hardware version: 1
Node GUID: 0x5c25730300e77133
System image GUID: 0x5c25730300e77132
Port 1:
State: Down
Physical state: LinkUp
Rate: 200
Base lid: 65535
LMC: 0
SM lid: 1
Capability mask: 0xa751ec48
Port GUID: 0x5c25730300e77133
Link layer: InfiniBand
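For anyone who wants to check several hosts for the same symptom, here is a minimal Python sketch that parses ibstat-style text and flags ports whose physical state is LinkUp while the logical state is not Active. The function names (parse_ibstat, stuck_ports, has_unicast_lid) are made up for this example, and the sample text is a shortened copy of the output above. The LID check assumes the usual InfiniBand unicast range of 0x0001 to 0xBFFF, so 65535 (0xFFFF) falls outside it.

```python
import re

# Shortened copy of the ibstat output above (illustrative input only).
IBSTAT_SAMPLE = """\
CA 'mlx5_1'
CA type: MT41692
Number of ports: 1
Port 1:
State: Down
Physical state: LinkUp
Rate: 200
Base lid: 65535
SM lid: 1
Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Return one dict per port, keyed by the field names ibstat prints."""
    ports = []
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        # Accept straight or curly quotes around the CA name.
        m = re.match(r"CA\s+['\u2018]([^'\u2019]+)['\u2019]", line)
        if m:
            ca = m.group(1)
            port = None
            continue
        m = re.match(r"Port\s+(\d+):$", line)
        if m:
            port = {"ca": ca, "port": int(m.group(1))}
            ports.append(port)
            continue
        # Only collect "key: value" lines once we are inside a port section.
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return ports


def stuck_ports(ports):
    """Ports that are physically up but not logically Active."""
    return [
        p for p in ports
        if p.get("Physical state") == "LinkUp" and p.get("State") != "Active"
    ]


def has_unicast_lid(port):
    """True if the base LID lies in the unicast range 0x0001..0xBFFF."""
    lid = int(port.get("Base lid", "0"))
    return 0x0001 <= lid <= 0xBFFF


if __name__ == "__main__":
    for p in stuck_ports(parse_ibstat(IBSTAT_SAMPLE)):
        print(p["ca"], "port", p["port"], "state", p["State"],
              "unicast LID assigned:", has_unicast_lid(p))
```

On a live host the same functions could be fed the text captured from ibstat instead of the sample string.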
And I ran ibdiagnet on the switch with the following output:
Running version: "IBDIAGNET 2.10.0.MLNX20220720.cd746c3","IBDIAG 2.1.1.cd746c3","IBDM 2.1.1.cd746c3","IBIS 7.0.0.c25850e"
Running command: /usr/bin/ibdiagnet
Running timestamp: 2024-11-07 14:05:38 UTC +0000
Switch label port numbering explanation:
Quantum2 switch split mode: ASIC/Cage/Port/Split, e.g 1/1/1/1
Quantum2 switch no split mode: ASIC/Cage/Port
Quantum switch split mode: Port/Split
Quantum switch no split mode: Port
Load Plugins from:
/usr/share/ibdiagnet2.1.1/plugins/
(You can specify more paths to be looked in with "IBDIAGNET_PLUGINS_PATH" env variable)
Plugin Name Result Comment
libibdiagnet_cable_diag_plugin-2.1.1 Succeeded Plugin loaded
libibdiagnet_phy_diag_plugin-2.1.1 Succeeded Plugin loaded
Discovery
-I- Start Fabric Discover
-I- Fill NodeDesc data
-I- NodeDesc finished successfully
-I- Fabric Discover finished successfully
-I- Fill PortInfo data
-I- PortInfo finished successfully
-I- No scope files. Total switches/ports [1/41], CAs/ports [2/2]
-I- Build VS Capability GMP
-I- VS Capability GMP finished successfully
-I- Build VS Capability SMP
-I- Build VS Capability FW Info SMP
-I- Build VS Capability Mask SMP
-I- VS Capability SMP finished successfully
-I- Build VS Extended Port Info
-I- VS ExtendedPortInfo finished successfully
-I- Build VS Port Info Extended
-I- Port Info Extended finished successfully
-I- Build Switch Info
-I- Switch Info retrieving finished successfully
-I- Build Hierarchy Info
-I- Hierarchy Info retrieving finished successfully
-I- Build AR Info
-I- AR Info retrieving finished successfully
-I- Duplicated GUIDs detection finished successfully
-W- Note: If you have unmanaged systems then duplication can occur
-W- Duplicated Node Description detection finished with warnings
-W- S5c25730300e77132/U2 - Node with GUID=0x5c25730300e77143 is configured with duplicated node description - localhost HCA-2
-W- S5c25730300e7fd52/U2 - Node with GUID=0x5c25730300e7fd63 is configured with duplicated node description - localhost HCA-2
-I- Port Hierarchy Info finished successfully
Lids Check
-I- Lids Check finished successfully
Links Check
-I- Links Check finished successfully
Subnet Manager
-I- SM Info retrieving finished successfully
-I- Subnet Manager Check finished successfully
Port Counters
-I- Build PMClassPortInfo
-I- Build PMPortSampleControl
-I- Build Port Counters
-I- Ports counters retrieving finished successfully
-I- RN counters retrieving finished successfully
-I- HBF counters retrieving finished successfully
-I- Going to sleep for 1 seconds until next counters sample
-I- Build Port Counters
-I- Ports counters retrieving (second time) finished successfully
-I- Ports counters value Check finished successfully
-I- Ports counters overflow value Check finished successfully
-I- pFRN Received Error check finished successfully
-I- Ports counters Difference Check (during run) finished successfully
-I- Ports counters delta check finished successfully
Nodes Information
-I- Devid: 41692(0xa2dc), PSID: MT_0000000884, Latest FW Version:32.42.1000
-I- Devid: 54000(0xd2f0), PSID: MT_0000000062, Latest FW Version:27.2010.5042
-I- FW Check finished successfully
Speed / Width checks
-I- Link Speed Check (Compare to supported link speed)
-I- Links Speed Check finished successfully
-I- Link Width Check (Compare to supported link width)
-I- Links Width Check finished successfully
Virtualization
-I- Build Virtualization Info DB
-I- Build VPort Info DB
-I- Build VPort Info DB
-I- Build VPort GUID Info DB
-I- Build VNode Info DB
-I- Build VPort PKey Table DB
-I- Build Node Description DB
-I- Virtualization finished successfully
-I- Virtual ports retrieving finished successfully
-I- Virtual ports retrieving finished successfully
Partition Keys
-I- Partition Keys retrieving finished successfully
-I- Partition Keys finished successfully
Temperature Sensing
-I- Temperature Sensing finished successfully
Routers
-I- Build Routers Info DB finished successfully
-I- Build Routers Tables finished successfully
Post Reports Generation
-I- Writing of IBNetdDscover file finished successfully
Fabric Summary
Total Nodes : 3
IB Switches : 1
IB Channel Adapters : 2
IB Aggregation Nodes : 0
IB Routers : 0
Adaptive Routing is enabled on 0 switches.
Hashed Based Forwarding is enabled on 0 switches.
Total number of links : 2
Links at 4x50 : 2
Master SM: Port=0 LID=1 GUID=0xa088c2030078685c devid=54000 Priority:0 Node_Type=SW Node_Description=MF0;snake0:MQM8700/U1
Standby SM : No Standby SM
Summary
-I- Stage Warnings Errors Comment
-I- Discovery 2 0
-I- Lids Check 0 0
-I- Links Check 0 0
-I- Subnet Manager 0 0
-I- Port Counters 0 0
-I- Nodes Information 0 0
-I- Speed / Width checks 0 0
-I- Virtualization 0 0
-I- Partition Keys 0 0
-I- Temperature Sensing 0 0
-I- Routers 0 0
-I- Post Reports Generation 0 0
-I- You can find detailed errors/warnings in: /var/tmp/ibdiagnet2/ibdiagnet2.log
-I- Database : /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
-I- LST : /var/tmp/ibdiagnet2/ibdiagnet2.lst
-I- Network dump : /var/tmp/ibdiagnet2/ibdiagnet2.net_dump
-I- Subnet Manager : /var/tmp/ibdiagnet2/ibdiagnet2.sm
-I- Ports Counters : /var/tmp/ibdiagnet2/ibdiagnet2.pm
-I- RN counters 2 : /var/tmp/ibdiagnet2/ibdiagnet2.rnc2
-I- Nodes Information : /var/tmp/ibdiagnet2/ibdiagnet2.nodes_info
-I- VPorts : /var/tmp/ibdiagnet2/ibdiagnet2.vports
-I- VPorts Pkey : /var/tmp/ibdiagnet2/ibdiagnet2.vports_pkey
-I- Partition keys : /var/tmp/ibdiagnet2/ibdiagnet2.pkey
-I- IBNetDiscover : /var/tmp/ibdiagnet2/ibdiagnet2.ibnetdiscover
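To pick out the stages worth a closer look in a long ibdiagnet run, a small Python sketch like the one below can pull the per-stage warning and error counts out of the Summary table. It assumes the table rows look like the ones pasted above ("-I- <stage> <warnings> <errors>"); parse_summary and flagged are hypothetical helper names, not part of ibdiagnet.

```python
import re

# A few rows copied from the Summary section above (illustrative input only).
SUMMARY_SAMPLE = """\
-I- Stage Warnings Errors Comment
-I- Discovery 2 0
-I- Lids Check 0 0
-I- Speed / Width checks 0 0
-I- Post Reports Generation 0 0
-I- RN counters 2 : /var/tmp/ibdiagnet2/ibdiagnet2.rnc2
"""

# A summary row ends in two integers: warnings, then errors.
ROW = re.compile(r"^-I-\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def parse_summary(text):
    """Map stage name -> (warnings, errors) for every matching row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return rows


def flagged(rows):
    """Keep only the stages that reported at least one warning or error."""
    return {stage: counts for stage, counts in rows.items() if any(counts)}


if __name__ == "__main__":
    for stage, (warnings, errors) in flagged(parse_summary(SUMMARY_SAMPLE)).items():
        print(f"{stage}: {warnings} warning(s), {errors} error(s)")
```

In the run above, only the Discovery stage is flagged, which matches the two duplicated node description warnings.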
The switch interface that the server is plugged into shows the following:
IB1/1 state:
Logical port state : Active
Physical port state : LinkUp
Current line rate : 200.0 Gbps
Supported speeds : sdr, qdr, fdr, edr, hdr
Speed : hdr
Supported widths : 1X, 2X, 4X
Width : 4X
Max supported MTUs : 4096
MTU : 4096
VL admin capabilities : VL0 - VL7
Operational VLs : VL0 - VL3
Description :
IB Subnet : infiniband-default
Phy-profile : high-speed-ber
Width reduction mode : Not supported
Telemetry sampling : Disabled
Telemetry threshold : Disabled
Telemetry record : Disabled
Telemetry threshold level: N/A bytes
RX:
Bytes : 5472
Packets : 19
Errors : 0
Symbol errors : 0
VL15 dropped packets: 0
TX:
Bytes : 5472
Packets : 19
Wait : 0
Discarded packets: 0