New to InfiniBand: unable to configure communication between servers and switch

Dear Members,

I am new to InfiniBand and have a problem. I have 7 servers and one Mellanox switch, and we need to configure Ethernet over InfiniBand.

As I am new to InfiniBand technologies, please help us sort out the issues below.

Installation steps

  • We have 7 servers, each with one ConnectX-3 single-port card.
  • I have one MSX6005F-1BFS switch.
  • The operating system on each server is CentOS 7.4.
  • I ticked the checkbox to install the InfiniBand support software during OS installation.

Issues we are facing

  • All servers are connected to the InfiniBand switch, but on the switch side only the orange LED on port 1 is lit.
  • We are unable to configure Ethernet over InfiniBand.

Please provide the steps to configure the InfiniBand switch and to establish communication between the servers and the switch for High Performance Computing (HPC).

Hi Team,

After installing and configuring the InfiniBand OFED driver, ibstat shows the following:

ibstat

CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.36.5000
    Hardware version: 1
    Node GUID: 0xe41d2d03004e8520
    System image GUID: 0xe41d2d03004e8523
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251486a
        Port GUID: 0xe41d2d03004e8521
        Link layer: InfiniBand
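
In case it helps anyone repeat this check on all 7 servers, here is a minimal Python sketch that parses ibstat-style text and prints a rough hint for each port. The function names `parse_ibstat` and `diagnose` are hypothetical helpers written for this post, not part of OFED, and the hints only cover the common state combinations. The sample text is a shortened copy of the output above.

```python
import re

# Shortened copy of the ibstat output from this post.
# Indentation and blank lines are ignored by the parser.
SAMPLE = """\
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        SM lid: 0
        Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Return one dict per port found in ibstat-style text."""
    ports = []
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Accept straight quotes or the curly quotes some forums substitute.
        m = re.match(r"CA\s+['‘](.+?)['’]$", line)
        if m:
            ca = m.group(1)
            current = None
            continue
        m = re.match(r"Port\s+(\d+):$", line)
        if m:
            current = {"ca": ca, "port": int(m.group(1))}
            ports.append(current)
            continue
        # Key/value lines that belong to the current port.
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def diagnose(port):
    """Rough hint only (hypothetical helper, not an official tool)."""
    state = port.get("State")
    phys = port.get("Physical state")
    if state == "Active":
        return "link is up and a subnet manager has configured the port"
    if phys == "Polling":
        return "no physical link: check cable, switch port, and link speed"
    if phys == "LinkUp" and state == "Initializing":
        return "physical link is up but no subnet manager has configured the port"
    return "unrecognized combination: " + repr((state, phys))


ports = parse_ibstat(SAMPLE)
for p in ports:
    print(p["ca"], "port", p["port"], "->", diagnose(p))
```

To use it on a live server, the `SAMPLE` string could be replaced with the captured output of `ibstat`, for example via `subprocess.run(["ibstat"], capture_output=True, text=True).stdout`.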

I have also included the OpenSM log:

cat /var/log/opensm.log

Oct 26 12:55:55 152889 [31773740] 0x03 -> OpenSM 4.9.0.MLNX20170607.280b8f7
OpenSM 4.9.0.MLNX20170607.280b8f7
Oct 26 12:55:55 152971 [31773740] 0x80 -> OpenSM 4.9.0.MLNX20170607.280b8f7
No local ports detected!
Oct 26 12:55:55 199919 [31773740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 26 12:55:55 199989 [31773740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 26 13:02:00 186218 [C18E4740] 0x03 -> OpenSM 4.9.0.MLNX20170607.280b8f7
OpenSM 4.9.0.MLNX20170607.280b8f7
Oct 26 13:02:00 195147 [C18E4740] 0x80 -> OpenSM 4.9.0.MLNX20170607.280b8f7
Using default GUID 0xe41d2d03004e8521
Entering DISCOVERING state
Oct 26 13:02:00 205192 [C18E4740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 26 13:02:00 205269 [C18E4740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Oct 26 13:02:00 251984 [C18E4740] 0x80 -> Entering DISCOVERING state
SM port is down
Oct 26 13:02:00 252197 [C18E4740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xe41d2d03004e8521
Oct 26 13:02:00 295259 [C18E4740] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0xe41d2d03004e8521
Oct 26 13:02:00 338453 [C18E4740] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0xe41d2d03004e8521
Oct 26 13:02:00 338491 [C18E4740] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0xe41d2d03004e8521
Oct 26 13:02:00 338525 [C18E4740] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0xe41d2d03004e8521
Oct 26 13:02:00 340619 [A605A700] 0x80 -> SM port is down
OpenSM: Got signal 15 - exiting…
Exiting SM
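
To make the relevant log lines easier to spot, here is a small Python sketch that counts the two messages that look significant in this log. The pattern list is my own assumption based only on the lines above, and `summarize` is a hypothetical helper, not an OpenSM tool. The sample is a few lines copied from the log.

```python
import re

# A few lines copied from the OpenSM log in this post.
SAMPLE_LOG = """\
Oct 26 12:55:55 152971 [31773740] 0x80 -> OpenSM 4.9.0.MLNX20170607.280b8f7
No local ports detected!
Oct 26 13:02:00 195147 [C18E4740] 0x80 -> OpenSM 4.9.0.MLNX20170607.280b8f7
Using default GUID 0xe41d2d03004e8521
Oct 26 13:02:00 251984 [C18E4740] 0x80 -> Entering DISCOVERING state
SM port is down
Oct 26 13:02:00 340619 [A605A700] 0x80 -> SM port is down
"""

# Messages worth flagging; this list is an assumption based on this log only.
PATTERNS = ("No local ports detected!", "SM port is down")


def summarize(log_text):
    """Count how often each flagged message appears in the log text."""
    counts = {p: 0 for p in PATTERNS}
    for line in log_text.splitlines():
        # Strip the timestamp prefix if present. Accept either "->" or the
        # arrow character that some forum software substitutes for it.
        message = re.split(r"\s(?:->|→)\s", line, maxsplit=1)[-1].strip()
        if message in counts:
            counts[message] += 1
    return counts


result = summarize(SAMPLE_LOG)
for message, count in result.items():
    print(f"{count:3d}  {message}")
```

For the full file, `SAMPLE_LOG` could be replaced with the contents of `/var/log/opensm.log`.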

Please suggest.