Message "SUBNET UP" not found in log files

I successfully installed MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64 on a RHEL 6.4 machine with kernel 2.6.32-358.46.2.el6.x86_64.

Then I started two services:

1. /etc/init.d/openibd
2. /etc/init.d/opensmd

If OpenSM had set up the subnet correctly, the message "SUBNET UP" should appear in /var/log/opensm.log and /var/log/messages, but it does not appear in either file.
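Searching a long log for this message by hand is error-prone. Here is a minimal sketch, assuming the log lines look like the opensm.log excerpt in this question, where state changes are logged as "Entering <STATE> state". It scans a log for state transitions and for "SUBNET UP". `summarize_sm_log` and `summarize_sm_log_file` are hypothetical helper names, not part of MLNX_OFED.

```python
import re
from typing import Iterable, List, Tuple

# OpenSM state changes appear as "Entering <STATE> state",
# e.g. "Entering STANDBY state" in the log excerpt of this question.
STATE_RE = re.compile(r"Entering (\w+) state")


def summarize_sm_log(lines: Iterable[str]) -> Tuple[List[str], bool]:
    """Return the states entered (in order) and whether "SUBNET UP" appears."""
    states: List[str] = []
    subnet_up = False
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            states.append(match.group(1))
        if "SUBNET UP" in line:
            subnet_up = True
    return states, subnet_up


def summarize_sm_log_file(path: str = "/var/log/opensm.log") -> Tuple[List[str], bool]:
    """Read the log file at `path` (default taken from the question) and summarize it."""
    with open(path, errors="replace") as log:
        return summarize_sm_log(log)
```

On a log like the one below, `summarize_sm_log_file()` would report only DISCOVERING and STANDBY entries and `False` for SUBNET UP.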

The log file /var/log/opensm.log contains:

    Aug 11 02:26:50 235789 [4C8AF700] 0x03 -> OpenSM 4.1.5.MLNX20140424.25abcb5
    OpenSM 4.1.5.MLNX20140424.25abcb5
    Aug 11 02:26:50 235869 [4C8AF700] 0x80 -> OpenSM 4.1.5.MLNX20140424.25abcb5
    Using default GUID 0xf4521403002abf01
    Entering DISCOVERING state
    Aug 11 02:26:50 242489 [4C8AF700] 0x02 -> osm_vendor_init: 1000 pending umads specified
    Aug 11 02:26:50 264138 [4C8AF700] 0x80 -> Entering DISCOVERING state
    Entering STANDBY state
    Aug 11 02:26:50 276494 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xf4521403002abf01
    Aug 11 02:26:50 340494 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0xf4521403002abf01
    Aug 11 02:26:50 340559 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0xf4521403002abf01
    Aug 11 02:26:50 340628 [4C8AF700] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0xf4521403002abf01
    Aug 11 02:26:50 340700 [4C8AF700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0xf4521403002abf01
    Aug 11 02:26:50 368521 [45AA2700] 0x80 -> Entering STANDBY state
    Aug 11 02:31:50 370767 [43C9F700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC
            SubnGetResp(SMInfo), attr_mod 0x0, TID 0x12f9
            Initial path: 0,1,23 Return path: 0,4,1
    Aug 11 02:32:00 370839 [43C9F700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC
            SubnGetResp(SMInfo), attr_mod 0x0, TID 0x12fa
            Initial path: 0,1,23 Return path: 0,4,1
    Aug 11 02:32:10 370884 [43C9F700] 0x01 -> log_rcv_cb_error: ERR 3111: Received MAD with error status = 0xC
            SubnGetResp(SMInfo), attr_mod 0x0, TID 0x12fb
            Initial path: 0,1,23 Return path: 0,4,1
    Entering DISCOVERING state

The log shows only two states, STANDBY and DISCOVERING. What do I need to do to reach SUBNET UP?

Do I need to configure OpenSM manually? The file /etc/sysconfig/opensm is also missing.

osmtest also fails:

    Command Line Arguments
    Done with args
    Flow = All Validations
    Aug 11 03:28:40 177806 [53627700] 0x7f -> Setting log level to: 0x03
    Aug 11 03:28:40 177956 [53627700] 0x02 -> osm_vendor_init: 1000 pending umads specified
    using default guid 0xf4521403002abf01
    Aug 11 03:28:40 220980 [53627700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0xf4521403002abf01
    Aug 11 03:28:40 285077 [53627700] 0x02 -> osmtest_validate_sa_class_port_info:
            SA Class Port Info:
            base_ver:1
            class_ver:2
            cap_mask:0x2602
            cap_mask2:0x3E8
            resp_time_val:0x10
    Aug 11 03:28:40 285105 [53627700] 0x01 -> osmtest_create_db: ERR 0130: Unable to open inventory file (osmtest.dat)
    Aug 11 03:28:40 285114 [53627700] 0x01 -> osmtest_run: ERR 0145: Database creation failed (IB_ERROR)
    OSMTEST: TEST "All Validations" FAIL

Did something go wrong with my installation?

I don't know whether this is still an issue, but here are some comments that may help:

This OpenSM instance went into STANDBY mode, which means another SM is already active on the subnet with either a higher priority, or the same priority and a lower GUID.
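The priority/GUID rule can be written as a simple comparison. This is a simplified sketch that covers only the tie-break described here. It ignores the rest of the real election logic, such as handover between an existing master and a newly started SM. `SmCandidate` and `wins_election` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmCandidate:
    priority: int  # SM priority (0..15); the higher value wins
    guid: int      # port GUID; the lower value wins when priorities tie

    def __post_init__(self) -> None:
        if not 0 <= self.priority <= 15:
            raise ValueError("SM priority must be in the range 0..15")


def wins_election(a: SmCandidate, b: SmCandidate) -> bool:
    """Return True if SM `a` should be master rather than SM `b` (simplified rule)."""
    if a.priority != b.priority:
        return a.priority > b.priority
    return a.guid < b.guid
```

This machine's GUID is 0xf4521403002abf01, so any SM with a higher priority, or with the same priority and a numerically lower GUID, would win under this rule. That would leave this instance in STANDBY.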

The MAD error status messages (SMInfo with status 0xC) come from this STANDBY SM polling the MASTER SM. That node is rejecting the query for some unknown reason. The master SM is at directed route path 0,1,23 from the standby machine: out port 1 of the local machine to the next-hop switch, and then out port 23 of that switch. I would run smpquery -D nodeinfo 0,1,23 to see which node is there.
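The small decoding sketch below makes the numbers in these messages concrete. It assumes the common MAD status field layout from the InfiniBand Architecture specification: bit 0 is busy, bit 1 is redirect required, and bits 2-4 hold an invalid-field code. The class-specific upper byte is ignored. The helper names are hypothetical. Under this layout, status 0xC decodes to "method/attribute combination not supported". That is consistent with the remote node rejecting the SMInfo query, but the decode alone does not explain why it does so.

```python
from typing import List

# Invalid-field codes carried in bits 2-4 of the common MAD status field.
INVALID_FIELD_CODES = {
    0: "no invalid fields",
    1: "bad version or class not supported",
    2: "method not supported",
    3: "method/attribute combination not supported",
    7: "one or more attribute fields contain bad values",
}


def decode_mad_status(status: int) -> List[str]:
    """Decode the common (non-class-specific) bits of a MAD status value."""
    notes: List[str] = []
    if status & 0x1:
        notes.append("busy")
    if status & 0x2:
        notes.append("redirect required")
    code = (status >> 2) & 0x7
    notes.append(INVALID_FIELD_CODES.get(code, f"reserved invalid-field code {code}"))
    return notes


def describe_dr_path(path: str) -> List[str]:
    """Explain a directed-route path such as "0,1,23", one outgoing port per hop."""
    ports = [int(p) for p in path.split(",")]
    # The leading 0 stands for the local node; each later entry is the
    # port used to leave the node reached at that point in the path.
    return [f"hop {i}: leave via port {port}" for i, port in enumerate(ports[1:], start=1)]
```

For example, `decode_mad_status(0xC)` returns `["method/attribute combination not supported"]`, and `describe_dr_path("0,1,23")` returns `["hop 1: leave via port 1", "hop 2: leave via port 23"]`.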

Also, I think a more recent MLNX_OFED OpenSM is available now; you might want to try it.

The osmtest failure happens because the inventory file (osmtest.dat) was not created first. You can create it with something like osmtest -f c.
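The sketch below shows that order of operations. It assumes osmtest is on PATH and, as the ERR 0130 message above suggests, reads its inventory from osmtest.dat in the current directory by default. Running osmtest with no -f option uses the "All Validations" flow, as shown in the output above. `osmtest_commands` and `run_osmtest` are hypothetical helper names.

```python
import os
import subprocess
from typing import List


def osmtest_commands(inventory: str = "osmtest.dat") -> List[List[str]]:
    """Plan the osmtest runs: create the inventory if missing, then validate."""
    commands: List[List[str]] = []
    if not os.path.exists(inventory):
        # Create the inventory file first (osmtest -f c, per the answer above).
        commands.append(["osmtest", "-f", "c"])
    # With no -f option osmtest runs the "All Validations" flow.
    commands.append(["osmtest"])
    return commands


def run_osmtest(inventory: str = "osmtest.dat") -> None:
    """Run the planned commands, stopping at the first non-zero exit status."""
    for cmd in osmtest_commands(inventory):
        subprocess.run(cmd, check=True)
```

Keeping the planning step separate from `run_osmtest` lets you inspect which commands would run without touching the fabric.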