I've made good progress following the answers here, thank you! I created an opensm conf file as suggested, and the firmware is now updated to the latest version, 2.36.5000.
The latest MLNX_OFED 4.4 had issues: it seemed to install OK, but none of the IB commands worked. I uninstalled it and went back to MLNX_OFED 4.2-1.2.0.0, the last version compatible with RHEL/CentOS 7.4. Version 3.4 is incompatible with the CentOS 7 release on my Rocks Cluster 7.
At the moment I have to start opensm manually from a terminal. Is there a way to start it at boot, perhaps from the conf file? Another question is about the GUID: when I replace the default GUID in the conf file, should I use the GUID of the active port or the node GUID? I tried both. I also notice that ib0 is not green in the output of nmcli connection show. Is this now a network configuration issue? My output is below; I appreciate the help!
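A minimal sketch for the GUID question, assuming the opensm.conf `guid` option means what the opensm man page says for `-g`/`--guid`: the local *port* GUID that OpenSM binds to, not the node GUID (the node GUID, ...f932f0 here, identifies the whole HCA). The helper reads `ibstat`-style output on stdin and prints the Port GUID of the first port whose State is Active; the function name `active_port_guid` is hypothetical. For starting at boot: since `/etc/init.d/opensmd start` reports that it runs via systemctl, `systemctl enable opensmd` (or `chkconfig opensmd on` for the SysV script) is the usual route, but that is an assumption to verify on this MLNX_OFED 4.2 install.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MLNX_OFED): print the Port GUID of the
# first Active port found in ibstat output read from stdin.
active_port_guid() {
  awk '
    # A new "Port N:" header starts a new port section
    /^[ \t]*Port [0-9]+:/ { active = 0 }
    # Remember whether the current port reports State: Active
    /^[ \t]*State:/ { active = ($2 == "Active") }
    # Print the GUID of the first active port and stop
    /^[ \t]*Port GUID:/ && active { print $3; exit }
  '
}
```

On the head node this would be run as `ibstat | active_port_guid`; for the output above it should print the Port 1 GUID, 0x0002c90300f932f1, which is the value this sketch assumes belongs on the `guid` line.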
[root@headnode ~]# mlxfwmanager --online -u -d 07:00.0
Querying Mellanox devices firmware …
Device #1:
  Device Type:      ConnectX3
  Part Number:      0J05YT_Bx
  Description:      MCX380A-QCAA ConnectX-3 Dual-port QDR Mezzanine I/O Card
  PSID:             DEL0A10210018
  PCI Device Name:  07:00.0
  Port1 GUID:       0002c90300f932f1
  Port2 GUID:       0002c90300f932f2
  Versions:         Current        Available
     FW             2.36.5000      N/A
     PXE            3.4.0718       N/A
  Status:           No matching image found
[root@headnode ~]# /etc/init.d/opensmd status
opensm is stopped
[root@headnode ~]# /etc/init.d/opensmd start
Starting opensmd (via systemctl): [ OK ]
[root@headnode ~]# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.36.5000
    Hardware version: 1
    Node GUID: 0x0002c90300f932f0
    System image GUID: 0x0002c90300f932f3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251486a
        Port GUID: 0x0002c90300f932f1
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02514868
        Port GUID: 0x0002c90300f932f2
        Link layer: InfiniBand
[root@headnode ~]# ibhosts
Ca : 0x0002c90300f932f0 ports 2 "headnode HCA-1"
[root@headnode ~]# hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected … 1
PCI Device Check … PASS
Kernel Arch … x86_64
Host Driver Version … MLNX_OFED_LINUX-4.2-1.2.0.0 (OFED-4.2-1.2.0): 3.10.0-693.el7.x86_64
Host Driver RPM Check … PASS
Firmware on CA #0 HCA … v2.36.5000
Host Driver Initialization … PASS
Number of CA Ports Active … 1
Port State of Port #1 on CA #0 (HCA)… UP 4X QDR (InfiniBand)
Port State of Port #2 on CA #0 (HCA)… DOWN (InfiniBand)
Error Counter Check on CA #0 (HCA)… PASS
Kernel Syslog Check … PASS
Node GUID on CA #0 (HCA) … 00:02:c9:03:00:f9:32:f0
------------------ DONE ---------------------
[root@headnode ~]# ibv_devinfo
hca_id: mlx4_0
    transport:          InfiniBand (0)
    fw_ver:             2.36.5000
    node_guid:          0002:c903:00f9:32f0
    sys_image_guid:     0002:c903:00f9:32f3
    vendor_id:          0x02c9
    vendor_part_id:     4099
    hw_ver:             0x1
    board_id:           DEL0A10210018
    phys_port_cnt:      2
    Device ports:
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         1
            port_lid:       1
            port_lmc:       0x00
            link_layer:     InfiniBand
        port:   2
            state:          PORT_DOWN (1)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00
            link_layer:     InfiniBand
[root@headnode ~]# nmcli connection show
NAME                UUID                                  TYPE            DEVICE
Bridge em1          1dad842d-1912-ef5a-a43a-bc238fb267e7  bridge          em1
Bridge em2          0578038a-64e9-a2fd-0a28-e4cd0b553930  bridge          em2
System pem1         c19149d5-4e53-4636-b52a-81d213a8a3cb  802-3-ethernet  pem1
Wired connection 1  13bddd27-08a5-45b5-bd3d-82081536eedd  802-3-ethernet  pem2
virbr0              dc113ed9-ff0e-45ae-85e1-3cd724eea69f  bridge          virbr0
System pem2         7379072d-ea75-335e-2486-0afa3cd10c77  802-3-ethernet  --
ib0                 6b15b69c-4a0b-4457-9db3-183140b4cbe4  infiniband      --
ib1                 a1fe6e6b-9dc1-4e47-9478-2f0c7ea6b1d3  infiniband      --
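In the nmcli listing, the ib0 and ib1 profiles exist but have no DEVICE, which means neither connection is active (ib1 is expected to stay down because port 2 is Polling). One possible fix is to give ib0 a static IPoIB address through an ifcfg file. The sketch below only prints such a file. The function name `make_ifcfg_ib` and the example address 10.10.10.1/24 are hypothetical, and the keys are the standard RHEL 7 ifcfg ones (TYPE=InfiniBand, BOOTPROTO=none, IPADDR, PREFIX) as I understand them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ifcfg-style IPoIB config for one interface.
# Arguments (assumed): interface name, IPv4 address, prefix length.
make_ifcfg_ib() {
  local dev="$1" ip="$2" prefix="$3"
  if [ -z "$dev" ] || [ -z "$ip" ] || [ -z "$prefix" ]; then
    echo "usage: make_ifcfg_ib DEV IPADDR PREFIX" >&2
    return 1
  fi
  # Static address, brought up at boot, managed by NetworkManager
  cat <<EOF
DEVICE=$dev
NAME=$dev
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=none
IPADDR=$ip
PREFIX=$prefix
NM_CONTROLLED=yes
EOF
}
```

Possible usage, assuming the default network-scripts path and after backing up any existing ifcfg-ib0: `make_ifcfg_ib ib0 10.10.10.1 24 > /etc/sysconfig/network-scripts/ifcfg-ib0`, then `nmcli connection reload` and `nmcli connection up ib0`. Each node would need its own address on the same subnet, and on Rocks the cluster database may manage interface settings, so a hand-written file could be overwritten.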