Hi Colin,
thanks for your reply.
However, I didn’t manage to set up an appropriate configuration, so I would like to start from scratch.
As mentioned, there are two identical servers with identical NICs.
I have installed the InfiniBand subnet manager (OpenSM) on both servers. The service is running in MASTER state on ld4465 and in STANDBY state on ld4464; I think this is the expected behavior.
ld4465:~ # systemctl status opensmd.service
● opensmd.service - LSB: Manage OpenSM
Loaded: loaded (/etc/init.d/opensmd; bad; vendor preset: disabled)
Active: active (running) since Thu 2017-07-27 17:57:46 CEST; 15h ago
Docs: man:systemd-sysv-generator(8)
Process: 8291 ExecStart=/etc/init.d/opensmd start (code=exited, status=0/SUCCESS)
Tasks: 122 (limit: 512)
CGroup: /system.slice/opensmd.service
└─8370 /usr/sbin/opensm --daemon --pidfile /var/run/opensm.pid
Jul 27 17:57:46 ld4465 opensmd[8291]: /etc/sysconfig/opensm: line 184: port_profile_switch_nodes:…ound
Jul 27 17:57:46 ld4465 opensmd[8291]: /etc/sysconfig/opensm: line 187: syntax error near unexpect…ull'
Jul 27 17:57:46 ld4465 opensmd[8291]: /etc/sysconfig/opensm: line 187: `port_prof_ignore_file (null)'
Jul 27 17:57:46 ld4465 OpenSM[8359]: Loading Cached Option:guid = 0x248a070300dc25c1
Jul 27 17:57:46 ld4465 opensmd[8291]: Starting opensm: done…done
Jul 27 17:57:46 ld4465 OpenSM[8370]: /var/log/opensm.log log file opened
Jul 27 17:57:46 ld4465 systemd[1]: Started LSB: Manage OpenSM.
Jul 27 17:57:46 ld4465 OpenSM[8370]: OpenSM 4.8.1.MLNX20170118.1a8ad26
Jul 27 17:57:46 ld4465 OpenSM[8370]: Entering DISCOVERING state
Jul 27 17:57:46 ld4465 OpenSM[8370]: Entering MASTER state
ld4464:~ # systemctl status -l opensm.service
● opensmd.service - LSB: Manage OpenSM
Loaded: loaded (/etc/init.d/opensmd; bad; vendor preset: disabled)
Active: active (running) since Fri 2017-07-28 10:10:06 CEST; 3s ago
Docs: man:systemd-sysv-generator(8)
Process: 41211 ExecStop=/etc/init.d/opensmd stop (code=exited, status=0/SUCCESS)
Process: 41260 ExecStart=/etc/init.d/opensmd start (code=exited, status=0/SUCCESS)
Tasks: 122 (limit: 512)
CGroup: /system.slice/opensmd.service
└─41311 /usr/sbin/opensm --daemon --pidfile /var/run/opensm.pid
Jul 28 10:10:06 ld4464 opensmd[41260]: /etc/sysconfig/opensm: line 184: port_profile_switch_nodes: command not found
Jul 28 10:10:06 ld4464 opensmd[41260]: /etc/sysconfig/opensm: line 187: syntax error near unexpected token `null'
Jul 28 10:10:06 ld4464 opensmd[41260]: /etc/sysconfig/opensm: line 187: `port_prof_ignore_file (null)'
Jul 28 10:10:06 ld4464 OpenSM[41309]: Loading Cached Option:guid = 0x248a070300dc2871
Jul 28 10:10:06 ld4464 opensmd[41260]: Starting opensm: done…done
Jul 28 10:10:06 ld4464 OpenSM[41311]: /var/log/opensm.log log file opened
Jul 28 10:10:06 ld4464 OpenSM[41311]: OpenSM 4.8.1.MLNX20170118.1a8ad26
Jul 28 10:10:06 ld4464 systemd[1]: Started LSB: Manage OpenSM.
Jul 28 10:10:06 ld4464 OpenSM[41311]: Entering DISCOVERING state
Jul 28 10:10:06 ld4464 OpenSM[41311]: Entering STANDBY state
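To compare both hosts at a glance, the final SM state can be pulled out of the journal lines. This is only a minimal sketch that uses the log lines above as sample input; the variable names are mine and just for illustration.

```shell
# Sample input: the state lines copied from the two status outputs above.
log='Jul 27 17:57:46 ld4465 OpenSM[8370]: Entering DISCOVERING state
Jul 27 17:57:46 ld4465 OpenSM[8370]: Entering MASTER state
Jul 28 10:10:06 ld4464 OpenSM[41311]: Entering DISCOVERING state
Jul 28 10:10:06 ld4464 OpenSM[41311]: Entering STANDBY state'

# Remember the last "Entering <STATE> state" per host
# (field 4 is the host name, field 7 is the state).
states=$(printf '%s\n' "$log" \
  | awk '/Entering .* state/ { last[$4] = $7 } END { for (h in last) print h, last[h] }' \
  | sort)
echo "$states"
```

On the servers themselves the same awk filter could presumably be fed from `journalctl -u opensmd.service` instead of the pasted sample.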
The lines in /etc/sysconfig/opensm that are reported as errors are:
# ROUTING OPTIONS
# If TRUE count switches as link subscriptions
port_profile_switch_nodes FALSE
# Name of file with port guids to be ignored by port profiling
port_prof_ignore_file (null)
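The "command not found" and "syntax error" messages look like what bash prints when a file in OpenSM option format (key, space, value) is read as a shell script. That /etc/init.d/opensmd sources /etc/sysconfig/opensm this way is only my assumption. The sketch below reproduces both messages with a temporary file holding the two reported lines.

```shell
# Assumption: the init script sources /etc/sysconfig/opensm as shell code.
# Reproduce that with a throwaway file containing the two reported lines.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
port_profile_switch_nodes FALSE
port_prof_ignore_file (null)
EOF

# Source the file in a child bash and capture what it prints on stderr.
# The child exits non-zero because of the syntax error, hence "|| true".
errors=$(bash -c '. "$1"' _ "$tmp" 2>&1) || true
rm -f "$tmp"
echo "$errors"
```

If that assumption holds, the file in /etc/sysconfig would need shell-style assignments, while the key/value options belong in /etc/opensm/opensm.conf.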
OK, next I'll share the current firmware information and port configuration. For simplicity I only show ld4465 (everything is identical on ld4464).
ld4465:~ # connectx_port_config -s
Port configuration for PCI device: 0000:0b:00.0 is:
ib
eth
Port configuration for PCI device: 0000:86:00.0 is:
ib
eth
ld4465:~ # mstflint -d 0b:00.0 q
Image type: FS2
FW Version: 2.40.5000
FW Release Date: 27.10.2016
Product Version: 02.40.50.00
Rom Info: type=UEFI version=14.11.31
type=PXE version=3.4.746 devid=4099
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 248a070300dc25c0 248a070300dc25c1 248a070300dc25c2 248a070300dc25c3
MACs: 248a07dc25c1 248a07dc25c2
VSD:
PSID: IBM1090111019
ld4465:~ # mstflint -d 86:00.0 q
Image type: FS2
FW Version: 2.40.5000
FW Release Date: 27.10.2016
Product Version: 02.40.50.00
Rom Info: type=UEFI version=14.11.31
type=PXE version=3.4.746 devid=4099
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 248a070300dc2810 248a070300dc2811 248a070300dc2812 248a070300dc2813
MACs: 248a07dc2811 248a07dc2812
VSD:
PSID: IBM1090111019
In the article "Data Center" I found a note that a second instance of the subnet manager should be active for cards with two ports.
ld4465:~ # opensm -o
OpenSM 4.8.1.MLNX20170118.1a8ad26
Reading Cached Option File: /etc/opensm/opensm.conf
Loading Cached Option:guid = 0x248a070300dc25c1
Command Line Arguments:
Run Once
Log File: /var/log/opensm.log
OpenSM 4.8.1.MLNX20170118.1a8ad26
Entering DISCOVERING state
Error from osm_opensm_bind (0x2A)
Perhaps another instance of OpenSM is already running
Exiting SM
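My reading of this error is that the second instance tried to bind to the same port GUID (0x248a070300dc25c1, the cached guid option) that the running daemon already uses. As far as I understand, a second instance would have to be bound to the other port with opensm's -g/--guid option. The sketch below derives the Port2 GUID from the mstflint output above and only prints the command, without starting anything. Whether a second instance makes sense for a port that is configured as eth is an open question for me.

```shell
# GUID line copied from "mstflint -d 0b:00.0 q" above; the column order is
# Node, Port1, Port2, Sys image (see the Description line).
guids='GUIDs: 248a070300dc25c0 248a070300dc25c1 248a070300dc25c2 248a070300dc25c3'

port1_guid="0x$(printf '%s\n' "$guids" | awk '{ print $3 }')"
port2_guid="0x$(printf '%s\n' "$guids" | awk '{ print $4 }')"

# Print (do not run) the command a second instance would need;
# -g/--guid selects the local port GUID that OpenSM binds to.
echo "running instance is bound to: $port1_guid"
echo "second instance would be:     opensm -g $port2_guid"
```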
So I would first like to understand the correct configuration for OpenSM, and then continue with the IB configuration.
THX