I now have two cards installed for loopback tests. Each card has two ports, and lspci shows one Ethernet and one InfiniBand function per card, for a total of 2 Ethernet + 2 IB.
[nonroot@localhost ~]$ ibstatus | grep -v gid
Infiniband device 'mlx5_0' port 1 status:
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 40 Gb/sec (4X QDR)
link_layer: Ethernet
Infiniband device 'mlx5_1' port 1 status:
base lid: 0xffff
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 10 Gb/sec (4X SDR)
link_layer: InfiniBand
Infiniband device 'mlx5_2' port 1 status:
base lid: 0x0
sm lid: 0x0
state: 1: DOWN
phys state: 3: Disabled
rate: 40 Gb/sec (4X QDR)
link_layer: Ethernet
Infiniband device 'mlx5_3' port 1 status:
base lid: 0xffff
sm lid: 0x0
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X SDR)
link_layer: InfiniBand
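To make the output above easier to scan, a small shell helper along these lines could condense it to one line per port. This is only a sketch: `summarize_ibstatus` is a made-up name, and it assumes the field layout shown above (a device header line followed by `state:`, `phys state:` and `link_layer:` lines).

```shell
# Hypothetical helper: condense `ibstatus` output to one line per port.
# Reads the ibstatus text on stdin; assumes the layout shown above.
summarize_ibstatus() {
    awk -F':[ \t]*' '
        /^Infiniband device/ {
            split($0, q, "\047")      # device name sits between single quotes
            dev = q[2]
        }
        /^[ \t]*state:/      { state = $NF }   # e.g. "state: 1: DOWN" -> DOWN
        /^[ \t]*phys state:/ { phys = $NF }    # e.g. "phys state: 2: Polling" -> Polling
        /^[ \t]*link_layer:/ {
            # link_layer is the last field printed per port, so emit the summary here
            printf "%s link_layer=%s state=%s phys=%s\n", dev, $NF, state, phys
        }
    '
}
```

It would be used as `ibstatus | summarize_ibstatus`, which makes it easier to spot which devices report `link_layer=InfiniBand` and which of those are not in the Disabled physical state.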
I started opensm, but I am getting the following:
[nonroot@localhost ~]$ sudo opensm
-------------------------------------------------
OpenSM 3.3.24
Reading Cached Option File: /etc/rdma/opensm.conf
Command Line Arguments:
Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 3.3.24
Using default GUID 0xba599ffffe431320
Entering DISCOVERING state
Error from osm_opensm_bind (0x2A)
Perhaps another instance of OpenSM is already running
Exiting SM
/var/log/opensm.log:
[nonroot@localhost ~]$ tail -n 100 /var/log/opensm.log
Apr 03 18:03:44 345945 [46295740] 0x03 -> OpenSM 3.3.24
Apr 03 18:03:44 345981 [46295740] 0x80 -> OpenSM 3.3.24
Apr 03 18:03:44 347764 [46295740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 03 18:03:44 355788 [46295740] 0x80 -> Entering DISCOVERING state
Apr 03 18:03:44 355872 [46295740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xe42a1fffe4a6b00
Apr 03 18:03:44 358525 [46295740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 03 18:03:44 358532 [46295740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 03 18:03:44 358535 [46295740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 03 18:03:44 358543 [46295740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 03 18:03:44 358545 [46295740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 03 18:03:44 358576 [46295740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 03 18:03:44 359545 [46295740] 0x80 -> Exiting SM
Apr 04 19:09:23 449107 [675DD740] 0x03 -> OpenSM 3.3.24
Apr 04 19:09:23 462489 [675DD740] 0x80 -> OpenSM 3.3.24
Apr 04 19:09:23 472102 [675DD740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 04 19:09:23 487144 [675DD740] 0x80 -> Entering DISCOVERING state
Apr 04 19:09:23 487227 [675DD740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xba599ffffe431320
Apr 04 19:09:23 489848 [675DD740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 04 19:09:23 489855 [675DD740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 04 19:09:23 489857 [675DD740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 04 19:09:23 489866 [675DD740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 04 19:09:23 489868 [675DD740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 04 19:09:23 489948 [675DD740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 04 19:09:23 490673 [675DD740] 0x80 -> Exiting SM
Apr 05 15:44:26 171033 [9BAAF740] 0x03 -> OpenSM 3.3.24
Apr 05 15:44:26 186285 [9BAAF740] 0x80 -> OpenSM 3.3.24
Apr 05 15:44:26 195940 [9BAAF740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 05 15:44:26 213595 [9BAAF740] 0x80 -> Entering DISCOVERING state
Apr 05 15:44:26 213696 [9BAAF740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xba599ffffe431320
Apr 05 15:44:26 216587 [9BAAF740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 05 15:44:26 216594 [9BAAF740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 05 15:44:26 216596 [9BAAF740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 05 15:44:26 216603 [9BAAF740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 05 15:44:26 216605 [9BAAF740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 05 15:44:26 216718 [9BAAF740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 05 15:44:26 217504 [9BAAF740] 0x80 -> Exiting SM
Apr 06 14:13:34 274531 [C8793740] 0x03 -> OpenSM 3.3.24
Apr 06 14:13:34 274566 [C8793740] 0x80 -> OpenSM 3.3.24
Apr 06 14:13:34 276510 [C8793740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 06 14:13:34 294341 [C8793740] 0x80 -> Entering DISCOVERING state
Apr 06 14:13:34 294421 [C8793740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xba599ffffe431320
Apr 06 14:13:34 297427 [C8793740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 06 14:13:34 297434 [C8793740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 06 14:13:34 297437 [C8793740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 06 14:13:34 297445 [C8793740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 06 14:13:34 297447 [C8793740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 06 14:13:34 297549 [C8793740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 06 14:13:34 298400 [C8793740] 0x80 -> Exiting SM
Any ideas?