IB card ports are down or polling

I now have two cards installed for loopback tests. Each card has two ports, and lspci shows one Ethernet and one InfiniBand function per card, for a total of 2 Ethernet + 2 IB.

[nonroot@localhost ~]$ ibstatus | grep -v gid
Infiniband device 'mlx5_0' port 1 status:
	base lid:	 0x0
	sm lid:		 0x0
	state:		 1: DOWN
	phys state:	 3: Disabled
	rate:		 40 Gb/sec (4X QDR)
	link_layer:	 Ethernet

Infiniband device 'mlx5_1' port 1 status:
	base lid:	 0xffff
	sm lid:		 0x0
	state:		 1: DOWN
	phys state:	 3: Disabled
	rate:		 10 Gb/sec (4X SDR)
	link_layer:	 InfiniBand

Infiniband device 'mlx5_2' port 1 status:
	base lid:	 0x0
	sm lid:		 0x0
	state:		 1: DOWN
	phys state:	 3: Disabled
	rate:		 40 Gb/sec (4X QDR)
	link_layer:	 Ethernet

Infiniband device 'mlx5_3' port 1 status:
	base lid:	 0xffff
	sm lid:		 0x0
	state:		 1: DOWN
	phys state:	 2: Polling
	rate:		 10 Gb/sec (4X SDR)
	link_layer:	 InfiniBand
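
To make the listing above easier to scan, here is a minimal sketch that reduces ibstatus output to one line per port and keeps only the ports whose link_layer is InfiniBand. The function name ib_only_ports is made up for illustration and is not part of infiniband-diags. It assumes the field layout shown above and prints only the first word of the physical state.

```shell
#!/bin/sh
# Summarise `ibstatus` output read from stdin, one line per port:
#   <device> <port> <logical state> <physical state>
# Only ports whose link_layer is InfiniBand are printed.
# ib_only_ports is a hypothetical helper name, not an existing tool.
ib_only_ports() {
    awk '
        /^Infiniband device/ {
            dev = $3
            gsub("\047", "", dev)      # strip the single quotes around the name
            port = $5
        }
        $1 == "state:"                  { state = $3 }   # e.g. "state: 1: DOWN"
        $1 == "phys" && $2 == "state:"  { phys = $4 }    # e.g. "phys state: 2: Polling"
        $1 == "link_layer:" && $2 == "InfiniBand" {
            print dev, port, state, phys
        }
    '
}

# Usage (needs infiniband-diags installed):
#   ibstatus | ib_only_ports
```

On the output above this would leave only mlx5_1 and mlx5_3, which are the two ports a subnet manager could act on.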

I started opensm, but I get the following:

[nonroot@localhost ~]$ sudo opensm
-------------------------------------------------
OpenSM 3.3.24
 Reading Cached Option File: /etc/rdma/opensm.conf
Command Line Arguments:
 Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 3.3.24

Using default GUID 0xba599ffffe431320
Entering DISCOVERING state


Error from osm_opensm_bind (0x2A)
Perhaps another instance of OpenSM is already running
Exiting SM

/var/log/opensm.log:

[nonroot@localhost ~]$ tail -n 100 /var/log/opensm.log
Apr 03 18:03:44 345945 [46295740] 0x03 -> OpenSM 3.3.24
Apr 03 18:03:44 345981 [46295740] 0x80 -> OpenSM 3.3.24
Apr 03 18:03:44 347764 [46295740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 03 18:03:44 355788 [46295740] 0x80 -> Entering DISCOVERING state
Apr 03 18:03:44 355872 [46295740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xe42a1fffe4a6b00
Apr 03 18:03:44 358525 [46295740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 03 18:03:44 358532 [46295740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 03 18:03:44 358535 [46295740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 03 18:03:44 358543 [46295740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 03 18:03:44 358545 [46295740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 03 18:03:44 358576 [46295740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 03 18:03:44 359545 [46295740] 0x80 -> Exiting SM
Apr 04 19:09:23 449107 [675DD740] 0x03 -> OpenSM 3.3.24
Apr 04 19:09:23 462489 [675DD740] 0x80 -> OpenSM 3.3.24
Apr 04 19:09:23 472102 [675DD740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 04 19:09:23 487144 [675DD740] 0x80 -> Entering DISCOVERING state
Apr 04 19:09:23 487227 [675DD740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xba599ffffe431320
Apr 04 19:09:23 489848 [675DD740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 04 19:09:23 489855 [675DD740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 04 19:09:23 489857 [675DD740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 04 19:09:23 489866 [675DD740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 04 19:09:23 489868 [675DD740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 04 19:09:23 489948 [675DD740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 04 19:09:23 490673 [675DD740] 0x80 -> Exiting SM
Apr 05 15:44:26 171033 [9BAAF740] 0x03 -> OpenSM 3.3.24
Apr 05 15:44:26 186285 [9BAAF740] 0x80 -> OpenSM 3.3.24
Apr 05 15:44:26 195940 [9BAAF740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 05 15:44:26 213595 [9BAAF740] 0x80 -> Entering DISCOVERING state
Apr 05 15:44:26 213696 [9BAAF740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xba599ffffe431320
Apr 05 15:44:26 216587 [9BAAF740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 05 15:44:26 216594 [9BAAF740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 05 15:44:26 216596 [9BAAF740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 05 15:44:26 216603 [9BAAF740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 05 15:44:26 216605 [9BAAF740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 05 15:44:26 216718 [9BAAF740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 05 15:44:26 217504 [9BAAF740] 0x80 -> Exiting SM
Apr 06 14:13:34 274531 [C8793740] 0x03 -> OpenSM 3.3.24
Apr 06 14:13:34 274566 [C8793740] 0x80 -> OpenSM 3.3.24
Apr 06 14:13:34 276510 [C8793740] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 06 14:13:34 294341 [C8793740] 0x80 -> Entering DISCOVERING state
Apr 06 14:13:34 294421 [C8793740] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0xba599ffffe431320
Apr 06 14:13:34 297427 [C8793740] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Apr 06 14:13:34 297434 [C8793740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Apr 06 14:13:34 297437 [C8793740] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Apr 06 14:13:34 297445 [C8793740] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Apr 06 14:13:34 297447 [C8793740] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Apr 06 14:13:34 297549 [C8793740] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Apr 06 14:13:34 298400 [C8793740] 0x80 -> Exiting SM

Any idea?

I changed guid in /etc/rdma/opensm.conf from 0x0000 to the GUID of one of the ports, and now I get the following in /var/log/opensm.log:

Apr 06 14:45:20 676927 [54D74700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0
Apr 06 14:45:30 677204 [54573700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0
Apr 06 14:45:40 677221 [5156D700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0
Apr 06 14:45:50 677288 [50D6C700] 0x01 -> pi_rcv_check_and_fix_lid: ERR 0F04: Got invalid base LID 65535 from the network. Corrected to 0
Apr 06 14:45:52 822209 [6C9F3740] 0x80 -> Exiting SM
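
For picking which port GUID to give opensm, here is a rough sketch that lists only the ports whose link layer is InfiniBand, together with their port GUIDs, read from sysfs. It assumes the usual /sys/class/infiniband layout, where gids/0 of an IB port holds the subnet prefix followed by the port GUID. The function name ib_port_guids is made up for illustration. Whether binding to one of these GUIDs clears the errors above is not confirmed here.

```shell
#!/bin/sh
# List InfiniBand-link-layer ports and their port GUIDs from sysfs.
# Output: <device> <port> 0x<port guid>
# ib_port_guids is a hypothetical helper; the sysfs root is a parameter
# (default /sys/class/infiniband) so it can be pointed at a test tree.
ib_port_guids() {
    root=${1:-/sys/class/infiniband}
    for ll in "$root"/*/ports/*/link_layer; do
        [ -r "$ll" ] || continue
        # Skip ports that are running as Ethernet.
        [ "$(cat "$ll")" = "InfiniBand" ] || continue
        portdir=${ll%/link_layer}
        dev=${portdir%/ports/*}
        dev=${dev##*/}
        port=${portdir##*/}
        # GID 0 = subnet prefix (groups 1-4) + port GUID (groups 5-8).
        guid=$(cut -d: -f5-8 "$portdir/gids/0" | tr -d ':')
        echo "$dev $port 0x$guid"
    done
}

# Usage:
#   ib_port_guids
```

A GUID printed this way can be placed in the guid option of opensm.conf, or passed on the command line as `sudo opensm -g <guid>`.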

ibportstate:

[nonroot@localhost ~]$ ibportstate -C mlx5_3 -P 1 1 1 query
ibwarn: [535020] mad_rpc_open_port: can't open UMAD port (mlx5_3:1)
ibportstate: iberror: failed: Failed to open 'mlx5_3' port '1'
[nonroot@localhost ~]$ sminfo
ibwarn: [535203] mad_rpc_open_port: can't open UMAD port ((null):0)
sminfo: iberror: failed: Failed to open '(null)' port '0'
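
Note that the two commands above were run as nonroot, while opensm was run with sudo. One possible cause of "can't open UMAD port" is that the current user has no access to the /dev/infiniband/umad* device nodes, or that no such nodes exist (for example if the ib_umad module is not loaded). That is an assumption, not something the output proves. A minimal sketch to check it, with check_umad_access being a made-up helper name:

```shell
#!/bin/sh
# Report whether the current user can open each umad device node.
# check_umad_access is a hypothetical helper; the directory is a
# parameter (default /dev/infiniband) so it can be tried on a test dir.
check_umad_access() {
    dir=${1:-/dev/infiniband}
    found=0
    for node in "$dir"/umad*; do
        [ -e "$node" ] || continue      # glob matched nothing
        found=1
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "${node##*/} ok"
        else
            echo "${node##*/} no-access"
        fi
    done
    [ "$found" -eq 1 ] || echo "no umad devices in $dir"
}

# Usage:
#   check_umad_access
```

If the nodes show no-access, rerunning the query with sudo (for example `sudo ibportstate -C mlx5_3 -P 1 1 1 query`) would show whether permissions are the issue.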

A web search turned up a suggestion that once the cable is connected, the port should come up by itself.
My configuration: both cards are in the same system, and I connected the second port of each card with an IB cable (a direct connection; with Ethernet this would need a crossover cable). However, from what I found, crossover cables do not apply to IB, and any cable should work, whether through a switch or a direct connection. Is that true?

I have encountered the same problem. Did you solve it later? If so, I would appreciate any help you can give.