opensmd fails to start on ConnectX-3 card

My opensmd daemon will not start; instead it logs this error:

Jul 25 16:54:59 606856 [9957B700] 0x80 → OpenSM 4.0.0.MLNX20130311.156f5c0

Entering DISCOVERING state

Jul 25 16:54:59 607626 [9957B700] 0x02 → osm_vendor_init: 1000 pending umads specified

Jul 25 16:54:59 607706 [9957B700] 0x80 → Entering DISCOVERING state

Jul 25 16:54:59 607812 [9957B700] 0x02 → osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x202c9fffe242ce1

Jul 25 16:54:59 614548 [9957B700] 0x01 → osm_vendor_bind: ERR 5426: Unable to register class 129 version 1

Jul 25 16:54:59 614562 [9957B700] 0x01 → osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed

Jul 25 16:54:59 614590 [9957B700] 0x01 → osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)

Error from osm_opensm_bind (0x2A)

Perhaps another instance of OpenSM is already running

Jul 25 16:54:59 614609 [9957B700] 0x01 → perfmgr_mad_unbind: ERR 5405: No previous bind

Jul 25 16:54:59 614615 [9957B700] 0x01 → osm_congestion_control_shutdown: ERR C108: No previous bind

Jul 25 16:54:59 614621 [9957B700] 0x01 → osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind

Exiting SM
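When reading a log like the one above, it can help to pull out only the ERR lines in order, since the first one (here osm_vendor_bind ERR 5426) is where the failure starts and the later "No previous bind" entries are follow-on cleanup. Below is a minimal Python sketch of that filtering. The regular expression is inferred from the excerpt in this post, not from any OpenSM specification, and `extract_errors` is a made-up helper name.

```python
import re

# Matches the "function: ERR code: message" tail of an OpenSM log line.
# The layout is an assumption inferred from the excerpt in this thread.
ERR_RE = re.compile(r"(\w+): ERR ([0-9A-Fa-f]+): (.+)$")


def extract_errors(log_text):
    """Return (function, code, message) tuples in the order they were logged."""
    errors = []
    for line in log_text.splitlines():
        match = ERR_RE.search(line.strip())
        if match:
            errors.append(match.groups())
    return errors


# A few lines copied from the log above (arrow written in ASCII).
SAMPLE = """\
Jul 25 16:54:59 607812 [9957B700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x202c9fffe242ce1
Jul 25 16:54:59 614548 [9957B700] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Jul 25 16:54:59 614562 [9957B700] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jul 25 16:54:59 614609 [9957B700] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
"""

if __name__ == "__main__":
    for function, code, message in extract_errors(SAMPLE):
        print(f"{code}  {function}: {message}")
```

The first tuple returned is the one worth investigating; everything after it is a consequence of the failed bind.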

My HCA is also connected to an Ethernet switch and I was able to run the connectx_port_config command. Can you provide the output of:

mstflint -d 05:00.0 q

I want to check your HCA’s PSID.

MLNX_OFED_LINUX-2.0-2.0.5 (OFED-2.0-2.0.5):

The card itself is connected to an Ethernet switch for RoCE; however, this shouldn't preclude IB mode, should it?

05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Subsystem: Mellanox Technologies Device 0049

Both ports are in ETH mode; however, changing to IB mode fails with this error:

ConnectX PCI devices :

|----------------------------|

| 1 0000:05:00.0 |

|----------------------------|

Before port change:

eth

eth

|----------------------------|

| Possible port modes: |

| 1: Infiniband |

| 2: Ethernet |

| 3: AutoSense |

|----------------------------|

Select mode for port 1 (1,2,3): 1

Select mode for port 2 (1,2,3): 1

WARNING: Illegal port configuration attempted,

Please view dmesg for details.

My dmesg output is as follows:

mlx4_core 0000:05:00.0: Only same port types supported on this HCA, aborting.

mlx4_core 0000:05:00.0: Only same port types supported on this HCA, aborting.

mlx4_core 0000:05:00.0: Requested port type for port 1 is not supported on this HCA

mlx4_core 0000:05:00.0: Requested port type for port 1 is not supported on this HCA

This is despite having selected InfiniBand for both ports. Firmware is as follows:

CA 'mlx4_0'

CA type: MT4099

Number of ports: 2

Firmware version: 2.30.3000

Hardware version: 1

Node GUID: 0x0002c90300242ce0

System image GUID: 0x0002c90300242ce0

Branko,

Of course, here is the output.

Image type: ConnectX

FW Version: 2.30.3000

Rom Info: type=PXE version=3.4.142 devid=4099 proto=ETH

Device ID: 4099

Description: Node Port1 Port2 Sys image

GUIDs: 0002c90300056aa8 0002c90300056aa9 0002c90300056aaa 0002c90300056aab

MACs: 0002c9242ce0 0002c9242ce1

VSD:

PSID: MT_1080120023

Interesting. These cards are EN only. Is the IB support of the VPI line required to achieve RoCE?

hmmm…

I just tried the connectx_port_config command on my HCA and it worked as shown below.

What version of OFED are you running? You can type…

[root@localhost ~]# ofed_info | head -1

MLNX_OFED_LINUX-2.0-2.0.5 (OFED-2.0-2.0.5)

lspci -v | grep Mell

02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Subsystem: Mellanox Technologies Device 0050

[root@localhost ~]# connectx_port_config

ConnectX PCI devices :

|----------------------------|

| 1 0000:02:00.0 |

|----------------------------|

Before port change:

eth

eth

|----------------------------|

| Possible port modes: |

| 1: Infiniband |

| 2: Ethernet |

| 3: AutoSense |

|----------------------------|

Select mode for port 1 (1,2,3): 1

Select mode for port 2 (1,2,3): 1

After port change:

ib

ib
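The before/after port modes printed by connectx_port_config can also be read without the interactive menu. The Python sketch below assumes the mlx4_core driver exposes one `mlx4_portN` file per port under the device's sysfs directory, which is where the tool is understood to read and write the mode; treat that path as an assumption and check it on your own system. `read_port_types` is a made-up helper name.

```python
import os


def read_port_types(pci_address, sysfs_root="/sys/bus/pci/devices", max_ports=2):
    """Return {port_number: type_string} for the mlx4_portN files that exist."""
    device_dir = os.path.join(sysfs_root, pci_address)
    types = {}
    for port in range(1, max_ports + 1):
        # Assumed attribute name: mlx4_port1, mlx4_port2, ...
        path = os.path.join(device_dir, f"mlx4_port{port}")
        try:
            with open(path) as handle:
                types[port] = handle.read().strip()
        except OSError:
            continue  # attribute missing or unreadable; skip this port
    return types


if __name__ == "__main__":
    # PCI address taken from the failing card in this thread.
    print(read_port_types("0000:05:00.0"))
```

An empty result means the files were not found, which by itself says nothing about whether the card supports IB.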

Hi, note that opensm can be run as a daemon – are you sure there are no other instances running on the system? If there are none, then you may be missing some of the underlying userspace libraries needed for the SM to function correctly.
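One way to rule out a second instance is to scan the process table for the SM binary. This is a small Python sketch that walks a Linux procfs tree and compares each process's `comm` entry; it assumes the binary is named `opensm` (with opensmd being the service wrapper), and `find_processes` is a made-up helper name.

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name equals `name` (Linux procfs)."""
    if not os.path.isdir(proc_root):
        return []  # no procfs here, nothing to report
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(pids)


if __name__ == "__main__":
    running = find_processes("opensm")  # assumed binary name
    print("opensm PIDs:", running if running else "none found")
```

If this prints no PIDs, the "Perhaps another instance of OpenSM is already running" hint in the log is probably not the real cause.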

Which ConnectX-3 card are you using? Can you run:

lspci -v | grep Mellanox

Also, are the ports in InfiniBand mode? Can you run:

connectx_port_config

Also, do you have the correct (and latest) firmware version installed? Can you run:

ibstat

Well, I think that’s unfortunately the issue. A PSID of MT_1080120023 means you have a ConnectX-3 Ethernet NIC (OPN of MCX312A-XCB) which I believe can only be configured for Ethernet and not InfiniBand. If you had a ConnectX-3 VPI Port Adapter Card (which I was testing on), with an OPN of MCX353/4… then you’d be able to configure it as IB. Do you happen to have any “VPI” cards or just your “EN” (Ethernet) card?
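For anyone checking several cards, the PSID can be picked out of the mstflint query output programmatically. The Python sketch below splits each line at the first colon and looks the PSID up in a table. The table holds only the single mapping mentioned in this thread and is not an official Mellanox list; the sample text is the query output posted earlier, and `parse_query` is a made-up helper name.

```python
def parse_query(output):
    """Parse 'Key: value' lines into a dict, splitting at the first colon."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# The only PSID-to-part mapping stated in this thread (and stated there
# with "I believe"); not an official table.
KNOWN_PSIDS = {"MT_1080120023": "MCX312A-XCB (ConnectX-3 EN, Ethernet only)"}

# Query output as posted in this thread, one field per line.
SAMPLE = """\
Image type: ConnectX
FW Version: 2.30.3000
Rom Info: type=PXE version=3.4.142 devid=4099 proto=ETH
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 0002c90300056aa8 0002c90300056aa9 0002c90300056aaa 0002c90300056aab
MACs: 0002c9242ce0 0002c9242ce1
VSD:
PSID: MT_1080120023
"""

if __name__ == "__main__":
    psid = parse_query(SAMPLE)["PSID"]
    print(psid, "->", KNOWN_PSIDS.get(psid, "unknown, look up the OPN"))
```

A PSID missing from the table just means it was not discussed here; the product's OPN is still the thing to look up.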

You should still be able to run RoCE on your Ethernet NIC. As long as you have OFED installed, which you do, you should be good to go. Here’s a link to some documents that can assist you with RoCE: "Mellanox Products: RDMA over Converged Ethernet (RoCE) - An Efficient, Low-cost, Zero Copy Implementation", http://www.mellanox.com/page/products_dyn?product_family=79

Let us know how it goes!