OpenSM (Open Subnet Manager) for a switchless setup

Hello: I have three 12-core server boxes directly connected with Mellanox ConnectX-2 cards. The head node has a dual-port MNPH29C-XTR card with each port connected to a separate compute node (direct connect; no IB switch). Each compute node has a single-port MNP* card. The two ports on the head node are configured as separate subnets, which I believe is the correct way to set up a multiport fabric, and all hardware seems to be working correctly. However, I can’t get OpenSM (the subnet manager) to “control/map” both ports (subnets) on the head node at the same time. I need this so that OpenMPI can use both compute nodes in one calculation process. I have OpenSM running and it detects the hardware, but it claims it can only work with one subnet at a time. (As I understand OpenMPI, the first compute node needs to be able to access both the head node and the other compute node for OpenMPI to work.)

I can carry out a calculation using head + first compute node or head + second compute node, but I can’t get head + first + second nodes all participating in one calculation. Is it wrong to have the two ports on the head node assigned to separate subnets?

Can anyone help me understand how to configure this kind of setup? I thought this was the purpose of OpenSM - i.e., to map all nodes and to thereby bypass the need for a hardware IB switch.

Thank You Very Much!!! (hoping for help)

PattiMichelle

Your configuration is 2 subnets, each consisting of back-to-back HCAs. An OpenSM instance binds to a single port, so you need at least one OpenSM on each subnet. Both instances can run on the head node (a slightly more complex configuration), or one can run on each compute node. Configure each OpenSM with the subnet prefix and the GUID of the port it is to run on. To keep the two instances' files separate, also set the two environment variables OSM_CACHE_DIR and OSM_TMP_DIR to different directories for each instance. See [ewg] Opensm for dual GUID for more on this.
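
As a rough illustration of the two-instance setup described above, here is a small Python sketch that builds a config file body, environment, and command line for each OpenSM instance on the head node. The port GUIDs, the second subnet prefix, the instance names, and the /var/lib/opensm-multi directory are all placeholders made up for this example, not values from your system. The sketch assumes the `guid`, `subnet_prefix`, and `log_file` config file options and the `--config` and `--daemon` command-line options of opensm. It only prints what would be run; it does not start anything.

```python
from pathlib import Path

# Placeholder values: substitute the real port GUIDs of the dual-port card.
# The second subnet prefix is an arbitrary choice that differs from the default.
INSTANCES = [
    ("port1", "0x0002c90300000001", "0xfe80000000000000"),
    ("port2", "0x0002c90300000002", "0xfe80000000000001"),
]
BASE_DIR = "/var/lib/opensm-multi"  # hypothetical location for per-instance files


def opensm_instance(name, port_guid, subnet_prefix, base_dir=BASE_DIR):
    """Return (config_text, env, argv) for one OpenSM instance."""
    inst_dir = Path(base_dir) / name
    # Each instance is bound to one port GUID and given its own subnet prefix.
    config_text = (
        f"guid {port_guid}\n"
        f"subnet_prefix {subnet_prefix}\n"
        f"log_file {inst_dir / 'opensm.log'}\n"
    )
    # Separate cache and temp directories so the instances do not share files.
    env = {
        "OSM_CACHE_DIR": str(inst_dir / "cache"),
        "OSM_TMP_DIR": str(inst_dir / "tmp"),
    }
    argv = ["opensm", "--config", str(inst_dir / "opensm.conf"), "--daemon"]
    return config_text, env, argv


for name, guid, prefix in INSTANCES:
    config_text, env, argv = opensm_instance(name, guid, prefix)
    print(f"# --- instance {name} ---")
    print(config_text, end="")
    env_prefix = " ".join(f"{key}={value}" for key, value in env.items())
    print(f"{env_prefix} {' '.join(argv)}")
```

The real port GUIDs can be read from the "Port GUID" lines of `ibstat` on the head node. If you run one OpenSM on each compute node instead, the same per-instance settings apply, just without the need to separate the directories.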