Hello: I have three 12-core servers directly connected with Mellanox ConnectX-2 cards. The head node has a dual-port MNPH29C-XTR card, with each port connected to a separate compute node (direct connect; no IB switch). Each compute node has a single-port MNP* card. The two ports on the head node are configured as separate subnets, which I believe is the correct way to set up a multiport fabric, and all hardware seems to be working correctly.
However, I can’t get OpenSM (the subnet manager) to “control/map” both ports (subnets) on the head node at the same time. I need this so that OpenMPI can use both compute nodes in a single calculation. OpenSM is running and detects the hardware, but it claims it can only work with one subnet at a time. (As I understand OpenMPI, the first compute node needs to be able to reach both the head node and the other compute node for this to work.)
I can run a calculation using head + first compute node, or head + second compute node, but I can’t get head + first + second nodes all participating in the same calculation. Is it wrong to have the two ports on the head node assigned to separate subnets?
Can anyone help me understand how to configure this kind of setup? I thought this was the purpose of OpenSM - i.e., to map all nodes and thereby remove the need for a hardware IB switch.
Thank You Very Much!!! (hoping for help)